TEXT SIMPLIFICATION: A SURVEY

Thursday, March 27th, 2008
10:00AM - 12:00PM

Speaker: 

Lijun Feng

Location: 

Room 8106

Abstract: 

Long and complex sentences are not only difficult for many human readers to process; they are also a stumbling block for automatic systems that rely on natural language input. To ease the processing of such sentences, it is desirable to simplify them grammatically and structurally into shorter, simpler sentences while preserving their meaning and information content. Text simplification is an NLP task that aims to simplify natural language texts in this way. In the past decade, a number of user- and task-oriented text simplification systems have been developed worldwide, notably in the United Kingdom, Japan, and the United States. Broadly speaking, work in the field has progressed through three major stages: syntactic, lexical, and discourse simplification. Research on syntactic simplification has focused on developing chunking techniques that provide robust shallow syntactic analysis of the input, using PoS tags, punctuation marks, and pattern matching to identify complex constructs that can be simplified. Important advances in this stage also include the design of rule-based syntactic transformations and the automatic induction of simplification rules, both based on dependency trees. Lexical approaches have used WordNet, dictionaries, ontologies, and other resources to replace complex words with simpler ones. Discourse-level work has centered on preserving text coherence and cohesion after syntactic simplification; to that end, research has addressed anaphora resolution and replacement, the ordering of simplified sentences, the choice of cue phrases, and the ordering and expression of constituents governed by a particular discourse relation.

This literature survey presents the current state of the art. It reports the major linguistic issues that must be addressed in designing a text simplification system, what has been achieved technically in the field, which text simplification technologies are available today, and which research questions remain open for future work.

Committee: 

PROFESSOR MATT HUENERFAUTH, MENTOR, QUEENS COLLEGE
PROFESSOR VIRGINIA TELLER, HUNTER COLLEGE
PROFESSOR WILLIAM SAKAS, HUNTER COLLEGE