The CUNY-NLP Seminar presents: Consistency in information extraction

Speaker: David J McClosky (IBM Research)
Time: Friday, April 26, 2:15pm-3:30pm
Place: Room 6496, CUNY Graduate Center. 5th Ave & 34th St.

Information extraction, the task of finding entities, relations, and events in natural language text, is an important building block in computational semantics. However, information extraction models of complex events with many subparts tend to model the parts independently. This sometimes leads them to predict inconsistent structures.

For example, an independent classifier for temporal relations could predict that an individual died before they were born. While completely eliminating these independence assumptions is often difficult or intractable, this talk explores two mechanisms for ameliorating their effect.

The first technique casts a biomedical event extraction task as a dependency parsing problem. By converting event structures to a labeled dependency modification graph, off-the-shelf parsers can be used to perform event extraction with minimal modification. Other parsing strategies such as model combination and reranking can be employed for improved performance.

The second method shows how a global consistency layer can be added to a set of independent classifiers to extract the temporal spans of relations for named entities. The global consistency layer automatically learns probabilistic constraints from the data and can be easily incorporated. Inference in the combined model, which pairs the independent classifiers with global consistency, can be performed with Gibbs sampling or random-restart hill climbing.

Bio:
David McClosky is a research scientist at IBM Research working on the Watson project. His primary research interests lie at the intersection of syntactic parsing and information extraction. He received his Ph.D. in Computer Science from Brown University in 2009 for work on semi-supervised methods and domain adaptation for parsing. More recently, he was a postdoctoral researcher in the Natural Language Processing Group in the Stanford Computer Science Department. His semi-supervised biomedical syntactic parser was used by all leading systems in the BioNLP 2011 shared task, and his event extractor was a component in the best-performing system.