Towards an Interoperability Standard for Text and Multi-Modal Analytics

Unstructured information may be thought of as the direct product of human communications. Examples include natural language documents, email, speech, images and video. It is information that was not encoded for machines to understand but rather authored for humans to understand. We say it is “unstructured” because it lacks the explicit semantics (“structure”) that computer programs require to interpret the information as intended by the human author or as required by the application. A growing number of applications see increasing value in exploiting unstructured information. This growth is largely driven by the wealth of unstructured information found on the external web, in corporate intranets, document repositories and call centers, and in customer and employee business communications. For unstructured information to be processed by traditional applications, it must first be analyzed to assign application-specific semantics to the unstructured content. This analysis is performed by software components called text and multi-modal analytics. This report motivates and proposes elements of an architecture specification for creating and composing text and multi-modal analytics and for facilitating their interoperability, based on the open-source UIMA project that originated at IBM Research.

By: David Ferrucci; Adam Lally; Daniel Gruhl; Edward Epstein; Marshall Schor; J. William Murdock; Andy Frenkiel; Eric W. Brown; Thomas Hampp; Yurdaer Doganata; Christopher Welty; Lisa Amini; Galina Kofman; Lev Kozakov; Yosi Mass

Published in: IBM Research Report RC24122, 2006


