A Conditional Random Field Approach to Classroom Discourse Analysis Using Multilevel Features

In this paper we introduce a taxonomy of classroom discourse, with particular focus on mathematical problem-solving discourse. We first discuss the hierarchical nature of classroom discourse and describe how our taxonomy addresses this structure. We then describe an approach to classroom discourse classification, based on the proposed taxonomy, that uses Conditional Random Fields with features drawn from multiple linguistic levels. The multilevel features reduce the classification error rate by over 40% compared with a baseline that uses only unigram lexical features. The framework and approach proposed in this paper may be useful for future work in education research, discourse analysis research, and intelligent tutoring applications.

By: Juan M. Huerta

Published in: RC24870 in 2009


This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).


Questions about this service can be mailed to reports@us.ibm.com.