


"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so."

Douglas Adams, Last Chance to See


User-generated content captures the experiences of its authors.  This ranges from informal blogs, web forums, and multimedia presentations to more structured content such as customer reviews and publicly accessible legal and regulatory filings.  My research focuses on learning and exploiting structure in unstructured and semi-structured user-generated content so that both users and automated processes can "learn from the experience of others."  My work seeks to organize experiential knowledge and make it usable through text modeling and mining.  This research is exemplified by three ongoing projects:  online customer reviews, SEC regulatory filings, and online health forums.


Online customer reviews


         Learning relationships between needs and attributes:  Online customer reviews present an opportunity to learn and exploit structure in unstructured user comments.  In reviews, users often write about both why they purchased a product (their "needs") and what they liked and disliked about particular product "attributes."  By relating needs to attributes, we can recommend products to customers based on their needs.  We can also explain those recommendations in terms of product attributes and the underlying reviews that support each relationship.  In this work, we learn the implicit structure of a text review in the form of relationships between "needs" and "attributes."

o   Lee, T, "Use-Centric Mining of Customer Reviews," Workshop on Information Technology and Systems, Dec 2004.   Reviews are represented as Boolean vectors of needs and attributes, and we use association rule mining to learn need-attribute relationships. (pdf)

o   Lee, T, Li, S, and Wei, R, "Learning Recommendations From Reviews," Wharton OPIM Working Paper.  Decision-tree classification is adapted to learn both product attributes and attribute values associated with particular user needs. (email me)
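The association-rule approach summarized above can be illustrated with a minimal Python sketch.  It is not the method from the paper.  It makes several hypothetical assumptions:

- each review has already been reduced to the set of need and attribute terms it mentions;
- the `need:`/`attr:` prefixes, the toy `REVIEWS` data, and the thresholds are illustrative only;
- only single-antecedent rules of the form need -> attribute are mined, scored with the standard support and confidence measures.

```python
from itertools import product

# Hypothetical toy data: each review is the set of "needs" and "attributes"
# it mentions (equivalently, a Boolean vector over the combined vocabulary).
REVIEWS = [
    {"need:travel", "attr:battery", "attr:weight"},
    {"need:travel", "attr:battery"},
    {"need:travel", "attr:weight"},
    {"need:portraits", "attr:lens"},
    {"need:portraits", "attr:lens", "attr:battery"},
]


def need_attribute_rules(reviews, min_support=0.2, min_confidence=0.5):
    """Return (need, attribute, support, confidence) for rules need -> attribute.

    support    = fraction of reviews mentioning both the need and the attribute
    confidence = support(need and attribute) / support(need)
    Only single-item antecedents are mined in this sketch.
    """
    n = len(reviews)
    items = set().union(*reviews)
    needs = sorted(i for i in items if i.startswith("need:"))
    attrs = sorted(i for i in items if i.startswith("attr:"))
    rules = []
    for need, attr in product(needs, attrs):
        need_count = sum(1 for r in reviews if need in r)
        both_count = sum(1 for r in reviews if need in r and attr in r)
        if need_count == 0:
            continue
        support = both_count / n
        confidence = both_count / need_count
        if support >= min_support and confidence >= min_confidence:
            rules.append((need, attr, support, confidence))
    # Strongest rules first: by confidence, then support, then name.
    rules.sort(key=lambda rule: (-rule[3], -rule[2], rule[0], rule[1]))
    return rules


if __name__ == "__main__":
    for need, attr, sup, conf in need_attribute_rules(REVIEWS):
        print(f"{need} -> {attr}: support={sup:.2f}, confidence={conf:.2f}")
```

A full Apriori-style miner would also consider multi-item antecedents, such as two needs that together imply an attribute.  Restricting rules to need -> attribute keeps the output directly usable for needs-based recommendation.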


           Learning needs and attributes:  In related work, we seek to automatically learn ontologies of "attributes" and "needs" from the voice of the consumer.  We exploit this experiential knowledge in at least two ways.  First, it supports earlier work on learning relationships between needs and attributes.  Second, it supports research in new product development and marketing.  Techniques such as conjoint analysis and Quality Function Deployment (QFD) rely on first identifying a critical set of attributes and/or user needs for which consumer part-worths are elicited.

o   Lee, T, "Ontology Induction from Online Customer Reviews," Group Decision and Negotiation, 16(3), 2007.  Focusing only on lists of Pros and Cons, we model pro and con phrases as a graph.  We then formulate a constrained logic program that searches for maximal cliques in order to hierarchically cluster attribute phrases. (pdf)

o   Lee, T, "Needs-based Analysis of Online Customer Reviews," International Conference on Electronic Commerce, August 2007.  Using a simple language model and part-of-speech analysis, we search for user "needs" as a complementary set to "attribute" terms in online reviews. (pdf)

o   Lee, T, and Bradlow, E, "Automatic Construction of Conjoint Attributes and Levels from Online Customer Reviews," Wharton OPIM Working Paper.  Marketing and product design applications require both attributes and attribute values (levels).  The need to learn levels distinguishes this work from the computer science literature on extracting product attributes from customer reviews. (email me)
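The clique-search step described for the ontology-induction paper can be sketched in Python.  The paper formulates the search as a constrained logic program.  This sketch swaps in a different technique: the standard Bron-Kerbosch algorithm with pivoting, which enumerates maximal cliques.  It assumes a hypothetical edge rule in which two attribute phrases are linked if they share a word, and the `PHRASES` data is a toy example.  It covers only clique enumeration, not the Pros/Cons constraints or the hierarchical clustering built on top of the cliques.

```python
from itertools import combinations

# Hypothetical attribute phrases drawn from Pros/Cons lists.
PHRASES = [
    "battery life",
    "long battery life",
    "battery charge",
    "large screen",
    "bright screen",
    "screen size",
]


def phrase_graph(phrases):
    """Link two phrases if they share at least one word (hypothetical edge rule)."""
    tokens = {ph: set(ph.split()) for ph in phrases}
    adj = {ph: set() for ph in phrases}
    for a, b in combinations(phrases, 2):
        if tokens[a] & tokens[b]:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bron_kerbosch(r, p, x, adj, out):
    """Bron-Kerbosch with pivoting: r = current clique, p = candidates, x = excluded."""
    if not p and not x:
        out.append(r)  # r cannot be extended, so it is a maximal clique
        return
    # Pivot on the vertex with the most neighbours in p to prune branches.
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def maximal_cliques(adj):
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return out


if __name__ == "__main__":
    for clique in maximal_cliques(phrase_graph(PHRASES)):
        print(sorted(clique))
```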


           Needs-centric searching of customer reviews:  In practice, most online retailers that provide product reviews support only product-centric browsing.  Users must first select a specific product and only then read its reviews.  However, recent marketing surveys indicate that as many as 40% of online consumers need help narrowing their choice set (Voight, AdWeek, 11/12/07).  We can exploit the relationships between needs and attributes to search online reviews and rank products according to a user's needs.

o   Lee, T, Li, S, and Wei, R, "Needs-based Searching and Ranking Based on Customer Reviews," Wharton OPIM Working Paper (submitted to the IEEE Conference on Electronic Commerce, 2008). (pdf)
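The ranking method itself is not detailed here, so the following Python code is only a hypothetical sketch of how need-attribute relationships could drive needs-based ranking.  It makes two assumptions:

- need -> attribute weights (for instance, rule confidences) are already available;
- each review's attribute polarities (+1 liked, -1 disliked) are already available.

`NEED_WEIGHTS`, `PRODUCT_REVIEWS`, and `rank_products` are illustrative names, not part of any published system.

```python
from collections import defaultdict

# Hypothetical need -> attribute weights, e.g. confidences of mined rules.
NEED_WEIGHTS = {
    "travel": {"battery": 0.7, "weight": 0.6},
}

# Hypothetical reviews: (product, {attribute: polarity}), +1 = liked, -1 = disliked.
PRODUCT_REVIEWS = [
    ("camera_a", {"battery": 1, "weight": 1}),
    ("camera_a", {"battery": 1}),
    ("camera_b", {"battery": -1, "lens": 1}),
    ("camera_b", {"weight": 1}),
]


def rank_products(need, reviews, need_weights):
    """Rank products by need-weighted attribute polarity, averaged per review."""
    weights = need_weights.get(need, {})
    totals = defaultdict(float)
    counts = defaultdict(int)
    for product, opinions in reviews:
        counts[product] += 1
        for attr, polarity in opinions.items():
            # Attributes unrelated to the need contribute nothing.
            totals[product] += weights.get(attr, 0.0) * polarity
    scores = {p: totals[p] / counts[p] for p in counts}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for product, s in rank_products("travel", PRODUCT_REVIEWS, NEED_WEIGHTS):
        print(f"{product}: {s:.2f}")
```

Dividing by the number of reviews keeps heavily reviewed products from dominating purely through volume.  A real system would also need to handle products with very few reviews.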


           User studies:  We have run user studies to validate the effectiveness of automated approaches for learning needs and attributes.  We are designing user studies to assess the utility of needs-based recommendations and review-based explanations for recommendations and rankings. 


SEC regulatory filings


           Learning semantic labels for narrative text elements:  The SEC is moving to require XML-based eXtensible Business Reporting Language (XBRL) labeling of financial figures.  However, the SEC also requires accompanying narrative text elements that lack XBRL labels, and in many cases it mandates specific narrative summaries of particular data tables.  Because each firm prepares its filings independently, the labeling and ordering of these elements are often inconsistent across firms and within the same firm over time.  We aim to facilitate automated compliance checking by learning labels for naming and extracting narrative text elements.  We also aim to learn constraints that relate narrative elements to numerical facts and figures.  Ongoing work eliminates the need for any supervision by using Gibbs sampling, a Markov chain Monte Carlo approach.

o   Lee, T, "Using Regulatory Instructions for Information Extraction," AAAI Workshop on Information Integration on the Web, July 2007.   We use a semi-supervised algorithm that learns from the regulatory instructions rather than from a set of labeled training examples.  A greedy hill-climbing algorithm searches for labels within the text based on specifications from the regulatory instructions. (pdf)

o   Carroll, J and Lee, T, "A Genetic Algorithm for Segmentation and Information Retrieval of Regulatory Filings," Wharton OPIM Working Paper (submitted to the NSF Digital Government Conference, 2008).  A genetic algorithm learns in a semi-supervised manner from regulatory instructions rather than from a hand-labeled training set.  The genetic algorithm approach addresses several limitations of prior work using a greedy hill-climbing approach. (pdf)
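The instruction-driven hill-climbing idea can be illustrated with a toy Python segmenter.  Everything here is a hypothetical assumption:

- the instructions are reduced to an ordered list of required sections, each described by a few keywords (`SPEC`);
- sections are contiguous and appear in the specified order;
- a segmentation is scored by keyword overlap between each paragraph and its assigned section.

Steepest-ascent search moves one section boundary at a time until no single move improves the score.

```python
# Hypothetical specification distilled from regulatory instructions:
# required sections, in order, each described by a few keywords.
SPEC = [
    ("business", {"products", "customers", "operations"}),
    ("risk_factors", {"risk", "uncertainty", "adverse"}),
    ("legal_proceedings", {"litigation", "lawsuit", "court"}),
]

# Hypothetical filing, already split into paragraphs.
PARAGRAPHS = [
    "our operations serve customers with products",
    "we sell products worldwide",
    "adverse market conditions pose risk",
    "uncertainty about regulation is a risk",
    "a lawsuit is pending in court",
]


def score(boundaries, paragraphs, spec):
    """Total keyword overlap when section k covers paragraphs[edges[k]:edges[k + 1]]."""
    edges = [0, *boundaries, len(paragraphs)]
    total = 0
    for k, (_, keywords) in enumerate(spec):
        for text in paragraphs[edges[k]:edges[k + 1]]:
            total += len(set(text.split()) & keywords)
    return total


def hill_climb(paragraphs, spec):
    """Steepest-ascent search over the len(spec) - 1 section boundaries."""
    n, k = len(paragraphs), len(spec)
    # Start from evenly spaced cut points.
    boundaries = [round(i * n / k) for i in range(1, k)]
    best = score(boundaries, paragraphs, spec)
    while True:
        # Neighbours: shift one cut point by one paragraph, keeping cut points ordered.
        candidates = []
        for i in range(len(boundaries)):
            for step in (-1, 1):
                moved = boundaries.copy()
                moved[i] += step
                lo = moved[i - 1] if i > 0 else 0
                hi = moved[i + 1] if i + 1 < len(moved) else n
                if lo <= moved[i] <= hi:
                    candidates.append(moved)
        if not candidates:
            return boundaries, best
        challenger = max(candidates, key=lambda b: score(b, paragraphs, spec))
        challenger_score = score(challenger, paragraphs, spec)
        if challenger_score <= best:
            return boundaries, best  # local optimum: no single move helps
        boundaries, best = challenger, challenger_score


if __name__ == "__main__":
    cuts, total = hill_climb(PARAGRAPHS, SPEC)
    edges = [0, *cuts, len(PARAGRAPHS)]
    for k, (label, _) in enumerate(SPEC):
        print(label, PARAGRAPHS[edges[k]:edges[k + 1]])
```

Because it only accepts improving moves, this search stops at the first local optimum it reaches.  A genetic algorithm instead evolves a population of candidate boundary vectors through crossover and mutation.  This generally makes it less prone to stalling at a single local optimum.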


           Learning voluntary disclosures:  In addition to mandatory disclosures about firm performance, the SEC specifies that firms discuss events and circumstances that may have a material impact on performance in the upcoming year.  These "voluntary disclosures" vary across industries and from firm to firm.  Prior research based on manual coding indicates that voluntary disclosures in narrative text such as Management's Discussion and Analysis may have predictive value for investors (Cole and Jones, Journal of Accounting Literature, 2005).  We seek to learn ontologies of "voluntary disclosures" from textual narratives to support investor decision-making.  (work with Scott Richardson, Irem Tunay, and Amy (Zhao) Yu).


Online health forums


           Informed patient decision-making:  Online health information is a third opportunity for learning and exploiting content structure.  In the past month, we have begun a pilot project with the Foundation for Informed Medical Decision Making (www.fimdm.org).  Prior research has shown that informed patient decision-making increases patient satisfaction and decreases costs.  Trends toward greater medical subspecialization and reimbursement through health savings accounts will only heighten the need for informed patients.  Currently, the Foundation monitors the research literature and conducts focus groups with patients, patient families, and caregivers to develop disease-specific decision aids.  Our goal is to complement this existing process by linking to and learning from the dialogue of current patients and families in online patient forums and web portals.  (work with Thunyarat (Bam) Amornpetchkul).