"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." (Douglas Adams, Last Chance to See)

User-generated content, from informal blogs, web forums, and multimedia presentations to more structured content such as customer reviews and publicly accessible legal/regulatory filings, captures the experiences of its authors. My research focuses on learning and exploiting structure in unstructured and semistructured user-generated content so that both users and automated processes can "learn from the experience of others." My work seeks to organize experiential knowledge and make it usable through text modeling and mining. This research is exemplified in three ongoing projects: online customer reviews, SEC regulatory filings, and online health forums.
· Learning relationships between needs and attributes: Online customer reviews present an opportunity for learning and exploiting structure in unstructured user comments. In reviews, users often write both about why they purchased a product (their "needs") and about what they liked and disliked about particular product "attributes." By relating needs to attributes, we can recommend products to customers based upon their needs and explain those recommendations in terms of product attributes and the underlying reviews that support those relationships. In this work, we learn the implicit structure of a text review in the form of relationships between "needs" and "attributes."

  o Lee, T, "Use-Centric Mining of Customer Reviews," Workshop on Information Technology and Systems, Dec 2004. Reviews are represented as Boolean vectors of needs and attributes, and we use association rule mining to learn need-attribute relationships. (pdf)

  o Lee, T, Li, S, and Wei, R, "Learning Recommendations From Reviews," Wharton OPIM Working Paper. Decision-tree classification is adapted to learn both the product attributes and the attribute values associated with particular user needs. (email me)

· Learning needs and attributes: In related work, we seek to automatically learn ontologies of "attributes" and "needs" from the voice of the consumer. We exploit this experiential knowledge in at least two ways. First, it supports the earlier work on learning relationships between needs and attributes. Second, it supports research in new product development and marketing: techniques such as conjoint analysis and quality function deployment (QFD) rely upon first identifying a critical set of attributes and/or user needs for which consumer partworths are elicited.

  o Lee, T, "Ontology Induction from Online Customer Reviews," Group Decision and Negotiation, 16(3), 2007. Focusing only on lists of Pros and Cons, we model pro and con phrases as a graph and define a constrained logic program that searches for maximal cliques, hierarchically clustering attribute phrases. (pdf)

  o Lee, T, "Needs-based Analysis of Online Customer Reviews," International Conference on Electronic Commerce, August 2007. Using a simple language model and part-of-speech analysis, we search for user "needs" as a complement to the "attribute" terms in online reviews. (pdf)

  o Lee, T, and Bradlow, E, "Automatic Construction of Conjoint Attributes and Levels from Online Customer Reviews," Wharton OPIM Working Paper. Marketing and product design applications need to learn both attributes and attribute values (levels). The need to learn levels differentiates this work from the computer science literature on learning product attributes from customer reviews. (email me)

· Needs-centric searching of customer reviews: In practice, most online retailers that provide product reviews support only product-centric browsing: users must first select a specific product and only then read reviews for that product. However, recent marketing surveys indicate that as many as 40% of online consumers need help narrowing their choice set (Voight, AdWeek, 11/12/07). We can exploit the relationship between needs and attributes to rank products by searching online reviews.

  o Lee, T, Li, S, and Wei, R, "Needs-based Searching and Ranking Based on Customer Reviews," Wharton OPIM Working Paper (submitted to the IEEE Conference on Electronic Commerce, 2008). (pdf)

· User studies: We have run user studies to validate the effectiveness of automated approaches for learning needs and attributes. We are designing further user studies to assess the utility of needs-based recommendations and of review-based explanations for recommendations and rankings.
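
The association-rule approach described under "Learning relationships between needs and attributes" can be illustrated with a minimal sketch. It assumes each review has already been reduced to a Boolean vector over a fixed vocabulary of needs and attributes; the column names, toy data, and the `min_support`/`min_confidence` thresholds are hypothetical, and mining is restricted to rules with a single need on the left and a single attribute on the right (the two-item level of Apriori-style mining), which is a simplification and not necessarily what the cited paper does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    need: str
    attribute: str
    support: float     # fraction of all reviews mentioning both need and attribute
    confidence: float  # fraction of reviews mentioning the need that also mention the attribute


def mine_need_attribute_rules(vectors, columns, needs, min_support=0.2, min_confidence=0.6):
    """Mine rules of the form need -> attribute from Boolean review vectors.

    Simplification (assumption): one need on the left, one attribute on the
    right, i.e. only the 2-itemset level of association rule mining.
    """
    n = len(vectors)
    index = {name: i for i, name in enumerate(columns)}
    rules = []
    for need in needs:
        ni = index[need]
        need_count = sum(1 for v in vectors if v[ni])
        if need_count == 0:
            continue
        for attr in columns:
            if attr in needs:
                continue  # right-hand side must be an attribute, not another need
            ai = index[attr]
            both = sum(1 for v in vectors if v[ni] and v[ai])
            support, confidence = both / n, both / need_count
            if support >= min_support and confidence >= min_confidence:
                rules.append(Rule(need, attr, support, confidence))
    # Strongest rules first; ties broken deterministically by name
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, r.need, r.attribute))


# Hypothetical toy data: each review is a Boolean vector over these columns
COLUMNS = ["travel", "sports", "battery", "weight", "zoom", "shutter"]
NEEDS = {"travel", "sports"}
VECTORS = [
    (1, 0, 1, 1, 0, 0),
    (1, 0, 1, 0, 0, 0),
    (1, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 1),
    (0, 1, 0, 0, 0, 1),
    (1, 0, 1, 0, 1, 0),
]

if __name__ == "__main__":
    for rule in mine_need_attribute_rules(VECTORS, COLUMNS, NEEDS):
        print(f"{rule.need} -> {rule.attribute} "
              f"(support={rule.support:.2f}, confidence={rule.confidence:.2f})")
```

On this toy data the sketch keeps two rules, sports -> shutter (confidence 1.00) and travel -> battery (confidence 0.75); weaker pairings such as travel -> weight fall below the confidence threshold. The surviving rules are the kind of need-attribute links that could back a needs-based recommendation and point to the supporting reviews.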
· Learning semantic labels for narrative text elements: The SEC is moving to require XML-based eXtensible Business Reporting Language (XBRL) labeling of financial figures. However, the SEC also requires accompanying narrative text elements that lack XBRL labels. In many cases, the SEC mandates specific narrative summarizations of particular data tables. Because filings are submitted by independent firms, there are numerous inconsistencies in the labeling and ordering of these elements between firms and by the same firm over time. We aim to facilitate automated compliance checking by learning labels for naming and extracting narrative text elements, as well as constraints that relate narrative elements to numerical facts and figures. Ongoing work eliminates the need for any supervision by using a Gibbs-sampling (Markov chain Monte Carlo) approach.

  o Lee, T, "Using Regulatory Instructions for Information Extraction," AAAI Workshop on Information Integration on the Web, July 2007. We use a semi-supervised algorithm that learns from the regulatory instructions rather than from a set of labeled training examples: a greedy hill-climbing algorithm searches the text for labels based upon specifications in the regulatory instructions. (pdf)

  o Carroll, J and Lee, T, "A Genetic Algorithm for Segmentation and Information Retrieval of Regulatory Filings," Wharton OPIM Working Paper (submitted to the NSF Digital Government Conference, 2008). A genetic algorithm learns in a semi-supervised manner from regulatory instructions rather than from a hand-labeled training set. The genetic algorithm addresses several limitations of the prior greedy hill-climbing approach. (pdf)

· Learning voluntary disclosures: In addition to mandatory disclosures about firm performance, the SEC specifies that firms discuss events and circumstances that may have a material impact on performance in the upcoming year. These "voluntary disclosures" vary between industries and from firm to firm. Prior research based upon manual coding indicates that voluntary disclosures within narrative text such as Management's Discussion and Analysis may have predictive value for investors (Cole and Jones, Journal of Accounting Literature, 2005). We seek to learn ontologies of "voluntary disclosures" from textual narratives to support investor decision-making. (Work with Scott Richardson, Irem Tunay, and Amy (Zhao) Yu.)
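
The greedy hill-climbing search mentioned for the regulatory-instructions work can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the filing is a list of paragraphs, the regulatory instructions are stood in for by a hypothetical ordered list of `(label, keyword_set)` pairs, and the hypothetical `score` function simply counts keyword hits. The search moves one segment boundary at a time and accepts the best strictly improving move until none remains; it is a sketch of the general technique, not the paper's actual scoring or search.

```python
import re


def keyword_hits(paragraph, keywords):
    """Count tokens in a paragraph that appear in an item's keyword set."""
    words = re.findall(r"[a-z]+", paragraph.lower())
    return sum(1 for w in words if w in keywords)


def is_valid(cuts, n):
    """Cut points must be strictly increasing so every segment is non-empty."""
    bounds = [0, *cuts, n]
    return all(a < b for a, b in zip(bounds, bounds[1:]))


def score(cuts, paragraphs, items):
    """Hypothetical objective: total keyword hits when segment j gets items[j]."""
    bounds = [0, *cuts, len(paragraphs)]
    total = 0
    for j, (_, keywords) in enumerate(items):
        for p in paragraphs[bounds[j]:bounds[j + 1]]:
            total += keyword_hits(p, keywords)
    return total


def hill_climb_segments(paragraphs, items):
    """Greedy hill climbing over segment boundaries.

    Each label gets one contiguous, non-empty run of paragraphs, in the
    order the (assumed) instructions list them.
    """
    n, k = len(paragraphs), len(items)
    if n < k:
        raise ValueError("need at least one paragraph per required item")
    cuts = [i * n // k for i in range(1, k)]  # start from evenly spaced boundaries
    current = score(cuts, paragraphs, items)
    while True:
        best_cuts, best_score = None, current
        # Neighborhood: shift any single boundary by one paragraph
        for i in range(len(cuts)):
            for delta in (-1, 1):
                candidate = cuts.copy()
                candidate[i] += delta
                if is_valid(candidate, n):
                    s = score(candidate, paragraphs, items)
                    if s > best_score:
                        best_cuts, best_score = candidate, s
        if best_cuts is None:
            break  # local optimum: no single move improves the score
        cuts, current = best_cuts, best_score
    bounds = [0, *cuts, n]
    return {label: (bounds[j], bounds[j + 1]) for j, (label, _) in enumerate(items)}


# Hypothetical required items and a toy six-paragraph filing
ITEMS = [
    ("business", {"products", "customers", "segments"}),
    ("risk_factors", {"risk", "risks", "uncertain", "adversely"}),
    ("legal_proceedings", {"litigation", "lawsuit", "court"}),
]
PARAGRAPHS = [
    "We sell products to retail customers.",
    "Our segments include hardware and services.",
    "Our customers may reduce orders, a risk that could adversely affect results.",
    "Demand is uncertain and supply risks remain.",
    "Competition could adversely affect margins.",
    "A lawsuit is pending in federal court.",
]

if __name__ == "__main__":
    print(hill_climb_segments(PARAGRAPHS, ITEMS))
```

Starting from the even split (paragraphs 0-1, 2-3, 4-5), one move of the second boundary raises the score from 9 to 10, and no further single move helps, so the sketch returns `{'business': (0, 2), 'risk_factors': (2, 5), 'legal_proceedings': (5, 6)}`. Because the search stops at the first local optimum it reaches, a population-based method such as a genetic algorithm is one natural way to explore more of the boundary space.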
Informed patient decision-making: Online health information is a third opportunity for learning and exploiting content structure. In the past month, we have begun a pilot project with the Foundation for Informed Medical Decision Making (www.fimdm.org). Prior research has demonstrated that informed patient decision-making increases patient satisfaction and decreases costs. Trends toward increased medical subspecialization and health savings account-based reimbursement will only heighten the need for informed patients. Currently, the Foundation monitors the research literature and conducts focus groups with patients, patient families, and caregivers to develop disease-specific decision aids. Our goal is to complement this existing process by linking to and learning from the dialogue of current patients and families in online patient forums and web portals. (Work with Thunyarat (Bam) Amornpetchkul.)