
DIRT – Discovery of Inference Rules from Text
ABSTRACT
In this paper, we propose an unsupervised method for discovering inference rules from text, such as "X is author of Y ≈ X wrote Y", "X solved Y ≈ X found a solution to Y", and "X caused Y ≈ Y is triggered by X". Inference rules are extremely important in many fields such as natural language processing, information retrieval, and artificial intelligence in general. Our algorithm is based on an extended version of Harris' Distributional Hypothesis, which states that words that occurred in the same contexts tend to be similar. Instead of applying this hypothesis to words, we apply it to paths in the dependency trees of a parsed corpus.

1. INTRODUCTION
Text is the most significant repository of human knowledge. Many algorithms have been proposed to mine textual data. Most of them focus on document clustering [13], identifying prototypical documents [20], or finding term associations [14] and hyponym relationships [9]. We propose an unsupervised method for discovering inference rules, such as "X is author of Y ≈ X wrote Y", "X solved Y ≈ X found a solution to Y", and "X caused Y ≈ Y is triggered by X". Inference rules are extremely important in many fields such as natural language processing, information retrieval, and artificial intelligence in general.

For example, consider the query to an information retrieval system: "Who is the author of the 'Star Spangled Banner'?" Unless the system recognizes the relationship between "X wrote Y" and "X is the author of Y", it would not necessarily rank the sentence

…Francis Scott Key wrote the "Star Spangled Banner" in 1814.

higher than the sentence

…comedian-actress Roseanne Barr sang her famous shrieking rendition of the "Star Spangled Banner" before a San Diego Padres game.

We call "X wrote Y ≈ X is the author of Y" an inference rule. In previous work, such relationships have been referred to as paraphrases or variants [24]. In this paper, we use the term inference rule because we also want to include relationships that are not exactly paraphrases, but are nonetheless related and potentially useful to information retrieval systems. For example, "X caused Y ≈ Y is blamed on X" is an inference rule even though the two phrases do not mean exactly the same thing.

Traditionally, knowledge bases containing such inference rules are created manually. This knowledge engineering task is extremely laborious. More importantly, building such a knowledge base is inherently difficult since humans are not good at generating a complete list of rules. For example, while it is quite trivial to come up with the rule "X wrote Y ≈ X is the author of Y", it seems hard to dream up a rule like "X manufactures Y ≈ X's Y factory", which can be used to infer that "Chrétien visited Peugeot's newly renovated car factory in the afternoon" contains an answer to the query "What does Peugeot manufacture?"

Most previous efforts on knowledge engineering have focused on creating tools for helping knowledge engineers transfer their knowledge to machines [6]. Our goal is to automatically discover such inference rules from text.

In this paper, we present an unsupervised algorithm, DIRT, for Discovery of Inference Rules from Text. Our algorithm is a generalization of previous algorithms for finding similar words [10][15][19]. Algorithms for finding similar words assume the Distributional Hypothesis, which states that words that occurred in the same contexts tend to have similar meanings [7]. Instead of applying the Distributional Hypothesis to words, we apply it to paths in dependency trees. Essentially, if two paths tend to link the same sets of words, we hypothesize that their meanings are similar. Since a path represents a binary relationship, we generate an inference rule for each pair of similar paths.

The remainder of this paper is organized as follows. In the next section, we review previous work. In Section 3, we define paths in dependency trees and describe their extraction from a parsed corpus. Section 4 presents the DIRT system, and a comparison of our system's output with manually generated paraphrase expressions is shown in Section 5. Finally, we conclude with a discussion of future work.

2. Previous Work
Most previous work on variant recognition and paraphrase has been done in the fields of natural language generation, text summarization, and information retrieval.

The generation community has focused mainly on rule-based text transformations in order to meet external constraints such as length and readability [11][18][22]. Dras [4] described syntactic paraphrases using a meta-grammar with a synchronous Tree Adjoining Grammar formalism.

In multi-document summarization, paraphrasing is important to avoid redundant statements in a summary. Given a collection of similar sentences (a theme) with different wordings, it is difficult to identify similar phrases that report the same fact. Barzilay et al. [3] analyzed 200 two-sentence themes from a corpus and extracted seven lexico-syntactic paraphrasing rules. These rules covered 82% of syntactic and lexical paraphrases, which in turn cover 70% of all variants. The rules are subsequently used to identify common statements in a theme by comparing the predicate-argument structure of the sentences within the theme.

In information retrieval, it is common to identify phrasal terms from queries and generate their variants for query expansion. It has been shown that such query expansion does promote effective retrieval [1][2]. Morphological variant query expansion was treated by Sparck Jones and Tait [24] and Jacquemin [12].

In [21], Richardson extracted semantic relationships (e.g., hypernym, location, material and purpose) from dictionary definitions using a parser and constructed a semantic network. He then described an algorithm that uses paths in the semantic network to compute the similarity between words. In a sense, our algorithm is a dual of Richardson's approach. While Richardson used paths as features to compute the similarity between words, we use words as features to compute the similarity of paths.

Many text mining algorithms aim at finding association rules between terms [14]. In contrast, the output of our algorithm is a set of associations between relations. Term associations usually require human interpretation. Some of them are considered to be uninterpretable even by humans [5].

Copyright ACM 2001 1-58113-391-x/01/08…$5.00

3. Extraction of Paths
solution to Y”). The reverse path can be written as: The inference rules discovered by DIRT are between paths in N:to:N!solution!N:obj:V!find" V:subj:N. The root of both
dependency trees. In this section, we introduce dependency trees paths is find. A path begins and ends with two dependency and define paths in trees. Finally, we describe an algorithm for relations. We call them the two slots of the path: SlotX on the left- hand side and SlotY on the right-hand side. The words connected by the path are the fillers of the slots. For example, John fills the 3.1 Dependency Trees
SlotX of N:subj:V!find"V:obj:N"solution"N:to:N and A dependency relationship [8] is an asymmetric binary problem fills the SlotY. The reverse is true for relationship between a word called head, and another word called
N:to:N!solution!N:obj:V!find"V:subj:N. In a path, modifier. The structure of a sentence can be represented by a set
dependency relations that are not slots are called internal
of dependency relationships that form a tree. A word in the relations. For example, find"V:obj:N"solution is an internal
sentence may have several modifiers, but each word may modify at most one word. The root of the dependency tree does not We impose a set of constraints on the paths to be extracted from modify any word. It is also called the head of the sentence. For example, Figure 1 shows the dependency tree for the sentence • most meaningful inference rules involve only paths that “John found a solution to the problem”, generated by a broad- coverage English parser called Minipar1 [16]. The links in the diagram represent dependency relationships. The direction of a • the constraints significantly reduce the number of distinct link is from the head to the modifier in the relationship. Labels paths and, consequently, the amount of computation required associated with the links represent types of dependency relations. for computing similar paths (described in Section 4.3); and Table 1 lists a subset of the dependency relations in Minipar the constraints alleviate the sparse data problem because long paths tend to have very few occurrences. 1Available at www.cs.ualberta.ca/~lindek/minipar.htm. • slot fillers must be nouns because slots correspond to Table 2. Sample slot fillers for two paths extracted from a
variables in inference rules and we expect the variables to be newspaper corpus.
X finds a solution to Y” “X solves Y” • any dependency relation that does not connect two content words (i.e. nouns, verbs, adjectives or adverbs) is excluded from a path. E.g. in Figure 1, the relation between a and • the frequency count of an internal relation must exceed a They had previously bought bighorn sheep from Comstock. The paths extracted from this sentence and their meanings are: Based on these common contexts, one can statistically determine that duty and responsibility have similar meanings. ≡ X buys something from Y In the algorithms for finding word similarity, dependency links are treated as contexts of words. In contrast, our algorithm for finding inference rules treats the words that fill the slots of a path as a context for the path. We make an assumption that this is an (c) N:subj:V!buy"V:obj:N"sheep"N:nn:N extension to the Distributional Hypothesis: (d) N:nn:N!sheep!N:obj:V!buy"V:from:N Extended Distributional Hypothesis:
X sheep is bought from Y If two paths tend to occur in similar contexts, the meanings of the paths tend to be similar. For example, Table 2 lists a set of example pairs of words An inverse path is also added for each one above. connected by the paths N:subj:V!find"V:obj:N"solution" N:to:N (“X finds a solution to Y) and N:subj:V!solve" 4. Discovering Inference Rules from Text
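The inverse of a path can be produced mechanically from the path name itself. The sketch below is our own illustration (the function name is ours, not from the paper), using ASCII `<-`/`->` in place of the arrows: reversing a path flips each arrow, reverses the token order, and reverses the category order inside each relation label (e.g., V:obj:N becomes N:obj:V).

```python
# A path name is relations and words joined by arrows; the slot fillers
# (here John and problem) are not part of the path name itself.
path = "N:subj:V<-find->V:obj:N->solution->N:to:N"  # "X finds a solution to Y"

def reverse_path(p):
    """Return the reverse path: token order reversed, arrows flipped,
    and the category order inside each relation label reversed."""
    tokens = p.replace("<-", " <- ").replace("->", " -> ").split()

    def flip(tok):
        if tok == "<-":
            return "->"
        if tok == "->":
            return "<-"
        if ":" in tok:  # relation label, e.g. V:obj:N -> N:obj:V
            return ":".join(reversed(tok.split(":")))
        return tok      # a word along the path, e.g. find, solution

    return "".join(flip(t) for t in reversed(tokens))

print(reverse_path(path))  # N:to:N<-solution<-N:obj:V<-find->V:subj:N
```

Reversing twice recovers the original path, a cheap sanity check when a path and its inverse are both added to the database.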
4. Discovering Inference Rules from Text
A path is a binary relation between two entities. In this section, we present an algorithm, called DIRT, that automatically discovers the inference relations between such binary relations.

4.1 Underlying Assumption
Most algorithms for computing word similarity from a text corpus are based on a principle known as the Distributional Hypothesis [7]. The idea is that words that tend to occur in the same contexts tend to have similar meanings. Previous efforts differ in their representation of the context and in their formula for computing the similarity between two sets of contexts. Some algorithms use the words that occurred in a fixed window of a given word as its context, while others use the dependency relationships of a given word as its context [15]. Consider the words duty and responsibility. There are many contexts in which both of these words can fit:
• duty can be modified by adjectives such as additional, administrative, assigned, assumed, collective, congressional, constitutional, …, and so can responsibility;
• duty can be the object of verbs such as accept, articulate, assert, assign, assume, attend to, avoid, become, breach, …, and so can responsibility.
Based on these common contexts, one can statistically determine that duty and responsibility have similar meanings.

In the algorithms for finding word similarity, dependency links are treated as contexts of words. In contrast, our algorithm for finding inference rules treats the words that fill the slots of a path as a context for the path. We make the assumption that this is an extension to the Distributional Hypothesis:

Extended Distributional Hypothesis: If two paths tend to occur in similar contexts, the meanings of the paths tend to be similar.

For example, Table 2 lists a set of example pairs of words connected by the paths N:subj:V←find→V:obj:N→solution→N:to:N ("X finds a solution to Y") and N:subj:V←solve→V:obj:N ("X solves Y"). As can be seen from the table, there are many overlaps between the corresponding slot fillers of the two paths. By the Extended Distributional Hypothesis, we can then claim that the two paths have similar meanings.

Table 2. Sample slot fillers for the paths "X finds a solution to Y" and "X solves Y", extracted from a newspaper corpus.

4.2 Triples
To compute the path similarity using the Extended Distributional Hypothesis, we need to collect the frequency counts of all paths in a corpus and the slot fillers for the paths. For each instance of a path p that connects two words w1 and w2, we increase the frequency counts of the two triples (p, SlotX, w1) and (p, SlotY, w2). We call (SlotX, w1) and (SlotY, w2) features of path p. Intuitively, the more features two paths share, the more similar they are.

We use a triple database (a hash table) to accumulate the frequency counts of all features of all paths extracted from a parsed corpus. An example entry in the triple database for the path N:subj:V←pull→V:obj:N→body→N:from:N ("X pulls body from Y") is shown in Figure 2. The first column of numbers in Figure 2 represents the frequency counts of a word filling a slot of the path, and the second column of numbers is the mutual information between a slot and a slot filler. Mutual information measures the strength of the association between a slot and a filler; we explain it in detail in Section 4.3. The triple database records the fillers of SlotX and SlotY separately. Looking at the database, one would be unable to tell which SlotX filler occurred with which SlotY filler in the corpus.

Figure 2. An example entry in the triple database for the path "X pulls body from Y".
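The triple database can be sketched as a plain hash table keyed by (path, slot, word). The helper names below are illustrative, not from the paper:

```python
from collections import defaultdict

# Triple database: (path, slot, word) -> frequency count.
triple_db = defaultdict(int)

def count_instance(p, w1, w2):
    """For one instance of path p connecting w1 and w2, increment the
    counts of the triples (p, SlotX, w1) and (p, SlotY, w2)."""
    triple_db[(p, "SlotX", w1)] += 1
    triple_db[(p, "SlotY", w2)] += 1

p = "X finds a solution to Y"
count_instance(p, "John", "problem")
count_instance(p, "committee", "problem")

def freq(path=None, slot=None, word=None):
    """Frequency count |path, slot, word|, with None acting as the
    wildcard *; e.g. freq(p, "SlotX") is |p, SlotX, *|."""
    return sum(c for (pp, s, w), c in triple_db.items()
               if path in (None, pp) and slot in (None, s) and word in (None, w))

print(freq(p, "SlotX"))      # |p, SlotX, *| = 2
print(freq(word="problem"))  # |*, *, problem| = 2
```

A realistic implementation would also record, for each word, the set of slots it fills, which is what makes step (a) of the search in Section 4.4 efficient.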
4.3 Similarity between Two Paths
Once the triple database is created, the similarity between two paths can be computed in the same way that the similarity between two words is computed in [15]. Essentially, two paths have high similarity if there are a large number of common features. However, not every feature is equally important. For example, the word he is much more frequent than the word sheriff. Two paths sharing the feature (SlotX, he) is less indicative of their similarity than if they shared the feature (SlotX, sheriff). The similarity measure proposed in [15] takes this into account by computing the mutual information between a feature and a path.

We use the notation |p, SlotX, w| to denote the frequency count of the triple (p, SlotX, w), |p, SlotX, *| to denote Σ_w |p, SlotX, w|, and |*, *, *| to denote Σ_{p, s, w} |p, s, w|.

Following [15], the mutual information between a path slot and its filler can be computed by:

  mi(p, Slot, w) = log( (|p, Slot, w| × |*, Slot, *|) / (|p, Slot, *| × |*, Slot, w|) )    (1)

The similarity between a pair of slots, slot1 = (p1, s) and slot2 = (p2, s), is defined as:

  sim(slot1, slot2) = ( Σ_{w ∈ T(p1,s) ∩ T(p2,s)} [mi(p1, s, w) + mi(p2, s, w)] ) / ( Σ_{w ∈ T(p1,s)} mi(p1, s, w) + Σ_{w ∈ T(p2,s)} mi(p2, s, w) )

where p1 and p2 are paths, s is a slot, and T(pi, s) is the set of words that fill the s slot of path pi.

The similarity between a pair of paths p1 and p2 is defined as the geometric average of the similarities of their SlotX and SlotY slots:

  S(p1, p2) = sqrt( sim(SlotX1, SlotX2) × sim(SlotY1, SlotY2) )    (2)

where SlotXi and SlotYi are path i's SlotX and SlotY slots.

4.4 Finding the Most Similar Paths
The discovery of inference rules is made by finding the most similar paths of a given path. The challenge here is that there are a large number of paths in the triple database. The database used in our experiments contains over 200,000 distinct paths. Computing the similarity between every pair of paths is obviously impractical.

Given a path p, our algorithm for finding the most similar paths of p takes three steps:
(a) Retrieve all the paths that share at least one feature with p and call them candidate paths. This can be done efficiently by storing, for each word, the set of slots it fills.
(b) For each candidate path c, count the number of features shared by c and p. Filter out c if the number of its common features with p is less than a fixed percentage (we used 1%) of the total number of features for p and c. This step effectively uses a simpler similarity formula to filter out some of the paths, since computing mutual information is more costly than counting the number of features. This idea has previously been used in Canopy [17].
(c) Compute the similarity between p and the candidates that passed the filter using equation (2), and output the paths in descending order of their similarity to p.

Table 3 lists the top-20 most similar paths to "X solves Y" generated by DIRT. Most of the paths can be considered paraphrases of the original expression.

Table 3. The top-20 most similar paths to "X solves Y".
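Equations (1) and (2) can be exercised end-to-end on a toy triple database. The sketch below is our own illustration, not the DIRT implementation: it clips negative mutual information values to zero (a choice we add for stability on tiny counts) and omits the candidate prefilter of steps (a) and (b), which only matters at scale.

```python
import math
from collections import defaultdict

db = defaultdict(int)  # (path, slot, word) -> frequency

def add(p, x, y, n=1):
    db[(p, "SlotX", x)] += n
    db[(p, "SlotY", y)] += n

# Toy corpus: two near-paraphrase paths with overlapping slot fillers.
add("X finds a solution to Y", "committee", "problem", 2)
add("X finds a solution to Y", "sheriff", "dispute")
add("X finds a solution to Y", "government", "crisis")
add("X solves Y", "committee", "problem")
add("X solves Y", "sheriff", "dispute", 2)
add("X solves Y", "expert", "mystery")

def f(path=None, slot=None, word=None):   # |., ., .| with None as *
    return sum(c for (p, s, w), c in db.items()
               if path in (None, p) and slot in (None, s) and word in (None, w))

def mi(p, slot, w):                       # equation (1), clipped at 0
    num = f(p, slot, w) * f(None, slot, None)
    den = f(p, slot, None) * f(None, slot, w)
    return max(0.0, math.log(num / den)) if num and den else 0.0

def T(p, slot):                           # set of words filling a slot of p
    return {w for (pp, s, w) in db if pp == p and s == slot}

def slot_sim(p1, p2, slot):               # slot similarity
    shared = T(p1, slot) & T(p2, slot)
    num = sum(mi(p1, slot, w) + mi(p2, slot, w) for w in shared)
    den = (sum(mi(p1, slot, w) for w in T(p1, slot)) +
           sum(mi(p2, slot, w) for w in T(p2, slot)))
    return num / den if den else 0.0

def path_sim(p1, p2):                     # equation (2): geometric average
    return math.sqrt(slot_sim(p1, p2, "SlotX") * slot_sim(p1, p2, "SlotY"))

s = path_sim("X finds a solution to Y", "X solves Y")
print(round(s, 3))                        # strictly between 0 and 1 here
```

A path's similarity to itself is 1, the measure is symmetric, and paths with disjoint fillers score 0, which matches the intended reading of the formulas.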
5. Experimental Results
We performed an evaluation of our algorithm by comparing the inference rules it generates with a set of human-generated paraphrases of the first six questions in the TREC-8 Question-Answering Track, listed in Table 4. TREC (Text REtrieval Conference) is a U.S. government sponsored competition on information retrieval held annually since 1992. In the Question-Answering Track, the task for participating systems is to find answers to natural-language questions like those in Table 4.

Table 4. First six questions from TREC-8.
Q1: Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
Q2: What was the monetary value of the Nobel Peace Prize in 1989?
Q3: What does the Peugeot company manufacture?
Q4: How much did Mercury spend on advertising in 1993?
Q5: What is the name of the managing director of Apricot Computer?
Q6: Why did David Koresh ask the FBI for a word processor?

5.1 Results
We used Minipar to parse about 1GB of newspaper text (AP Newswire, San Jose Mercury, and Wall Street Journal). Using the methods discussed in Section 3, we extracted 7 million paths from the parse trees (231,000 unique) and stored them in a triple database.

The second column of Table 5 shows the paths that we identified from the TREC-8 questions. For some questions, more than one path was identified; for others, no path was found. We compare the output of our algorithm with a set of manually generated paraphrases of the TREC-8 questions made available at ISI². We also extracted paths from the manually generated paraphrases. For some paraphrases, an identical path is extracted. For example, "What things are manufactured by Peugeot?" and "What products are manufactured by Peugeot?" both map to the path "X is manufactured by Y". The number of paths for the manually generated paraphrases of the TREC-8 questions is shown in the third column of Table 5.

For each of the paths p in the second column of Table 5, we ran the DIRT algorithm to compute its top-40 most similar paths using the triple database. We then manually inspected the outputs and classified each extracted path as correct or incorrect. A path p' is judged correct if a sentence containing p' might contain an answer to the question from which p was extracted. Consider question Q3 in Table 4, where we have p = "X manufactures Y" and we find p' = "X's Y factory" as one of p's top-40 most similar paths. Since "Peugeot's car factory" might be found in some corpus, p' is judged correct. Note that not all sentences containing p' necessarily contain an answer to Q3 (e.g., a sentence containing "Peugeot's car factory" may simply give the location of a Peugeot factory in France). The fourth column in Table 5 shows the number of top-40 most similar paths classified as correct, and the fifth column gives the intersection between columns three and four. Finally, the last column in Table 5 gives the percentage of correctly classified paths.

Table 5. Evaluation of top-40 most similar paths.

² Available at http://www.isi.edu/~gerber/Variations2.txt

5.2 Observations
There is very little overlap between the automatically generated paths and the paraphrases, even though the percentage of correct paths in DIRT outputs can be quite high. This suggests that finding potentially useful inference rules is very difficult for humans as well as machines. Table 6 shows some of the correct paths among the top-40 extracted by our system for two of the TREC-8 questions. Many of the variations generated by DIRT that are correct paraphrases are missing from the manually generated variations. It is difficult for humans to recall a list of paraphrases. However, given the output of our system, humans can easily identify the correct inference rules. Hence, at the least, our system would greatly ease the manual construction of inference rules for an information retrieval system.

The performance of DIRT varies a great deal for different paths. Usually, the performance for paths with verb roots is much better than for paths with noun roots. A verb phrase typically has more than one modifier, whereas nouns usually take a smaller number of modifiers. When a word takes fewer than two modifiers, it cannot be the root of any path. As a result, paths with noun roots occur less often than paths with verb roots, which explains the lower performance on paths with noun roots.

In Table 5, DIRT found no correct inference rules for Q2. This is due to the fact that Q2 does not have any entries in the triple database.

6. Conclusion and Future Work
Better tools are necessary to tap into the vast amount of textual data that is growing at an astronomical pace. Knowledge about inference relationships in natural language expressions would be extremely useful for such tools. To the best of our knowledge, this is the first attempt to discover such knowledge automatically from a large corpus of text. We introduced the Extended Distributional Hypothesis, which states that paths in dependency trees have similar meanings if they tend to connect similar sets of words. Treating paths as binary relations, our algorithm is able to generate inference rules by searching for similar paths. Our experimental results show that the Extended Distributional Hypothesis can indeed be used to discover very useful inference rules, many of which, though easily recognizable, are difficult for humans to recall.
8. REFERENCES
and the variations discovered manually and by DIRT.
[1] Anick, P.G. and Tipirneni, S. 1999. The Paraphrase Search Assistant: Terminological Feedback for Iterative Information Seeking. In Proceedings of SIGIR-99. pp. 153-159. Berkeley, CA. [2] Arampatzis, A. T., Tsoris, T., Koster, C. H. A., and van der Weide, T. P. 1998. Phrase-based infase-bas retrieval. Information Processing & Management, 34(6):693-707. [3] Barzilay, R., McKeown, K., and Elhadad, M. 1999. Information Fusion in the X makes Y; X produce Y; X is in Y Context of Multi-Document Summarization. In Proceedings of ACL-99. College Park, Maryland. business; Y is manufactured by X; Y is [4] Dras, M. 1999. A meta-level Grammar: Redefining Synchronous TAGs for provided by X; Y is X's product; Y is Translation and Paraphrase. In Proceedings of ACL-99. pp. 80-97. College Park, Maryland. product from X; Y is X product; Y is product made by X; Y is example of X [5] Feldman, R., Fresko, M., Kinar, Y., Lindell, Y., Liphstat, O., Rajman, M., Schler, Y., and Zamir, O. 1998. Text Mining at the Term Level. In product; X is manufacturer of Y; Proceedings of the 2nd European Symposium on Principles of Data Mining and Knowledge Discovery. pp. 65-73. Nantes, France. find Y in X's product line; find Y in X [6] Hahn, U. and Schnattinger, K. 1998. Towards Text Knowledge Engineering. In Proceedings of AAAI-98 / IAAI-98. Menlo Park, CA. pp. 524-531. [7] Harris, Z. 1985. Distributional Structure. In: Katz, J. J. (ed.) The Philosophy of Linguistics. New York: Oxford University Press. pp. 26-47. [8] Hays, D. 1964. Dependency Theory: A Formalism and Some Observations. [9] Hearst, M. 1992. Automatic Acquisition of Hyponyms from Large Text Corpora. In Proceedings of ACL-92. Nantes, France. [10] Hindle, D. 1990. Noun Classification from Predicate-Argument Structures. In Proceedings of ACL-90. pp. 268-275. Pittsburgh. [11] Iordanskaja, L, Kittredge, R., and Polguere, A. 1991. Natural Language Generation in Artificial Intelligence and Computational Linguistics. Kluwer. Boston, MA. 
[12] Jacquemin, C., Klavans, J. L., and Tzoukermann, E. 1999. NLP for Term Variant Extraction: A Synergy of Morphology, Lexicon, and Syntax. Natural Language Information Retrieval, T. Strzalkowski, editor. pp. 25-74. Kluwer. Boston, MA. [13] Larsen, B. and Aone, C. 1999. Fast and effective text mining using linear-time document clustering. In Proceedings of KDD-99. pp. 16-22. San Diego, CA. [14] Lin, S. H., Shih, C. S., Chen, M. C., et al. 1998. Extracting Classification rules, many of which, though easily recognizable, are difficult for Knowledge of Internet Documents with Mining Term Associations: A Semantic Approach. In Proceedings of SIGIR-98. Melbourne, Australia. Many questions remain to be addressed. One is to recognize the Lin, D. 1998. Extracting Collocations from Text Corpora. Workshop on Computational Terminology. pp. 57-63. Montreal, Canada. polarity in inference relationships. High similarity values are often assigned to relations with opposite polarity. For example, “X [16] Lin, D. 1993. Principle-Based Parsing Without OverGeneration. In Proceedings of ACL-93. pp. 112-120. Columbus, OH. worsens Y” has one of the highest similarity to “X solves Y” according to equation (2). In some situations, this may be helpful [17] McCallum, A., Nigam, K., and Ungar, L. H. 2000. Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. In while for others it may cause confusion. Proceedings of KDD-2000. Boston, MA. Another is to extend paths with constraints on the inference rule’s [18] Meteer, M. M. and Shaked, V. 1988. Strategies for Effective Paraphrasing. In variables. For example, instead of generating a rule “X Proceedings of COLING-88. pp. 431-436 Budapest. manufactures Y X’s Y factory”, we may want to generate a rule [19] Pereira, F., Tishby, N., and Lee, L. 1993. Distributional Clustering of English Words. In Proceedings of ACL-93. pp. 183-190. Columbus, Ohio. 
with an additional clause: “X manufactures Y X’s Y factory, where Y is an artifact”. The “where” clause can be potentially [20] Rajman, M. and Besançon, R. 1997. Text Mining: Natural Language discovered by generalizing the intersection of the SlotY fillers of Techniques and Text Mining Applications. In Proceedings of the seventh IFIP 2.6 Working Conference on Database Semantics (DS-7). [21] Richardson, S. D. 1997. Determining Similarity and the Inferring Relations in a Lexical Knowledge-Base. Ph.D. Thesis. The City University of New York. 7. ACKNOWLEDGMENTS
[22] Robin, J. 1994. Revision-based Generation of Natural Language Summaries The authors wish to thank the reviewers for their helpful Providing Historical Background. Ph.D. Dissertation. Columbia University. comments. This research was partly supported by Natural [23] Sampson, G. 1995. English for the Computer - The SUSANNE Corpus and Sciences and Engineering Research Council of Canada grant Analytic Scheme. Clarendon Press. Oxford, England. [24] Sparck Jones, K. and Tait, J. I. 1984. Automatic Search Term Variant Generation. Journal of Documentation, 40(1):50-66.
