Antecedent Selection Techniques for High-Recall Coreference Resolution

Yannick Versley
SFB 441 / Seminar für Sprachwissenschaft
Abstract

We investigate methods to improve the recall in coreference resolution by also trying to resolve those definite descriptions where no earlier mention of the referent shares the same lexical head (coreferent bridging). The problem, which is notably harder than identifying the coreference of mentions which have the same lexical head, has been tackled with several rather different approaches, and we attempt to provide a meaningful classification along with a quantitative comparison. Based on the different merits of the methods, we discuss possibilities to improve them and show how they can be effectively combined.
Introduction
Coreference resolution, the task of grouping mentions in a text that refer to the same referent in the real world, has been shown to be beneficial for a number of higher-level tasks such as information extraction (McCarthy and Lehnert, 1995), question answering (Morton, 2000) and summarisation (Steinberger et al., 2005).

While the resolution of pronominal anaphora and the tracking of named entities is possible with good accuracy, the resolution of definite NPs (having a common noun as their head) is usually limited to the cases that Vieira and Poesio (2000) call direct coreference, where both coreferent mentions have the same head. The other cases, called coreferent bridging by Vieira and Poesio1, are notably harder because the number of potential candidates is much larger when it is no longer possible to rely on surface similarity.

1 Because bridging (in the sense of Clark, 1975, or Asher and Lascarides, 1998) is a much broader concept, the term 'coreferent bridging' is potentially confusing, as many cases are examples of perfectly well-behaved anaphoric definite noun phrases. Because we want to emphasise the important difference to the more easily resolved cases of same-head coreference, we will stick with 'coreferent bridging' as the only term that has been established for this phenomenon in the literature.

To overcome the limit of recall that is encountered when relying only on surface features, newer systems for coreference resolution (Daumé III and Marcu, 2005; Ponzetto and Strube, 2006; Versley, 2006; Ng, 2007, inter alia) use lexical semantic information as an indication of semantic compatibility in the absence of head equality. Most current systems integrate the identification of discourse-new definites (i.e., cases like "the sun" or "the man that Ben met yesterday", which are definite, but not anaphoric) with the antecedent selection proper, which implies that the gain obtained for new features is dependent on the features' usefulness both in finding semantically related mentions and in discourse-new classification.

One goal of this paper is to provide a better understanding of these information sources by comparing proposed (and partly new) approaches for resolving coreferent bridging, separately considering the task of antecedent selection (i.e., presupposing that discourse-new markables have been identified beforehand). Although state-of-the-art methods for modular discourse-new detection (Uryupina, 2003; Poesio et al., 2005) do not achieve near-perfect accuracy, the results we give for antecedent selection represent an upper bound on recall and precision for the full coreference task, and we think that this upper bound will be useful for
the design of features in both systems using a modular approach, such as (Poesio et al., 2005), where the decision on discourse-newness is taken beforehand, and those that integrate discourse-new classification with the actual resolution of coreferent bridging cases. In contrast to earlier investigations (Markert and Nissim, 2005; Garera and Yarowsky, 2006), we provide a more extensive overview of features and also discuss properties that influence their combination.

Several approaches have been proposed for the treatment of coreferent bridging. Poesio et al. (1997) use WordNet, looking for a synonymy or hypernymy relation (additionally, for coordinate sisters in WordNet). The system of Cardie and Wagstaff (1999) uses the node distance in WordNet (with an upper limit of 4) as one component in the distance measure that guides their clustering algorithm. Harabagiu et al. (2001) use paths through WordNet, using not only synonym and is-a relations, but also parts, morphological derivations, gloss texts and polysemy, which are weighted with a measure based on the relation types and the number of path elements. Other approaches use large corpora to get an indication for bridging relations: Poesio et al. (1998) use a general word association metric based on common terms occurring in a fixed-width window, Gasperin and Vieira (2004) use syntactic contexts of words in a large corpus to induce a semantic similarity measure (similar to the one introduced by Lin, 1998), and then use lists of the n nouns that are (globally) most similar to a given noun. Markert and Nissim (2005) mine the World Wide Web for shallow patterns like "China and other countries", indicating an is-a relationship. Finally, Garera and Yarowsky (2006) propose an association-based approach using nouns that occur in a 2-sentence window before a definite description that has no same-head antecedent.
Lexical vs. Referential Relations
One important property of these information sources is the kind of lexical relations that they detect. The lexical relations that we expect in coreferent bridging cases include:

• synonymy: The antecedent and the anaphor are synonyms of each other.
• hyperonymy: The anaphor is a strict generalisation of the antecedent.
• near-synonymy: The anaphor and antecedent are semantically related but not synonyms in the strict sense.
• instance: The antecedent is an instance of the class denoted by the anaphor.

Of course, not all cases of coreferent bridging realise such a lexical relation, as sometimes the anaphor takes up information introduced elsewhere than in the lexical noun phrase head (Peter was found dead in his flat . . . the deceased), or the coreference relation is forced by the discourse structure, without the lexical heads standing in one of these relations.

Typical cases of coreference include cases like 1,2a (hypernym) or 1,2b (compatible but non-hypernym). A different case is associative bridging, as between the NP "the door" and its antecedent "the house": it is inferred that the door must be part of the house mentioned earlier (since doors are typically part of a house), which is not compatible with coreferent bridging, but is also ranked highly by association measures.

While hypernym relations (as found by hypernym lookup in WordNet, or by patterns indicating such relations in unannotated texts) are usually a strong indicator of coreference, they can only cover some of the cases, while the near-synonymous cases are left undiscovered. Similarity and association measures can help for the cases of near-synonymy. However, while similarity measures (such as WordNet distance or Lin's similarity metric) only detect cases of semantic similarity, association measures (such as the ones used by Poesio et al., or by Garera and Yarowsky) also find cases of associative bridging like 1a,b; the result of this can be seen in table 1: while the similarity measures (Lin98, RFF) list substitutable terms (which behave like synonyms in many contexts), the association measures (Garera and Yarowsky's TheY measure, Padó and Lapata's association measure) also find non-compatible associations such as country–capital or drug–treatment, which is why they are commonly called relation-free. For the purpose of coreference resolution, however, we do not want to resolve "the door" to the antecedent "the house", as the two descriptions do not corefer, and it may be useful to filter out non-similar candidates.

Table 1: Similarity and association measures: most similar items for Land (country/state/land) and Medikament (medical drug); highest-ranked words, with very rare words removed.
Lin98: Lin's distributional similarity measure (Lin, 1998)
RFF: Geffet and Dagan's Relative Feature Focus measure (Geffet and Dagan, 2004)
TheY: association measure introduced by Garera and Yarowsky (2006)
TheY:G2: similar method using a log-likelihood-based statistic (see Dunning, 1993); this statistic has a preference for higher-frequency terms
PL03: semantic space association measure proposed by Padó and Lapata (2003)
∗: RU 486, an abortifacient drug
Information Sources
Different resources may be differently suited for the recognition of the various relations. Generally, it would be expected that using a wordnet is the best solution if we are interested in an isa-relation. On the other hand, wordnets usually have limited coverage both in terms of lexical items and in terms of relations encoded (as their construction is necessarily labor-intensive), and – as Markert and Nissim remark – they do not (and arguably should not) contain context-dependent relations that do not hold generally but only in some rather specific context, for example steel being anaphorically described as a commodity in a financial text. Context-dependent relations, Markert and Nissim argue, can be found using shallow patterns (for example, steel and other commodities), since a use in such a context would mean that the idiosyncratic conceptual relation holds in that context. Wordnets also usually have poor (or non-existent) coverage of named entities, which are especially relevant for instance relations; this kind of instance relation can often be found in large text corpora. The high-precision patterns that Markert and Nissim use only occur infrequently, but the approach using shallow patterns allows the search to be performed on the World Wide Web, which somewhat alleviates their sparseness.

While some near-synonyms can be found by looking at the distance in a wordnet, they may be far apart from each other because of ontological modeling decisions, or the lexical items may not be covered by the wordnet at all. Similarity and association measures can provide greater coverage for these near-synonym relations.

The measures both of Lin (1998) and of Padó and Lapata (2003, 2007) are distributional methods; for each word, they create a distribution of the contexts it occurs in, and similarity between two words is calculated as the similarity of these distributions.2 The difference between these two methods is the representation of the contexts. While Lin uses contexts that are expected to determine semantic preferences (like being in the direct object position of one verb), Padó and Lapata only use the co-occurring words, weighted by syntax-based distance. For example, in a sentence like Peter likes ice-cream, Lin's approach would yield ↑subj:like for Peter and ↑dobj:like for ice-cream, while Padó and Lapata's approach would yield the contexts like (with a weight of 1.0) and ice-cream (with a weight of 0.5) for Peter. As a consequence, Padó and Lapata's measure is more robust against data sparseness but also finds related non-similar terms (which are ultimately unwanted for coreference resolution). Padó and Lapata show their dependency-based measure to perform better in a word sense disambiguation task than the measure of Lund et al. (1995), on which Poesio et al. (1998) based their experiments and which is based on the surface distance between co-occurring words.

2 Both measures use a weighted Jaccard metric on mutual information vectors to calculate the similarity. See Weeds and Weir (2005) for an overview of other measures.
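As an illustration of the metric mentioned in footnote 2, the following minimal sketch computes mutual-information-weighted context vectors and their weighted Jaccard similarity. The context extraction and the exact PMI variant shown here are assumptions for illustration, not a description of the actual implementation.

    from math import log

    def mi_weights(context_counts, word_total, context_totals, grand_total):
        """Weight each context feature of a word by (positive) pointwise mutual information."""
        weights = {}
        for ctx, n in context_counts.items():
            pmi = log(n * grand_total / (word_total * context_totals[ctx]))
            if pmi > 0.0:               # keep only positively associated contexts
                weights[ctx] = pmi
        return weights

    def weighted_jaccard(vec_a, vec_b):
        """Weighted Jaccard: sum of feature-wise minima over sum of feature-wise maxima."""
        features = set(vec_a) | set(vec_b)
        num = sum(min(vec_a.get(f, 0.0), vec_b.get(f, 0.0)) for f in features)
        den = sum(max(vec_a.get(f, 0.0), vec_b.get(f, 0.0)) for f in features)
        return num / den if den > 0.0 else 0.0

Given MI-weighted vectors for two nouns, weighted_jaccard then yields the similarity value used for ranking.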
We also reimplemented the approach of Garera and Yarowsky (2006), who extract potential anaphor–antecedent pairs from unlabeled texts and rank these potentially related pairs by the mutual information statistic. As an example, from a text in which a person name (normalised to '(person)') and some ice-cream are mentioned in the sentences preceding the definite description the boy, we would extract the pairs ⟨boy, (person)⟩ and ⟨boy, ice-cream⟩, in the hope that the former pair occurs comparatively more often and gets a higher score.
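A minimal sketch of this extraction-and-ranking step follows; the windowing heuristic, the data format and the use of plain PMI are simplifying assumptions (the TheY:G2 variant listed in Table 1 would replace the PMI score with a log-likelihood statistic, cf. Dunning 1993).

    from collections import Counter
    from math import log

    def extract_pairs(docs):
        """docs: list of documents; each document is a list of sentences;
        each sentence is a list of (normalised) head nouns.
        For every noun with no same-head mention in the two preceding sentences,
        pair it with every noun in that window (a simplified stand-in for
        'definite description without same-head antecedent')."""
        pairs = Counter()
        for doc in docs:
            for i, sent in enumerate(doc):
                window = [n for s in doc[max(0, i - 2):i] for n in s]
                for head in sent:
                    if head not in window:
                        for cand in window:
                            pairs[(head, cand)] += 1
        return pairs

    def pmi_ranking(pairs):
        """Rank the extracted pairs by pointwise mutual information."""
        head_tot, cand_tot = Counter(), Counter()
        total = sum(pairs.values())
        for (h, c), n in pairs.items():
            head_tot[h] += n
            cand_tot[c] += n
        return {(h, c): log(n * total / (head_tot[h] * cand_tot[c]))
                for (h, c), n in pairs.items()}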
and Nissim’s approach, which is presented in (Ver-
Same-head resolution (including a check for
sley, 2007). For the methods based on similarity
modifier compatibility) allows to correctly resolve
and association measures, we implemented a simple
49.8% of all cases, with a precision of 86.5%.
ranking by the respective similarity or relatedness
The most simple approach for coreferent bridging,
value. Additionally, we included an approach due to
just resolving coreferent bridging cases to the near-
Gasperin and Vieira (2004), who tackle the problem
est possible antecedent (only checking for number
of similarity by using lists of most similar words to a
agreement), yields very poor precision (12% for the
certain word, based on a similarity measure closely
coreferent bridging cases), and as a result, the re-
related to Lin’s. They allow resolution if either (i)
call gain is very limited. If we use semantic classes
the candidate is among the words most similar to the
(based on both GermaNet and a simple classification
anaphor, (ii) the anaphor is among the words most
for named entities) to constrain the candidates and
similar to the candidate, (iii) the similarity lists of
then use the nearest number- and gender-compatible
anaphor and candidate share a common item. We
antecedent4, we get a much better precision (35%
tried out several variations in the length of the simi-
for coreferent bridging cases), and a much better
lar words list (Gasperin and Vieira used 15, we also
recall of 61.1%. Hyponymy lookup in GermaNet,
tried lists with 25, 50 and 100 items). The third pos-
without a limit on sentence distance, achieves a re-
sibility that Gasperin and Vieira mention (a common
call of 57.5% (with a precision of 67% for the re-
item in the similarity lists of both anaphor and an-
solved coreferent bridging cases), whereas using the
tecedent) resolves some correct cases, but leads to a
best single pattern (Y wie X, which corresponds to
much larger number of false positives, which is whywe did not include it in our evaluation.
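The same-head-first regime described above can be summarised in the following sketch; the Mention record and the compatibility check are assumed for illustration (the actual system also checks modifier compatibility).

    from dataclasses import dataclass
    from typing import Callable, List, Optional

    @dataclass
    class Mention:
        head: str       # lexical head (lemma)
        sent_idx: int   # index of the sentence the mention occurs in
        number: str     # 'sg' or 'pl'

    def resolve_definite(anaphor: Mention,
                         candidates: List[Mention],
                         bridging_resolver: Callable[[Mention, List[Mention]], Optional[Mention]]
                         ) -> Optional[Mention]:
        """Resolve to the nearest preceding same-head candidate if there is one;
        only otherwise hand the mention over to a coreferent-bridging resolver."""
        same_head = [c for c in candidates
                     if c.head == anaphor.head and c.number == anaphor.number]
        if same_head:
            return max(same_head, key=lambda c: c.sent_idx)   # nearest preceding candidate
        return bridging_resolver(anaphor, candidates)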
In our evaluation, we included hypernymy search and a simple edge-based distance measure based on GermaNet, as well as a baseline using semantic classes (automatically determined by a combination of simple named entity classification and GermaNet subsumption), and an evolved version of Markert and Nissim's approach, which is presented in (Versley, 2007). For the methods based on similarity and association measures, we implemented a simple ranking by the respective similarity or relatedness value. Additionally, we included an approach due to Gasperin and Vieira (2004), who tackle the problem of similarity by using lists of the words most similar to a certain word, based on a similarity measure closely related to Lin's. They allow resolution if either (i) the candidate is among the words most similar to the anaphor, (ii) the anaphor is among the words most similar to the candidate, or (iii) the similarity lists of anaphor and candidate share a common item. We tried out several variations in the length of the similar-words list (Gasperin and Vieira used 15, we also tried lists with 25, 50 and 100 items). The third possibility that Gasperin and Vieira mention (a common item in the similarity lists of both anaphor and antecedent) resolves some correct cases, but leads to a much larger number of false positives, which is why we did not include it in our evaluation.
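A sketch of the similarity-list-based resolution with conditions (i) and (ii) follows; condition (iii) is omitted, as in our evaluation, and the data structures are assumed for illustration.

    from typing import Dict, List, Optional

    def simlist_resolve(anaphor_head: str,
                        candidate_heads: List[str],        # nearest candidate first
                        sim_lists: Dict[str, List[str]],   # word -> its n most similar words
                        ) -> Optional[str]:
        """Resolve to the first candidate that is in the anaphor's similarity list (i)
        or whose similarity list contains the anaphor (ii)."""
        for cand in candidate_heads:
            if cand in sim_lists.get(anaphor_head, []):    # condition (i)
                return cand
            if anaphor_head in sim_lists.get(cand, []):    # condition (ii)
                return cand
        return None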
To induce the similarity and association measures presented earlier, we used texts from the German newspaper die tageszeitung, comprising about 11M sentences. For the extraction of anaphor–antecedent candidates, we used a chunked version of the corpus (Müller and Ule, 2002). The identification of grammatical relations was carried out on a subset of all sentences (those with length ≤ 30), with an unlexicalised PCFG parser and subsequent extraction of dependency relations (Versley, 2005). For the last approach, where dependency relations were needed but labeling accuracy was not as important, we used a deterministic shift-reduce parser that Foth and Menzel (2006) used as an input source in hybrid parsing.3

3 Arguably, it would have been more convenient to use a single parser for all three approaches, but differing tradeoffs between speed on the one hand and accuracy for the relevant information and/or fitness of representation on the other hand made the respective parser or chunker a compelling choice.

For all three approaches, we lemmatised the words by using a combination of SMOR (Schmid et al., 2004), a derivational finite-state morphology for German, and lexical information derived from the lexicon of a German dependency parser (Foth and Menzel, 2006). We mitigated the problem of vocabulary growth in the lexicon, due to German synthetic compounds, by using a frequency-sensitive unsupervised compound splitting technique, and (for semantic similarity) normalised common person and location names to '(person)' and '(location)', respectively.
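The compound splitting technique itself is not spelled out here; the following sketch only illustrates one common frequency-sensitive strategy (choose the binary split whose parts are frequent enough to beat the unsplit word), purely as an example of the kind of heuristic meant.

    from typing import Dict, Optional, Tuple

    def split_compound(word: str,
                       freq: Dict[str, float],
                       min_part: int = 4) -> Optional[Tuple[str, str]]:
        """Return the binary split whose parts have the highest geometric-mean corpus
        frequency, but only if that beats the frequency of the unsplit word.
        Case normalisation and linking elements ('Fugen-s') are ignored here."""
        best, best_score = None, freq.get(word, 0.0)
        for i in range(min_part, len(word) - min_part + 1):
            left, right = word[:i], word[i:]
            score = (freq.get(left, 0.0) * freq.get(right, 0.0)) ** 0.5
            if score > best_score:
                best, best_score = (left, right), score
        return best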
Table 2: Antecedent selection results. Prec.NSH: precision for coreferent bridging cases; (1): consider candidates in the 4 preceding sentences; (2): consider candidates in the 16 preceding sentences; (3): also try candidates such that the anaphor is …

Same-head resolution (including a check for modifier compatibility) allows us to correctly resolve 49.8% of all cases, with a precision of 86.5%. The most simple approach for coreferent bridging, just resolving coreferent bridging cases to the nearest possible antecedent (only checking for number agreement), yields very poor precision (12% for the coreferent bridging cases), and as a result, the recall gain is very limited. If we use semantic classes (based on both GermaNet and a simple classification for named entities) to constrain the candidates and then use the nearest number- and gender-compatible antecedent4, we get a much better precision (35% for coreferent bridging cases) and a much better recall of 61.1%. Hyponymy lookup in GermaNet, without a limit on sentence distance, achieves a recall of 57.5% (with a precision of 67% for the resolved coreferent bridging cases), whereas using the best single pattern (Y wie X, which corresponds to the English Ys such as X) on the Web, with a distance limit of 4 sentences5, only improves the recall to 54.3% (with a lower precision of 55% for coreferent bridging cases). This is in contrast to the results of Markert and Nissim, who found that Web pattern search performs better than wordnet lookup; see (Versley, 2007) for a discussion. Ranking all candidates that are within a distance of 4 hyper-/hyponymy edges in GermaNet by their edge distance, we get a relatively good recall of 60.5%, but the precision (for the coreferent bridging cases) is only at 39%, which is quite poor in comparison.

4 In German, grammatical gender is not as predictive as in English, as it does not reproduce ontological distinctions. For persons, grammatical and natural gender almost always coincide, and we check gender equality iff the anaphor is a person.

5 There is a degradation in precision for the pattern-based approach, but not for the GermaNet-based approach, which is why we do not use a distance limit for the GermaNet-based approach.
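The GermaNet-based edge-distance ranking just described can be sketched as follows; since GermaNet itself is not reproduced here, the sketch assumes plain hypernym/hyponym maps over lemmas.

    from collections import deque
    from typing import Dict, Optional, Set

    def edge_distance(a: str, b: str,
                      hypernyms: Dict[str, Set[str]],
                      hyponyms: Dict[str, Set[str]],
                      limit: int = 4) -> Optional[int]:
        """Shortest number of hyper-/hyponymy edges between two lemmas, up to `limit`."""
        if a == b:
            return 0
        seen, frontier = {a}, deque([(a, 0)])
        while frontier:
            node, d = frontier.popleft()
            if d == limit:
                continue
            for nxt in hypernyms.get(node, set()) | hyponyms.get(node, set()):
                if nxt == b:
                    return d + 1
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
        return None

    def rank_by_edge_distance(anaphor: str, candidates, hypernyms, hyponyms, limit=4):
        """Keep candidates within `limit` edges, closest in the hierarchy first."""
        scored = [(edge_distance(anaphor, c, hypernyms, hyponyms, limit), c) for c in candidates]
        return sorted((d, c) for d, c in scored if d is not None)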
The results for Garera and Yarowsky's TheY algorithm are quite disconcerting – recall and the precision on coreferent bridging cases are lower than for the respective baseline using (wordnet-based) semantic class information or Padó and Lapata's association measure. The technique based on Lin's similarity measure does outperform the baseline, but still suffers from bad precision, as does Padó and Lapata's association measure. In other words, the similarity and association measures seem to be too noisy to be used directly for ranking antecedents. The approach of Gasperin and Vieira performs comparably to the approach using Web-based pattern search (although the precision is poorer than for the best-performing pattern for German, "X wie Y" – X such as Y, it is comparable to that of other patterns).

Improving Distributional Similarity?

While it would be naïve to think that methods purely based on statistical similarity measures could reach the accuracy that can be achieved with a hand-constructed lexicalised ontology, it would of course be nice if we could improve the quality of the semantic similarity measure used in the ranking and in the most-similar-word lists.

Geffet and Dagan (2004) propose an approach to improve the quality of the feature vectors used for computing distributional similarity: instead of weighting features using the mutual information value between the word and the feature, they propose to use a measure they call Relative Feature Focus (RFF): the sum of the similarities to the (globally) most similar words that share this feature.
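A sketch of the Relative Feature Focus weighting as described above; the neighbour list and the feature sets are assumed to be precomputed, and normalisation details follow Geffet and Dagan (2004) and are not reproduced here.

    from typing import Dict, List, Set, Tuple

    def rff_weights(word: str,
                    neighbours: List[Tuple[str, float]],   # (similar word, similarity), top n for `word`
                    features_of: Dict[str, Set[str]],      # word -> set of its context features
                    ) -> Dict[str, float]:
        """Relative Feature Focus: weight each feature of `word` by the summed
        similarity of the most similar words that also carry this feature."""
        weights = {}
        for feature in features_of.get(word, set()):
            weights[feature] = sum(sim for (other, sim) in neighbours
                                   if feature in features_of.get(other, set()))
        return weights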
By replacing mutual information values with RFF values in Lin's association measure, Geffet and Dagan were able to significantly improve the proportion of substitutable words in the lists of most similar words. In our experiments, however, using the RFF-based similarity measure did not improve the similarity-list-based resolution or the simple ranking; on the contrary, both recall and precision are lower than for the weighted Jaccard measure that we used.6

6 Simple ranking with RFF gives a precision of 33% for coreferent bridging cases, against 39% for Lin's original measure; for an approach based on similarity lists, we get 39% …

We attribute this to two factors: Firstly, Geffet and Dagan's evaluation emphasises precision in terms of types, whereas the use in resolving coreferent bridging does not punish unrelated rare words being ranked high – since these are rare, the likelihood that they occur together and change a resolution decision is quite low, whereas rare related words that are ranked high can allow a correct resolution. Secondly, Geffet and Dagan focus on high-frequency words, which makes sense in the context of ontology learning, but the applicability to tasks like coreference resolution (directly or in the approach of Gasperin and Vieira) also depends on a sensible treatment of lower-frequency words.

Using the framework of Weeds et al. (2004), we found that the bias of lower-frequency words towards preferring high-frequency neighbours was higher for RFF (0.58, against 0.35 for Lin's measure). Weeds and Weir (2005) discuss the influence of bias towards high- or low-frequency items for different tasks (correlation with WordNet-derived neighbour sets and pseudoword disambiguation), and it would not be surprising if the different high-frequency bias also explained the differences we observe here.

Combining Information Sources

The information sources that we presented earlier and the corpus-based methods based on similarity or association measures draw from different kinds of evidence and thus should be rather complementary. To put it another way, it should be possible to get the best from all methods, achieving the recall of the high-recall methods (like using semantic class information, or similarity and association measures), with a precision closer to the most precise method using GermaNet. In the case of web-based patterns, Versley (2007) combines several pattern searches on the web and uses the combined positive and negative evidence to compute a composite score – with a suitably chosen cutoff, this outperforms all single patterns both in terms of precision and recall. First resolving via hyponymy in GermaNet and then using the pattern-combination approach outperforms the semantic class-based baseline in terms of recall and is reasonably close to the GermaNet-based approach in terms of precision (i.e., much better than the approach based only on the semantic class).

As a first step to improve the precision of the corpus-based approaches, we added filtering based on automatically assigned semantic classes (persons, organisations, events, other countable objects, and everything else). Very surprisingly, Garera and Yarowsky's TheY approach, despite starting out at a lower precision (31%, against 39% for Lin and 42% for PL03), profits much more from the semantic filter and reaches the best precision (47%), whereas Lin's semantic similarity measure profits the least.
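The semantic-class filter used here (and in the class-based baseline above) can be sketched as follows, assuming mention objects that carry number, gender and sentence-index fields and a classifier that maps mentions onto the classes listed above.

    from typing import Callable, List, Optional

    # approximate class inventory from the text
    SEMANTIC_CLASSES = ("person", "organisation", "event", "countable-object", "other")

    def class_compatible(anaphor, candidate, sem_class: Callable[[object], str]) -> bool:
        """Number is always checked; gender only for persons (cf. footnote 4);
        the automatically assigned semantic classes have to match."""
        if anaphor.number != candidate.number:
            return False
        if sem_class(anaphor) == "person" and anaphor.gender != candidate.gender:
            return False
        return sem_class(anaphor) == sem_class(candidate)

    def nearest_compatible(anaphor, candidates: List, sem_class) -> Optional[object]:
        """Class-based baseline: resolve to the nearest preceding compatible candidate."""
        for cand in sorted(candidates, key=lambda c: c.sent_idx, reverse=True):
            if class_compatible(anaphor, cand, sem_class):
                return cand
        return None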
Since limiting the distance to the 4 previous sentences had quite a devastating effect for the approach based on Lin's similarity measure (which achieves 39% precision when all candidates are available and 30% precision if it chooses the most semantically similar out of the candidates that are in the last 4 sentences), we also wanted to try applying the distance-based filtering after finding semantically similar candidates. The approach we tried was as follows: we rank all candidates using the similarity function, and keep only the 3 top-rated candidates. From these 3 top-rated candidates, we keep only those within the last 4 sentences. Without filtering by semantic class, this improves the precision to 41% (from 30% for limiting the distance beforehand, or 39% without limiting the distance). Adding filtering based on semantic classes to this (only keeping those of the 3 top-rated candidates which have a compatible semantic class and are within the last 4 sentences), we get a much better precision of 53%, with a recall that can still be seen as good (57.8%). In comparison with the similarity-list-based approach, we get a much better precision than we would get for methods with comparable recall (the version with the 100 most similar items has 44% precision; the corresponding figure for the version with 50 most similar items and matching in both directions is …).

Table 3: Results for the combined approaches. (2): consider candidates in the 16 preceding sentences; (3): also try candidates such that the anaphor is …
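A sketch of this distance-bounded resolution (LinBnd) follows; which of the surviving candidates is returned – here the most similar one – is an assumption, as the text does not spell out that detail.

    from typing import Callable, List, Optional

    def distance_bounded_resolve(anaphor,
                                 candidates: List,
                                 similarity: Callable,            # similarity(anaphor, cand) -> float
                                 same_class: Optional[Callable] = None,
                                 top_n: int = 3,
                                 max_sent_dist: int = 4):
        """Rank all candidates by semantic similarity, keep the top_n, then keep only
        those within the last max_sent_dist sentences (and, optionally, with a
        compatible semantic class); return the best surviving candidate, if any."""
        ranked = sorted(candidates, key=lambda c: similarity(anaphor, c), reverse=True)[:top_n]
        kept = [c for c in ranked
                if anaphor.sent_idx - c.sent_idx <= max_sent_dist
                and (same_class is None or same_class(anaphor, c))]
        return kept[0] if kept else None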
Applying this distance-bounding method to Garera and Yarowsky's association measure still leads to an improvement over the case with only semantic and gender checking, but the improvement (from 47% to 50%) is not as large as with the semantic similarity measure or Padó and Lapata's association measure.

For the final system, we back off from the most precise information sources to the less precise ones. Starting with the combination of GermaNet and pattern-based search on the World Wide Web, we begin by adding the distance-bounded semantic-similarity-based resolver (LinBnd) and resolution based on the list of 25 most similar words (following the approach of Gasperin and Vieira 2004). This results in visibly improved recall (from 62% to 68%), while the precision for coreferent bridging cases does not suffer much. Adding resolution based on Lin's semantic similarity measure and Garera and Yarowsky's TheY value leads to a further improvement in recall to 69.7%, but also leads to a larger loss in precision.
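The backoff combination can be sketched as follows; the resolver names in the comment are hypothetical placeholders for the components discussed above.

    from typing import Callable, List, Optional

    def backoff_resolve(anaphor, candidates,
                        resolvers: List[Callable]) -> Optional[object]:
        """Try resolvers from the most precise to the least precise information
        source and return the first antecedent that any of them proposes."""
        for resolve in resolvers:
            antecedent = resolve(anaphor, candidates)
            if antecedent is not None:
                return antecedent
        return None

    # e.g. resolvers = [germanet_hyponymy, web_patterns, lin_bounded, simlist_25,
    #                   lin_ranking, they_value]   # hypothetical names, most precise first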
In independent research, Goecke et al. (2006) combined the original LSA-based method of Lund et al. (1995) with wordnet relations and pattern search on a fixed-size corpus.7 However, they evaluate only on a small subset of discourse-old definite descriptions (those where a wordnet-compatible semantic relation was identified and which were reasonably close to their antecedent), and they did not distinguish coreferent from associative bridging antecedents. Although the different evaluation method disallows a meaningful comparison, we think that the more evolved information sources we use (Padó and Lapata's association measure instead of Lund et al.'s, combined pattern search on the World Wide Web instead of search for patterns in a fixed-size corpus), as well as the additional information based on semantic similarity, lead to superior results.

7 Thanks to Tonio Wandmacher for pointing this out to me.
Conclusion
In this paper, we compared several approaches to resolving cases of coreferent bridging in open-domain text. While none of the corpus-based information sources can match the precision of the hypernymy information encoded in GermaNet, or that of using a combination of high-precision patterns with the World Wide Web as a very large corpus, it is possible to achieve a considerable improvement in terms of recall without sacrificing too much precision by combining these less precise sources with the more precise ones.

Very interestingly, the distributional methods based on intra-sentence relations (Lin, 1998; Padó and Lapata, 2003) outperformed Garera and Yarowsky's (2006) association measure when used for ranking, which may be due to sparse data problems or simply too much noise for the latter. For the association measures, the fact that they are relation-free also means that they can profit from added semantic class filtering.

The novel distance-bounded semantic similarity method (where we use the most similar words in the previous discourse together with a semantic class-based filter and a distance limit) comes near the precision of using surface patterns, and offers better accuracy than Gasperin and Vieira's method of using similarity lists.

By combining existing higher-precision information sources such as hypernym search in GermaNet and the Web-based approach presented in (Versley, 2007) with similarity- and association-based resolution, it is possible to get a large improvement in recall even compared to the combined GermaNet+Web approach or an approach combining GermaNet with a semantically filtered version of Garera and Yarowsky's TheY approach.
Ongoing and Future Work
Both the distributional similarity statistics and the association measure can profit from more training data, something which is bounded by the availability of similar text (Gasperin et al., 2004, point out that using texts from a different genre strongly limits the usefulness of the learned semantic similarity measure), and by processing costs (which are more serious for distributional similarity measures than for non-grammar-related association measures, as the former require syntactically analysed text).

Based on existing results for named entity coreference, a hypothetical coreference resolver combining our information sources with a perfect detector for discourse-new mentions would be able to achieve a precision of 88% and a recall of 83% considering all full noun phrases (i.e., including names, but not pronouns). This is much higher than state-of-the-art results for the same data set (Versley, 2006, gets 62% precision and 70% recall), but such accuracy may be very difficult to achieve in practice, as perfect (or even near-perfect) discourse-new detection does not seem to be achievable in the near future. Preliminary experiments show that the integration of pattern-based information leads to an increase in recall of 0.6% for the whole system (or 46% more coreferent bridging cases), but the integration of distributional similarity (loosely based on the approach by Gasperin and Vieira) does not lead to a noticeable improvement over GermaNet alone; in isolation, the distributional similarity information did improve the recall, albeit less than the pattern-based information.

The fact that only a small fraction of the achievable recall gain is currently attained seems to suggest that better identification of discourse-old mentions could potentially lead to larger improvements. It also seems, firstly, that it makes more sense to combine information sources that cover different relations (e.g. GermaNet for hypernymy and synonymy and the pattern-based approach for instance relations) than ones that yield independent evidence for the same relation(s), as GermaNet and the Gasperin and Vieira approach do for (near-)synonymy; and secondly, that good precision is especially important in the context of integrating antecedent selection and discourse-new identification, which means that the finer view we get from antecedent selection experiments (compared to the direct use in a coreference system) is valuable.
Acknowledgements

Thanks to Sabine Schulte im Walde, Piklu Gupta and Sandra Kübler for useful criticism of an earlier version, and to Simone Ponzetto and Michael Strube for feedback. The research reported in this paper was supported by the Deutsche Forschungsgemeinschaft (DFG) as part of Collaborative Research Centre (Sonderforschungsbereich) 441 "Linguistic Data Structures".

References
Asher, N. and Lascarides, A. (1998). Bridging. Journal of Semantics, 15(1):83–113.

Cardie, C. and Wagstaff, K. (1999). Noun phrase coreference as clustering. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC 1999), pages 82–89.

Clark, H. H. (1975). Bridging. In Schank, R. C. and Nash-Webber, B. L., editors, Proceedings of the 1975 Workshop on Theoretical Issues in Natural Language Processing, pages 169–174, Cambridge, MA. Association for Computing Machinery.

Daumé III, H. and Marcu, D. (2005). A large-scale exploration of effective global features for a joint entity detection and tracking model. In HLT/EMNLP'05, pages 97–104.

Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61–74.

Foth, K. and Menzel, W. (2006). Hybrid parsing: Using probabilistic models as predictors for a symbolic parser. In ACL 2006.

Garera, N. and Yarowsky, D. (2006). Resolving and generating definite anaphora by modeling hypernymy using unlabeled corpora. In CoNLL 2006.

Gasperin, C., Salmon-Alt, S., and Vieira, R. (2004). How useful are similarity word lists for indirect anaphora resolution? In Proc. DAARC 2004.

Gasperin, C. and Vieira, R. (2004). Using word similarity lists for resolving indirect anaphora. In ACL'04 Workshop on Reference Resolution and its Applications.

Geffet, M. and Dagan, I. (2004). Feature vector quality and distributional similarity. In CoLing 2004.

Goecke, D., Stührenberg, M., and Wandmacher, T. (2006). Extraction and representation of semantic relations for resolving definite descriptions. In Workshop on Ontologies in Text Technology (OTT 2006).

Harabagiu, S., Bunescu, R., and Maiorano, S. (2001). Text and knowledge mining for coreference resolution. In Proceedings of the 2nd Meeting of the North American Chapter of the Association of Computational Linguistics (NAACL-2001).

Hinrichs, E., Kübler, S., and Naumann, K. (2005). A unified representation for morphological, syntactic, semantic and referential annotations. In ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky.

Kunze, C. and Lemnitzer, L. (2002). GermaNet – representation, visualization, application. In Proceedings of LREC 2002.

Lin, D. (1998). Automatic retrieval and clustering of similar words. In Proc. CoLing/ACL 1998.

Lund, K., Atchley, R. A., and Burgess, C. (1995). Semantic and associative priming in high-dimensional semantic space. In Proc. of the 17th Annual Conference of the Cognitive Science Society.

Markert, K. and Nissim, M. (2005). Comparing knowledge sources for nominal anaphora resolution. Computational Linguistics, 31(3):367–402.

McCarthy, J. F. and Lehnert, W. G. (1995). Using decision trees for coreference resolution. In IJCAI 1995.

Morton, T. S. (2000). Coreference for NLP applications. In ACL 2000.

Müller, F. H. and Ule, T. (2002). Annotating topological fields and chunks – and revising POS tags at the same time. In Proceedings of the Nineteenth International Conference on Computational Linguistics (COLING 2002).

Ng, V. (2007). Shallow semantics for coreference resolution. In IJCAI 2007, pages 1689–1694.

Padó, S. and Lapata, M. (2003). Constructing semantic space models from parsed corpora. In Proceedings of ACL 2003.

Padó, S. and Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, to appear.

Poesio, M., Alexandrov-Kabadjov, M., Vieira, R., Goulart, R., and Uryupina, O. (2005). Does discourse-new detection help definite description resolution? In Proceedings of the Sixth International Workshop on Computational Semantics.

Poesio, M., Schulte im Walde, S., and Brew, C. (1998). Lexical clustering and definite description interpretation. In AAAI Spring Symposium on Learning for Discourse.

Poesio, M., Vieira, R., and Teufel, S. (1997). Resolving bridging descriptions in unrestricted text. In ACL-97 Workshop on Operational Factors in Practical, Robust, Anaphora Resolution for Unrestricted Texts.

Ponzetto, S. P. and Strube, M. (2006). Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution. In HLT-NAACL 2006.

Schmid, H., Fitschen, A., and Heid, U. (2004). SMOR: A German computational morphology covering derivation, composition and inflection. In Proceedings of LREC 2004.

Steinberger, J., Kabadjov, M., Poesio, M., and Sanchez-Graillet, O. (2005). Improving LSA-based summarization with anaphora resolution. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 1–8.

Uryupina, O. (2003). High-precision identification of discourse new and unique noun phrases. In Proceedings of the ACL Student Workshop.

Versley, Y. (2005). Parser evaluation across text types. In Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005).

Versley, Y. (2006). A constraint-based approach to noun phrase coreference resolution in German newspaper text. In Konferenz zur Verarbeitung Natürlicher Sprache (KONVENS 2006).

Versley, Y. (2007). Using the Web to resolve coreferent bridging in German newspaper text. In Proceedings of GLDV-Frühjahrstagung 2007.

Vieira, R. and Poesio, M. (2000). An empirically based system for processing definite descriptions. Computational Linguistics, 26(4):539–593.

Weeds, J. and Weir, D. (2005). Co-occurrence retrieval: A flexible framework for lexical distributional similarity. Computational Linguistics, 31(4):439–475.

Weeds, J., Weir, D., and McCarthy, D. (2004). Characterizing measures of lexical distributional similarity. In CoLing 2004.