Link Grammar Parser
by Davy Temperley, John Lafferty and Daniel Sleator (this variant maintained by Dom Lachowicz <domlachowicz@gmail.com> and Linas Vepstas <linasvepstas@gmail.com>)
News
February, 2011: link-grammar 4.7.4 released! See below for a description of recent changes.
What is the Link Grammar?
The Link Grammar Parser is a syntactic parser of English (and other languages as well), based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labelled links connecting pairs of words. The parser also produces a "constituent" (Penn tree-bank style phrase tree) representation of a sentence (showing noun phrases, verb phrases, etc.). The RelEx extension provides dependency-parse output.
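To make the idea of "a set of labelled links connecting pairs of words" concrete, here is a minimal sketch in Python. It is a hand-written illustration, not output from the parser and not part of its API: the `Link` type, the helper functions and the example sentence are all assumptions made for this example. The labels D, S and O follow the determiner, subject and object link names used in the dictionary documentation, and the crossing check reflects the rule from the original papers that links may not cross.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    """One labelled link between two word positions (hypothetical type)."""
    left: int    # index of the left word in the sentence
    right: int   # index of the right word in the sentence
    label: str   # link type, e.g. "S" for subject-verb


# Hand-written example, NOT actual parser output.
words = ["the", "cat", "chased", "a", "mouse"]
links = [
    Link(0, 1, "D"),  # determiner "the" to noun "cat"
    Link(1, 2, "S"),  # subject "cat" to verb "chased"
    Link(2, 4, "O"),  # verb "chased" to object "mouse"
    Link(3, 4, "D"),  # determiner "a" to noun "mouse"
]


def crosses(a: Link, b: Link) -> bool:
    """True if the two links would cross when drawn above the sentence."""
    return (a.left < b.left < a.right < b.right
            or b.left < a.left < b.right < a.right)


def is_planar(linkage: List[Link]) -> bool:
    """True if no pair of links in the linkage crosses."""
    return not any(
        crosses(a, b)
        for i, a in enumerate(linkage)
        for b in linkage[i + 1:]
    )


def describe(sentence: List[str], linkage: List[Link]) -> List[str]:
    """Render each link as 'left-word --LABEL-- right-word'."""
    return [f"{sentence[l.left]} --{l.label}-- {sentence[l.right]}"
            for l in linkage]


for line in describe(words, links):
    print(line)
```

The real parser returns richer structures (for example, link labels with subscripts), so this should be read only as a picture of the underlying data model.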
Did the AbiWord team write Link Grammar?
In large part, no. The project is the brainchild of Davy Temperley, John Lafferty and Daniel Sleator, all university professors. It is the product of a decade of academic research into grammar, and is founded on a theory backed by numerous publications. Its canonical homepage is hosted by Carnegie Mellon University.
So, then what is it doing @ AbiSource.com?
The AbiWord team had a concrete need: to integrate a grammar checking feature into AbiWord. The best choice, they felt, was to build upon Temperley et al.'s successful Link Grammar project.
However, in order for the link-grammar project to be useful to them and to the greater Free Software world, the AbiWord community felt that a variety of changes to the project would be necessary. While they did have success (a few years ago) convincing the authors to release Link Grammar under a GPL-compatible license, there was no practical way to continue project development and maintenance at the CMU website. So the AbiWord community took it under its wing and has nurtured the project since.
Ongoing development by OpenCog
Ongoing development of link-grammar is being primarily guided by the Open Cognition project, where the parser plays an important role in the OpenCog natural language processing subsystem. Research and implementation is ongoing; current work includes investigations into statistically guided parse ranking, grammatically induced word-sense disambiguation using statistical results from the Mihalcea all-words WSD algorithm, and work on automatically learning new parse rules based on corpus statistics.
A sibling project, RelEx, uses constraint-grammar-like techniques to extract dependency relations and assorted additional linguistic information, including FrameNet-style framing and reference (anaphora) resolution. The dependency output is similar to that of the Stanford parser. Its performance is comparable to the Stanford PCFG parsing model, and it is more than three times faster than the Stanford "lexicalized" (factored) model.
The NLGen and NLGen2 projects provide natural language generation modules, based on, and compatible with link-grammar and RelEx. They implement the SegSim ideas for NL generation. See the following NLGen demos: Demo of Virtual Dog Learning to Play Fetch via Imitation and Reinforcement, AI Virtual Dog's Emotions Fluctuate Based on Its Experiences, Demo of Embodied Anaphora Resolution and AI Virtual Dog Answers Simple Questions about Itself and Its Environment.
Notable changes from the upstream Link Grammar package include:
- Actively maintained! New releases typically every few months.
- Numerous bug fixes and performance improvements; expanded dictionaries with thousands of new words; improved parse coverage for a wide variety of constructions.
- Merger of BioLG project changes, for improved parsing of biomedical text.
- New bindings, including Ruby, Python, Perl, Java and OCaml.
- Support for UTF8 Unicode; Arabic and Persian dictionaries; prototype German dictionary.
- Multi-threading support; a standard build system; pkg-config integration; dynamic/shared library support; fixes for non-Linux platforms, including Windows, MacOSX and FreeBSD.
Downloading Link Grammar
The system can be downloaded either as a tarball, or via SVN. The current stable version is Link Grammar 4.7.4 (February, 2011). Older versions are available here.
Unstable, development versions are available through AbiWord's SVN repository. Anonymous read-only access is available by issuing the command:
svn co http://svn.abisource.com/link-grammar/trunk link-grammar
General instructions for AbiWord's anonymous SVN can be found here.
The Link Grammar source can be browsed online here.
Documentation
One of the best ways to obtain a solid, easy-to-understand overview of the parser is to review the original papers describing it, here, here, here and here. There is an extensive set of pages documenting the dictionary; specifically, the names of links and their meanings, as well as how to write new rules. There is also a short primer for creating dictionaries for new languages. The documentation for the programming API is here. Documentation for additions made in the 4.0 release is on the improvements page. A fairly comprehensive bibliography of papers written before 2004 is here.
Mailing Lists
The current list for Link Grammar discussion is at the link-grammar google group.
Bug Tracker
Bug reports, patches, RFEs, etc. are gladly welcomed.
- Bug reports should be filed at the Google code bug tracker.
- General issue discussion, requests for enhancement, and related matters should be discussed on the Link Grammar mailing list
Disclaimer
Link grammar is a natural language parser, not an artificial intelligence. This means that there are many sentences that it cannot parse correctly, and many others for which it generates multiple parses. There are also entire classes of speech that it cannot parse, such as Valley-girl speak. Link grammar does best on "newspaper English": medium-length sentences written with good grammar, proper punctuation, and proper capitalization. It don't do 733t speek, etc. In particular, it has problems with the following "registers" and types of writing:
- Phrases (that are not a part of a complete sentence)
- Bulleted lists, such as this.
- Quotations within sentences (and parenthetical remarks). These can be handled by an appropriate front-end that separates the quotations from the rest of the text.
- Slang words and speech, like 733t warez d00dz, although it can certainly guess from context if the slang is sufficiently grammatical.
- Long run-on sentences. These can generate thousands of alternative parses in a combinatorial explosion.
- Certain "registers", such as newspaper headlines; for example, "Thieves rob bank."
In addition, it has a variety of "bugs": it currently has trouble with "if...then..." constructs, compound queries ("who did it, and why?"), lists, "...not only...but also..." constructs, certain types of idiomatic phrases, certain types of "institutional utterances", and so on. The goal of the project is to eventually fix all of these cases; progress is ongoing.
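The list above suggests a front-end that separates quotations and parenthetical remarks from the rest of the text before parsing. A minimal sketch of such a front-end in Python follows. It is an assumption about how one might do this, not code from the link-grammar distribution: the function name `split_asides` is hypothetical, only straight double quotes and non-nested parentheses are handled, and the removed spans would still need to be parsed separately.

```python
import re
from typing import List, Tuple

# Matches either a double-quoted span or a parenthesized span.
# Nested parentheses and typographic quotes are NOT handled.
_ASIDE = re.compile(r'"([^"]*)"|\(([^()]*)\)')


def split_asides(text: str) -> Tuple[str, List[str]]:
    """Return (main_text, asides).

    main_text is the input with quoted and parenthetical spans removed;
    asides lists the removed spans in order of appearance.
    """
    asides: List[str] = []

    def _pull(match: "re.Match[str]") -> str:
        quoted, parenthetical = match.group(1), match.group(2)
        asides.append(quoted if quoted is not None else parenthetical)
        return " "

    main = _ASIDE.sub(_pull, text)
    main = re.sub(r"\s+", " ", main).strip()       # collapse leftover spaces
    main = re.sub(r"\s+([,.;:!?])", r"\1", main)   # no space before punctuation
    return main, asides


main, asides = split_asides(
    'The parser (a C library) handles "newspaper English" well.'
)
print(main)
print(asides)
```

Removing a quotation can leave the remaining sentence incomplete (for example, when the quotation was the object of "said"), so a practical front-end might substitute a placeholder word rather than deleting the span outright.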
Adjunct Projects
- RelEx Semantic Relation Extractor
- RelEx is an English-language semantic relationship extractor, built on the Carnegie-Mellon link parser. It can identify subject, object, indirect object and many other relationships between words in a sentence. It will also provide part-of-speech tagging, noun-number tagging, verb tense tagging, gender tagging, and so on. RelEx includes a basic implementation of the Hobbs anaphora (pronoun) resolution algorithm. Optionally, it can use GATE for entity detection.
- AutoIt bindings
- AutoIt is a scripting language for Windows. This zipfile provides AutoIt bindings to Link Grammar, thanks to JRowe. It also includes binary Windows DLLs for a recent link-grammar version. (The source code for the AutoIt bindings is included, by default, in the base link-grammar distribution.)
- Delphi bindings (New!)
- Delphi (Pascal) is a popular development environment for Windows. The LaKraven Page provides the source for Delphi bindings, as well as pre-compiled DLLs for Windows.
- Java bindings
- Java bindings are included, by default, in the base link-grammar distribution.
- Objective Caml bindings
- OCaml interface to Link Grammar
- Perl bindings
- The Perl bindings, created by Danny Brian, have been updated. See the Lingua-LinkParser page on CPAN. There is also a tutorial written against an older version of the bindings; some details may be different.
- Python bindings
- New Python bindings are in development. Development snapshots are available on Launchpad. Install instructions here.
- Ruby bindings
- There are two different packages providing Ruby bindings: Ruby Link Grammar, which is up-to-date and currently maintained, and Link Grammar 4 Ruby, which is wildly out-of-date (it's for version 4.2.2) and is unmaintained. You only need one!
- Persian dictionaries
- Persian dictionaries, by Jon Dehdari. These require the Persian stemming engine, as significant morphology analysis needs to be performed to parse Persian.
- Pre-parsed Wikipedia
- Parsed versions of various texts, including all articles from a May 2008 dump of Wikipedia, as well as a partial parse of an October 2010 dump, are available at http://gnucash.org/linas/nlp/data/
- Arabic dictionaries
- Arabic dictionaries, by Jon Dehdari. [download] These require the Aramorph stemming package, which is included.
- French dictionary, Luthor
- The Luthor project aims to develop a set of scripts to automatically construct Link Grammar linkage dictionaries by mining Wiktionary data. Current efforts are focusing on French.
- Russian parser
- Located at http://slashzone.ru/parser/. By Sergey Protasov. Includes link documentation and subscript (morphology) documentation. Russian morpheme dictionaries can be had at http://aot.ru.
- English dictionary extensions
- LinkGrammar-WN is a lexicon expansion for the English language Link Grammar Parser. This project adds 14K new words to the dictionaries. The extended lexicon is provided under the GPL license, and thus cannot be merged back into the current project.
- Medical Text Analysis
- The MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) Clinical Decision Making Group has done work to extend the Link Grammar dictionaries by adding many new words. All but the six largest of these dictionaries have been merged into link-grammar, since version 4.3.1. The large dictionaries EXTRA.2, EXTRA.3, EXTRA.8, EXTRA.9, EXTRA.12, and EXTRA.17 have not been merged. These dictionaries contain 180K assorted medical, biological and biochemical terms and phrases.
- BioLG
- The BioLG project is a modification of the Link Grammar Parser adapted for the biomedical domain, as described in Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches (Sampo Pyysalo, Tapio Salakoski, Sophie Aubin and Adeline Nazarenko; BMC Bioinformatics 2006). Almost all of the BioLG changes have been merged back into the main line, as of version 4.5.0 (April 2009), with scattered bug-fixes after that.
Of related interest
- Genia tagger
- The Genia tagger is useful for named entity extraction. BSD license source.
Recent Applications and Publications
Some recent uses and applications of the Link Grammar Parser are shown below. There is also an extensive bibliography on the CMU website (mirror) referencing several dozen older (pre-2004) papers pertaining to the Link Grammar Parser.
- Blake Lemoine, NLGen2: A Linguistically Plausible, General Purpose Natural Language Generation System (2009).
- Akshar Bharati, Dipti Misra Sharma, Sukhada, Adapting Link Grammar Parser (LGP) to Paninian Framework Mapping of Parser Relations for Indian Languages (2009) National Seminar on Computer Science and its Applications in Traditional Shastras (CSATS'09) Report No: IIIT/TR/2009/218
- Jorg Hakenberg, et al. Molecular event extraction from Link Grammar parse trees Proceedings of the Workshop on BioNLP: Shared Task, pages 86–94, Boulder, Colorado, June 2009
- Denis Bechet, k-Valued Link Grammars are Learnable from Strings. 2008 Proceedings of FGVienna: The 8th Conference on Formal Grammar.
- Sampo Pyysalo, Tapio Salakoski, Sophie Aubin and Adeline Nazarenko, "Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches". BMC Bioinformatics 2006.
- Schneider, Gerold (1998). "A Linguistic Comparison of Constituency, Dependency, and Link Grammar". Masters Thesis, University of Zurich.
- Özlem Istek, "A Link Grammar for Turkish", Thesis, 2006
- Shailly Goyal and Niladri Chatterjee, "Study of Hindi Noun Phrase Morphology for Developing a Link Grammar Parser", Language in India, Volume 5:8, August 2005
- Fabian M. Suchanek, Georgiana Ifrim, Gerhard Weikum, "Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents" (2006)
- P. Szolovits, "Adding a Medical Lexicon to an English Parser". Proc. AMIA 2003 Annual Symposium. Pages 639-643. 2003.
- Jing Ding, Daniel Berleant, Jun Xu, Andy W. Fulmer, "Extracting Biochemical Interactions from MEDLINE Using a Link Grammar Parser" Proceedings ICTAI 2003 Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence
- Rania A. Abul Seoud, Nahed H. Solouma, Abou-Baker M. Youssef, and Yasser M. Kadah, "PIELG: A Protein Interaction Extraction System using a Link Grammar Parser from Biomedical Abstracts". International Journal of Biological, Biomedical and Medical Sciences 3;3 www.waset.org Summer 2008
- I. Marshall and E. Safar, "Extraction of semantic representations from syntactic CMU link grammar linkages"
- Skripsi: Syntax Analysis of Bahasa Indonesia using Link Grammar Parsing Algorithm and ANALISIS SINTAKSIS BAHASA INDONESIA DENGAN ALGORITMA PENGURAI LINK GRAMMAR
- Using LG and WordNet on Travel Domain.ppt (www.public.asu.edu)
Some miscellaneous facts:
- Any categorial grammar can be easily converted to a link grammar; see section 6 of Daniel Sleator and Davy Temperley. 1993. "Parsing English with a Link Grammar." Third International Workshop on Parsing Technologies.
- Link grammars can be learned by performing a statistical analysis on a large corpus: see John Lafferty, Daniel Sleator, and Davy Temperley. 1992. "Grammatical Trigrams: A Probabilistic Model of Link Grammar." Proceedings of the AAAI Conference on Probabilistic Approaches to Natural Language, October, 1992.
Recent Changes
Version 4.7.4 (13 February 2011) This version includes fixes for several bugs/crashes introduced in the previous version :-( It also includes changes that result in dramatic speed improvements on long sentences, with an average of 1.1x to 2.0x improvement on "typical" texts.
- Dramatic speed improvement for long sentences (hash table fix).
- Fix: crash on certain sentences, when fat-link parsing enabled.
- Fix: fatal error introduced in last version (combinatorial explosion).
- Fix: mem leak introduced in last version (combinatorial explosion).
- Assorted fixes to the Boolean SAT solver.
Version 4.7.3 (7 February 2011) This version fixes a bug where a "combinatorial explosion" resulted in no parses being printed.
- Fix: "level" as indefinite noun: "... at knee level".
- Enable max disjunct cost as a controllable parser parameter.
- Don't just give up on combinatorial explosion, show something!
Version 4.7.2 (6 January 2011) This version fixes a crash and an "almost infinite loop"; both might be hit when using the (now deprecated) fat-links and certain overflow conditions are met.
- Fix: crash on certain sentences, when fat-link parsing enabled
- Fix: 3 corrupted nouns in words.n.1
- Fix: minimize cpu spin if constituent processing and-list overflows.
Version 4.7.1 (11 December 2010) This version fixes several bugs introduced by the conjunction-handling rework of 4.7.0. Several Windows-related compile fixes are included as well.
- Fix: dictionary: Add "x" as synonym for "times".
- Fix: compilation failure due to lack of termios support on Windows.
- Fix: if not set, then force-set locale to UTF-8 in the java library.
- Fix: Explicitly include <stddef.h> for Windows builds.
- Fix: "John imagines himself lost".
- Fix: predicative adjectives w/misc verbs: "You are driving me crazy"
- Fix: "judge innocent", etc.
- Fix: many, many conjoined present tense verbs.
- Fix: conjoined verbs with negations.
- Fix: "to" as post-nominal modifier: "the inability to laugh"
Version 4.7.0 (12 September 2010) This version introduces a major change to the way that conjunctions are handled. The parse output for conjunctions is not compatible with the old output, but should be easier to work with, solving many problems. The rework results in a performance improvement of 1.3x to 2.7x, depending on the text.
- Fix: hunspell configuration on Fedora (bugtracker issue 47)
- Fix: 'turn' with adjective: "She turned him green" from wingedtachikoma
- Fix: comma-conjoined modifiers: "It tastes bitter, not sweet."
- Fix: conjoined question words: "When and where is the party?"
- Fix: recognize short, capitalized words (Los, La, etc.).
- Treat colon as synonym for is: "The answer: yes."
- Fix: begin with prepositions: "It all began in Chicago."
- Fix: "What does it come to?" and related.
- Fix: null infinitive: "I'd like to, I want to."
- Fix: "Because I said so."
- Fix: "sure" as preverbal adverb: "It sure is."
- Fix: Gerunds with determiners: "a running of the bulls"
- SJ link for conjoined nouns/noun phrases.
- Sort linkages according to whether fat linkage was used.
- Add flag to enable use of fat linkage during parsing. (Fat links now disabled by default).
- Add male/female gender tags to misc nouns.
- Fix: misc optionally transitive verbs: mix, paint, boot
- Fix: word order: "look about fearfully", "look fearfully about", around
- Fix: recognize simple fractions
- Fix: "is" with uncountable nouns: "there is blood on your hands"
- Fix: Roman numeral suffixes e.g. "Henry VIII"
- Fix: regression in dates followed by punctuation. "In the 1950s, ..."
- Fix: verbs drank, drunk are optionally transitive.
- Fix: regression: "all the X", X can be plural or mass.
- Fix: verbs paint, color may be ditransitive: "paint the car bright green"
Version 4.6.7 (16 April 2010)
- Fix: configure handling of --enable-hunspell (bugtracker issue 46)
- Fix: conjoined modifier phrases: "the black and white cat sleeps."
- Fix: MSWindows+MinGW compilation issue.
- Update msvc9 Microsoft Visual C 2008 project, from Boris Iordanov.
Version 4.6.6 (19 March 2010)
- Minor fixes to handling of units.
- Fix: MSWindows+Cygwin/MinGW compilation issues.
- Use pkgconfig for editline (from Vincent Untz)
- Fix: "I want it back."
- Additional titular prefixes, suffixes: Maj. Ph.D.
- Add code to detect dictionary version number.
- Fix: MS VC missing strdup() function.
- Add AutoIt3 bindings from JRowe
Version 4.6.5 (3 November 2009)
- Fix: Superlatives without preceding determiners ("... likes you best")
- Fix: Take more care in distinguishing mass and count nouns.
- Fix: Old bug w/relative clauses: Rw+ is optional, not mandatory.
- Provide tags identifying relative, superlative adjectives.
- Remove BioLG NUMBER-AND-UNIT handling, it's been superseded.
- Fix handling of parenthetical phrases/clauses.
- Fix: handling of capitalized first words ending in letter "s".
- Fix: support "filler-it" SF link for "It was reasoned that..."
- Fix: certain WH-word constructions: "I did not know why until recently"
- Fix: go: "there goes the greatest guy ever"
- Fix: opening coordinating conjunctions: "And you can also ..."
- Configurable Hunspell spell-checker dictionary location.
- Fix: Misc ordinal usage.
- Add support for aspell spell-checker.
Version 4.6.4 (11 October 2009)
- Restore nouns starting w/letters x-z, elided in version 4.5.9 ff.
- Add support for single-word interjections/exclamations!
- Fix: sometimes command line client fails to show all valid linkages.
- Misc fixes: such_that, upon, acted.v
- Fix: impersonal "be" linking to passive participle.
- Fix: handling of capitalized first words.
- Fix: duplication of certain parses involving transitive verbs.
Version 4.6.3 (4 October 2009)
- Fix compilation bug on FreeBSD.
- Fix: allow MX link to post-nominal ", to be ..., "
- Fix: add idiom "time and again"
- Fix: another BioLG regression in handling of possessives.
- Fix: handling of period at end of number at end of sentence.
- Fix: Capitalized words ending in s at start of sentence.
- Use corpus-statistics-based ranking by default, if available.
- Fix difficulties in build of corpus statistics module.
Version 4.6.2 (21 September 2009)
- Fix: "come across as authoritative".
- Improve Java location guessing in FreeBSD
- Fix for assert triggered by long sentences.
- Fix: long sequence of periods treated as unknown word.
- Add informational print showing dictionary location on startup.
- Remove duplicated {@MV+} in tend.v
- Automatically resize the display size to fit the current window size.
- Fix handling of punctuation at the end of a capitalized word.
- Fix misc verbs acting as adjectival modifiers: e.g. "given", "allied"
- Fix bug in BioLG code regarding the handling of possessives.
- Fix a (rare) crash in sentences with many conjunctions.
- Fix a crash involving long sequences of UTF8 punctuation marks.
Version 4.6.1 (31 August 2009)
- Stop printing annoying warning when !vars are used.
- Fix missing dict file units.2 problem
- Fix compilation bug on FreeBSD.
Version 4.6.0 (29 August 2009)
- Avoid use of bzero, add missing include directives (MacOSX problem)
- Reclassify a number of "medical" prepositions as adverbs.
- Add approx 100 adverbs & 300 adjectives.
- Add approx 250 verbs.
- Add approx 300 nouns.
- Add misc units.
- Add misc European connector words/patronymics.
- Reclassify 100's of transitive verbs as optionally-transitive.
- Add distinct tokenization step ("sentence_split") to public API. This last change forces the minor-version-number bump.
License
The Link Grammar license is essentially the BSD license. A copy of this license can be found below, and at the original authors' CMU site.
Copyright (c) 2003-2004 Daniel Sleator, David Temperley, and John Lafferty. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- The names "Link Grammar" and "Link Parser" must not be used to endorse or promote products derived from this software without prior written permission. To obtain permission, contact sleator@cs.cmu.edu
THIS SOFTWARE IS PROVIDED BY DANIEL SLEATOR, DAVID TEMPERLEY, JOHN LAFFERTY AND OTHER CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
