Link Grammar Parser

by Davy Temperley, John Lafferty and Daniel Sleator
(this variant maintained by Dom Lachowicz - <domlachowicz@gmail.com> and Linas Vepstas - <linasvepstas@gmail.com> )

News

April, 2010: link-grammar 4.6.7 released! See below for a description of recent changes.

What is the Link Grammar?

The Link Grammar Parser is a syntactic parser of English (and other languages as well), based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labelled links connecting pairs of words. The parser also produces a "constituent" (Penn tree-bank style phrase tree) representation of a sentence (showing noun phrases, verb phrases, etc.). The RelEx extension provides dependency-parse output.
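
To make the "set of labelled links connecting pairs of words" concrete, here is a minimal sketch in Python of how such a structure could be represented. It does not use the link-grammar library or its API: the `Link` type, the helper functions, and the simplified labels `D` (determiner to noun), `S` (subject to verb) and `O` (verb to object) are assumptions chosen for illustration, and the real parser emits more detailed labels.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One labelled link between two word positions (hypothetical type)."""
    left: int    # index of the left word in the sentence
    right: int   # index of the right word in the sentence
    label: str   # link label, simplified here for illustration


def links_for_word(links, index):
    """Return the links that touch the word at the given position."""
    return [ln for ln in links if index in (ln.left, ln.right)]


def describe(words, links):
    """Render each link as 'left-word --LABEL-- right-word'."""
    return [f"{words[ln.left]} --{ln.label}-- {words[ln.right]}" for ln in links]


# An illustrative, hand-written linkage; not actual parser output.
words = ["the", "cat", "chased", "a", "mouse"]
linkage = [
    Link(0, 1, "D"),  # the  - cat
    Link(1, 2, "S"),  # cat  - chased
    Link(2, 4, "O"),  # chased - mouse
    Link(3, 4, "D"),  # a    - mouse
]

for line in describe(words, linkage):
    print(line)
```

Storing each link as a pair of word positions plus a label keeps the structure flat, so questions such as "which links touch this word?" become simple filters over the list.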

Did the AbiWord team write Link Grammar?

In large part, no. The project is the brainchild of Davy Temperley, John Lafferty and Daniel Sleator, all university professors. It is the product of a decade of academic research into grammar, and is founded on a theory backed by numerous publications. Its canonical homepage is hosted by Carnegie Mellon University.

So, then what is it doing @ AbiSource.com?

The AbiWord team had a concrete need: to integrate a grammar-checking feature into AbiWord. The best choice, they felt, was to build upon Temperley et al.'s successful Link Grammar project.

However, in order for the link-grammar project to be useful to them and to the greater Free Software world, the AbiWord community felt that a variety of changes to the project would be necessary. While they did have success (a few years ago) convincing the authors to release Link Grammar under a GPL-compatible license, there was no practical way to continue project development and maintenance at the CMU website. So the AbiWord community took it under its wing and has nurtured the project since.

Ongoing development by OpenCog

Ongoing development of link-grammar is being primarily guided by the Open Cognition project, where the parser plays an important role in the OpenCog natural language processing subsystem. Research and implementation are ongoing; current work includes investigations into statistically guided parse ranking, grammatically induced word-sense disambiguation using statistical results from the Mihalcea all-words WSD algorithm, and work on automatically learning new parse rules based on corpus statistics.

A sibling project, RelEx, uses constraint-grammar-like techniques to extract dependency relations and assorted additional linguistic information, including FrameNet-style framing and reference (anaphora) resolution. The dependency output is similar to that of the Stanford parser. Its performance is comparable to that of the Stanford PCFG parsing model, and it is more than three times faster than the Stanford "lexicalized" (factored) model.

The NLGen and NLGen2 projects provide natural language generation modules based on, and compatible with, link-grammar and RelEx. They implement the SegSim ideas for NL generation. See the following NLGen demos: Demo of Virtual Dog Learning to Play Fetch via Imitation and Reinforcement; AI Virtual Dog's Emotions Fluctuate Based on Its Experiences; Demo of Embodied Anaphora Resolution; and AI Virtual Dog Answers Simple Questions about Itself and Its Environment.

Notable changes from the upstream Link Grammar package include:


Downloading Link Grammar

The system can be downloaded either as a tarball, or via SVN. The current stable version is Link Grammar 4.6.7 (April, 2010). Older versions are available here.

Unstable, development versions are available through AbiWord's SVN repository. Anonymous read-only access is available by issuing the command:

svn co http://svn.abisource.com/link-grammar/trunk link-grammar

General instructions for AbiWord's anonymous SVN can be found here.

The Link Grammar source can be browsed online here.

Documentation

One of the best ways to obtain a solid, easy-to-understand overview of the parser is to review the original papers describing it, here, here, here and here. There is an extensive set of pages documenting the dictionary; specifically, the names of links and their meanings, as well as how to write new rules. There is also a short primer for creating dictionaries for new languages. The documentation for the programming API is here. Documentation for additions made in the 4.0 release is on the improvements page. A fairly comprehensive bibliography of papers written before 2004 is here.

Mailing Lists

The current list for Link Grammar discussion is at the link-grammar google group.

Bug Tracker

Bug reports, patches, RFEs, etc. are gladly welcomed.

Disclaimer

Link grammar is a natural language parser, not an artificial intelligence. This means that there are many sentences that it cannot parse correctly, and many others for which it generates multiple parses. There are also entire classes of speech that it cannot parse, such as Valley-girl speak. Link grammar does best on "newspaper English": medium-length sentences written with good grammar, proper punctuation, and proper capitalization. It don't do 733t speek, etc. In particular, it has problems with the following "registers" and types of writing:

In addition, it has a variety of "bugs": it currently has trouble with "if...then..." constructs, compound queries ("who did it, and why?"), lists, "...not only...but also..." constructs, certain types of idiomatic phrases, certain types of "institutional utterances", and so on. The goal of the project is to eventually fix all of these cases; progress is ongoing.


Adjunct Projects

RelEx Semantic Relation Extractor
RelEx is an English-language semantic relationship extractor, built on the Carnegie Mellon link parser. It can identify subject, object, indirect object and many other relationships between words in a sentence. It also provides part-of-speech tagging, noun-number tagging, verb tense tagging, gender tagging, and so on. RelEx includes a basic implementation of the Hobbs anaphora (pronoun) resolution algorithm. Optionally, it can use GATE for entity detection.
Ruby bindings
There are two different packages providing Ruby bindings: Ruby Link Grammar, which is up to date and currently maintained, and Link Grammar 4 Ruby, which is wildly out of date (it's for version 4.2.2) and is unmaintained. You only need one!
Python bindings
New Python bindings are in development. Development snapshots are available on Launchpad. Installation instructions are here.
Perl bindings
The Perl bindings, created by Danny Brian, have been updated. See the Lingua-LinkParser page on CPAN. There is also a tutorial written against an older version of the bindings; some details may be different.
Objective Caml bindings
OCaml interface to Link Grammar
AutoIt bindings (New!)
AutoIt is a scripting language for Windows. This zipfile provides AutoIt bindings to Link Grammar, thanks to JRowe. It also includes binary Windows DLLs for a recent link-grammar version.
.NET Framework bindings
.NET interface to Link Grammar from Leonard Chalk/ProAI.
Alternative Java bindings
Another, completely different set of Java bindings has been developed: a tarball is here. These are for the old version 4.1 only. Note that these are not compatible with the bindings that ship, by default, with the main link-grammar package.
Persian dictionaries
Persian dictionaries, by Jon Dehdari. These require the Persian stemming engine, as significant morphology analysis needs to be performed to parse Persian.
Arabic dictionaries
Arabic dictionaries, by Jon Dehdari. [download] These require the Aramorph stemming package, which is included.
French dictionary, Luthor
The Luthor project aims to develop a set of scripts to automatically construct Link Grammar linkage dictionaries by mining Wiktionary data. Current efforts are focusing on French.
Russian parser
Located at http://slashzone.ru/parser/. By Sergey Protasov. Includes link documentation and subscript (morphology) documentation. Russian morpheme dictionaries can be had at http://aot.ru.
English dictionary extensions
LinkGrammar-WN is a lexicon expansion for the English language Link Grammar Parser. This project adds 14K new words to the dictionaries. The extended lexicon is provided under the GPL license, and thus cannot be merged back into the current project.
Medical Text Analysis
The MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) Clinical Decision Making Group has done work to extend the Link Grammar dictionaries by adding many new words. All but the six largest of these dictionaries have been merged into link-grammar as of version 4.3.1. The large dictionaries EXTRA.2, EXTRA.3, EXTRA.8, EXTRA.9, EXTRA.12, and EXTRA.17 have not been merged. These six dictionaries contain 180K assorted medical, biological and biochemical terms and phrases.
BioLG
The BioLG project is a modification of the Link Grammar Parser adapted for the biomedical domain, as described in Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches (Sampo Pyysalo, Tapio Salakoski, Sophie Aubin and Adeline Nazarenko; BMC Bioinformatics 2006). Almost all of the BioLG changes have been merged back into the main line, as of version 4.5.0 (April 2009), with scattered bug-fixes after that.

Of related interest

Genia tagger
The Genia tagger is useful for named entity extraction. Its source is available under a BSD license.

Recent Applications and Publications

Some recent uses and applications of the Link Grammar Parser are shown below. There is also an older bibliography on the CMU website (mirror) referencing several dozen papers pertaining to the Link Grammar Parser.

Some miscellaneous facts:


Recent Changes

Version 4.6.7 (16 April 2010)

Version 4.6.6 (19 March 2010)

Version 4.6.5 (3 November 2009)

Version 4.6.4 (11 October 2009)

Version 4.6.3 (4 October 2009)

Version 4.6.2 (21 September 2009)

Version 4.6.1 (31 August 2009)

Version 4.6.0 (29 August 2009)

Version 4.5.10 (25 August 2009)

Version 4.5.9 (25 August 2009)

Version 4.5.8 (2 July 2009) includes the following changes:

Version 4.5.7 (4 June 2009) includes the following changes:

Version 4.5.6 (24 May 2009) includes the following changes:

Version 4.5.5 (10 May 2009) includes the following changes:

Version 4.5.4 (9 May 2009) includes the following changes:

Version 4.5.3 (14 April 2009) includes the following changes:

Version 4.5.2 (14 April 2009) includes the following changes:

Version 4.5.1 (13 April 2009) includes the following changes:

Version 4.5.0 (10 April 2009) includes the following changes:

A summary of older changes can be found here.

License

The Link Grammar license is essentially the BSD license. A copy of this license can be found below, and at the original authors' CMU site.

Copyright (c) 2003-2004 Daniel Sleator, David Temperley, and John Lafferty. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. The names "Link Grammar" and "Link Parser" must not be used to endorse or promote products derived from this software without prior written permission. To obtain permission, contact sleator@cs.cmu.edu

THIS SOFTWARE IS PROVIDED BY DANIEL SLEATOR, DAVID TEMPERLEY, JOHN LAFFERTY AND OTHER CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.