TreeBank Viewer

This is freely available software for displaying and browsing treebanks.
It renders bracketed expressions as nicely formatted trees.

It is fast and capable of handling large treebanks, e.g. the Penn TreeBank (PTB).

Now available for MacOS X (PPC and Intel), Windows XP, and Linux (Debian-based and RedHat-based) platforms. (See the Download section below.)

It comes in two basic flavors:

(The viewer employs the same underlying tree renderer as used in the next release of PAPPI.)



Usage

Initialization

  1. Specify the files containing the sentences and Prolog trees.

    Use the appropriate button to bring up a file dialog box or type directly into the entry field.


    [Sentence File dialog box, i.e. "Sentence File" button has been pressed. File lu.lisp selected for loading.]

    Sentence file lu.lisp and Prolog tree file lu.pl for the Lasnik & Uriagereka (L&U) treebank are supplied with the distribution.


    [Prolog Tree File dialog box. File lu.pl selected for loading.]

    File Formats

    Both files are line-oriented: the sentence file has one sentence per line and the tree file has one tree per line.
    The two files should have the same number of lines.
    The sentence file can contain anything; each line is treated as a simple string for display.
    The tree file, however, must be parsable by the tree renderer: each tree should occupy exactly one line and be acceptable to Prolog.

    The format is:

    tree(Tree).

    where the tree node Tree should be of the form:

    n(NodeName,Child1,..,Childn)

    NodeName should be an acceptable Prolog atom.
    Atoms starting with an upper-case letter must be quoted, e.g. VP should be written 'VP'.
    Each child node Childi should be either an atom or (recursively) a tree node.

    Example:

    Prolog tree input for the sentence John slept:

    tree(n('S',n('NP',n('NNP','John')),n('VP',n('VBD','slept')),n('.','.'))).
    
    For comparison, the Bikel parser output for the same sentence is in Lisp sexp format:
    (S (NP (NNP John)) (VP (VBD slept)) (. .))
    

  2. Both sentence and tree files must be entered.

    Press "Load" to load the files into the viewer.
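
The tree file format described under File Formats can be generated mechanically from Lisp sexps such as the Bikel parser output shown there. The following is a minimal sketch in Python, not the converter that was used to produce lu.pl. The function names (tokenize, parse, quote_atom, to_term, sexp_to_tree_line) are hypothetical. The sketch assumes that every bracketed node has a label and that each input string holds exactly one tree. It quotes every atom, as the John slept example does, which also covers labels that start with an upper-case letter.

```python
import re


def tokenize(sexp):
    """Split a bracketed expression into '(', ')' and label/word tokens."""
    return re.findall(r"\(|\)|[^\s()]+", sexp)


def parse(tokens, pos=0):
    """Parse one node starting at tokens[pos] and return (node, next_pos).

    A node is either a string (a leaf word) or a list [label, child1, ...].
    """
    if tokens[pos] != "(":
        return tokens[pos], pos + 1
    label = tokens[pos + 1]
    if label in ("(", ")"):
        # Assumption of this sketch: unlabeled nodes are not supported.
        raise ValueError("node without a label at token %d" % pos)
    node = [label]
    pos += 2
    while tokens[pos] != ")":
        child, pos = parse(tokens, pos)
        node.append(child)
    return node, pos + 1


def quote_atom(text):
    """Single-quote an atom, escaping backslashes and doubling quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def to_term(node):
    """Render a parsed node as n(NodeName,Child1,..,Childn)."""
    if isinstance(node, str):
        return quote_atom(node)
    return "n(" + ",".join(to_term(part) for part in node) + ")"


def sexp_to_tree_line(sexp):
    """Convert one sexp into one line of the Prolog tree file."""
    tokens = tokenize(sexp)
    node, end = parse(tokens)
    if end != len(tokens):
        raise ValueError("trailing tokens after the first tree")
    return "tree(" + to_term(node) + ")."


if __name__ == "__main__":
    print(sexp_to_tree_line("(S (NP (NNP John)) (VP (VBD slept)) (. .))"))
```

Run on the Bikel parser output for John slept, this prints the same tree(...) line as the Prolog example under File Formats. Writing one converted line per input sexp keeps the tree file aligned line by line with the sentence file.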


Tree Display

Click on any displayed sentence in the left display panel and the corresponding tree will be rendered in the right panel.

The background of the sentence currently being displayed is highlighted in blue. The sentence number is given above the tree.

Besides clicking on a sentence directly, you can use the Up and Down arrow keys to display the tree for the preceding or following sentence, provided the window focus is on the left display panel.

To go directly to a sentence, enter the sentence number in the sentence number box and press Return.

Screen and window sizing

Scrollbars are available when appropriate in both display windows.
If a scrollbar is not visible, expand the window.

The entire program window can be expanded or re-sized by dragging the handle at the bottom-right.
The divider separating the two display windows can be moved using the (small square) drag handle.

[Note: the right display window below has been resized to accommodate the large parse tree. The vertical scrollbar for the left display window has been occluded.]


Further Examples

Here are more examples of the viewer. Some of the images are from the Penn Treebank version.

This is the output of Dan Bikel's parser in Collins emulation mode for the Lasnik and Uriagereka sentences from the PAPPI distribution.

The currently highlighted tree is for the parasitic gap sentence "Which report did you file without reading".

The next two screenshots show the viewer on examples from the Wall Street Journal (WSJ) section of the Penn TreeBank (PTB).
Sentence File: wsj.txt, Prolog Tree File: wsj.pl
Note: All 49208 sentence/parse pairs have been loaded into the viewer.


This is the Linux implementation displaying the first sentence in the WSJ section of the PTB. The currently highlighted tree is for the familiar sentence: Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .


This is the MacOS X implementation on a randomly chosen sentence in the WSJ section.

Finally, here is a snapshot from the Windows XP version of the viewer on the PTB.


Working with tgrep2

In this section, I assume Doug Rohde's treebank search program called tgrep2 has been installed.

[Doug Rohde's tgrep2 program is available here.]

I also assume the WSJ section of the PTB has been loaded into the viewer.

Example query (taken from http://www.ldc.upenn.edu/ldc/online/treebank/):

tgrep2 -c wsj2.t2c 'VP << /^believe/ < (S < (/^NP/ !<< /[*]/ !< (-NONE- < T)) < (VP|AUX << to))'
    
[Here, wsj2.t2c is the pre-processed index file for the WSJ produced by tgrep2 -p.]

Normal tgrep2 output for this query is fairly difficult to read:

(VP (VBD said) (, ,) (`` ``) (S (NP-SBJ (PRP We)) (VP (VBP believe) (SBAR (-NONE- 0) (S (NP-SBJ (PRP$ our) (NN decision) (S (NP-SBJ (-NONE- *)) (VP (TO to) (VP (VB plead) (-LRB- -LRB-) (NP (JJ guilty)) (-RRB- -RRB-) (PP-CLR (TO to) (NP (DT these) (NNS charges))))))) (VP (VBZ is) (ADJP-PRD (JJ responsible) (CC and) (JJ proper))))))))
(VP (VBP lead) (S (NP-SBJ (NNS readers)) (VP (TO to) (VP (VB believe) (SBAR (IN that) (S (NP-SBJ (DT the) (NNP House)) (VP (VBD reduced) (NP (DT the) (NNS capital-gains) (NN tax)) (PP-TMP (IN for) (NP (CD two) (NNS years) (RB only))))))))))
(VP (VBD said) (, ,) (`` ``) (S (PP (VBN Given) (NP (NP (DT the) (NN state) (POS 's)) (JJ strong) (NN bargaining) (NN position))) (: ...) (NP-SBJ (PRP we)) (VP (VBP believe) (SBAR (-NONE- 0) (S (NP-SBJ (DT the) (NNP NU) (NN plan)) (VP (VBZ provides) (NP (NP (DT the) (JJS best) (NN recovery)) (ADJP (JJ available) ('' '') (PP (TO to) (NP (NP (NAC (NNP PS) (PP (IN of) (NP (NNP New) (NNP Hampshire)))) (POS 's)) (NN equity) (NNS holders)))))))))))
(VP (VBG making) (S (NP-SBJ (NNS traders)) (VP (VB believe) (SBAR (-NONE- 0) (S (NP-SBJ (DT the) (NN market)) (VP (VBD was) (ADVP-PRD (RB back) (PP (TO to) (NP (JJ normal))))))))))
(VP (VBZ expects) (S (NP-SBJ (DT the) (NN deflator)) (VP (TO to) (VP (VB rise) (NP-EXT (NP (CD 3.7) (NN %)) (, ,) (PP (ADVP (RB well)) (IN below) (NP (NP (DT the) (JJ second) (NN quarter) (POS 's)) (CD 4.6) (NN %))) (, ,))))) (PP-PRP (ADVP (RB partly)) (IN because) (IN of) (SBAR-NOM (WHNP-1 (WP what)) (S (NP-SBJ (PRP he)) (VP (VBZ believes) (SBAR (-NONE- 0) (S (NP-SBJ (-NONE- *T*-1)) (VP (MD will) (VP (VB be) (NP-PRD (ADJP (RB temporarily) (JJR better)) (NN price) (NN behavior)))))))))))
(VP (VBD said) (, ,) (`` ``) (S (NP-SBJ (PRP We)) (VP (VBP believe) (SBAR (-NONE- 0) (S (NP-SBJ-1 (NP (DT the) (NN partnership)) (PP (IN of) (NP (NP (NNP Fox)) (, ,) (NP (PRP$ its) (NNS affiliates)) (CC and) (NP (NNS advertisers))))) (VP (VP (VBZ is) (VP (VBG succeeding))) (CC and) (VP (MD will) (VP (VB continue) (S (NP-SBJ (-NONE- *-1)) (VP (TO to) (VP (VB grow))))))))))))
(VP (VBZ believes) (S (NP-SBJ (NP (DT the) (JJ legal) (NN action)) (PP (IN by) (NP (DT the) (JJ British) (NN firm)))) (`` ``) (VP (TO to) (VP (VB be) (PP-PRD (IN without) (NP (NN merit)))))))
(VP (VBD said) (, ,) (`` ``) (S (NP-SBJ (PRP It)) (VP (VBZ indicates) (ADVP (RB perhaps)) (SBAR (IN that) (S (NP-SBJ (NP (DT the) (NN balance)) (PP-LOC (IN in) (NP (DT the) (NNP U.S.) (NN economy)))) (VP (VBZ is) (RB not) (ADJP-PRD (ADJP (RB as) (JJ good)) (SBAR (IN as) (S (NP-SBJ-2 (PRP we)) (VP (VBP 've) (VP (VBN been) (VP (VBN led) (S (NP-SBJ (-NONE- *-2)) (VP (TO to) (VP (VB believe))))))))))))))))
(VP (VBD said) (, ,) (S (`` ``) (NP-SBJ-1 (PRP We)) (VP (VBP continue) (S (NP-SBJ (-NONE- *-1)) (VP (TO to) (VP (VB believe) (SBAR (SBAR (-NONE- 0) (S (NP-SBJ (PRP$ our) (NN approach)) (VP (VBZ is) (ADJP-PRD (JJ sound))))) (, ,) (CC and) (SBAR (IN that) (S (NP-SBJ (PRP it)) (VP (VBZ is) (ADJP-PRD (ADJP (RB far) (JJR better)) (PP (IN for) (NP (DT all) (NNS employees))) (PP (IN than) (NP (NP (DT the) (NN alternative)) (PP (IN of) (S-NOM (NP-SBJ (-NONE- *)) (VP (VBG having) (S (NP-SBJ (DT an) (NN outsider)) (VP (VB own) (NP (DT the) (NN company)) (PP (IN with) (S-NOM (NP-SBJ (NNS employees)) (VP (VBG paying) (PP-CLR (IN for) (NP (PRP it))) (ADVP (RB just) (DT the) (JJ same)))))))))))))))))))))))
(VP (VBP believe) (S (NP-SBJ (PRP themselves)) (VP (TO to) (VP (VB be) (VP (VBG serving))))))

Invoking the query with the -x flag gives the sentence number (and the VP node number) for each match instead of the tree itself.

tgrep2 -x -c wsj2.t2c 'VP << /^believe/ < (S < (/^NP/ !<< /[*]/ !< (-NONE- < T)) < (VP|AUX << to))'
5175:26
14103:6
21204:27
29432:68
29570:39
33275:25
33836:45
39564:61
42224:18
48195:9

These tree numbers can be copied and pasted into the treebankviewer. For example, pasting 33836 displays tree number 33836.


Some platforms accept drag-and-drop of the tree numbers.
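
When a query returns many matches, the tree numbers can be pulled out of the -x output programmatically before pasting them into the viewer. The following is a minimal sketch in Python, assuming the output has the sentence:node shape shown in the listing above. The function name tree_numbers is hypothetical and is not part of tgrep2 or the viewer.

```python
def tree_numbers(tgrep2_output):
    """Return sentence numbers from lines of the form 'sentence:node'.

    Numbers are returned in order of first appearance. A sentence that
    matches at several nodes is listed only once.
    """
    numbers = []
    for line in tgrep2_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sentence, _, _node = line.partition(":")
        number = int(sentence)
        if number not in numbers:
            numbers.append(number)
    return numbers


if __name__ == "__main__":
    # First three matches copied from the -x output listed above.
    sample = "5175:26\n14103:6\n21204:27\n"
    for number in tree_numbers(sample):
        print(number)
```

Each printed number can then be entered in the viewer to display the corresponding tree.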

(No cut-and-paste or drag-and-drop is necessary for treebanksearch. That engine has a direct interface to tgrep2 and will narrow the sentence display on the left panel automatically.)


Download

Application

Platform: MacOS X (PowerPC), versions 10.3 and 10.4
File: treebankviewer-powerpc.zip (1.2MB) Updated: 1/24/07

Note: Requires Aqua Tcl/Tk (10.4: already installed by default, 10.3: download from http://tcltkaqua.sourceforge.net/)

Install/Run:
(Unzip if necessary.)
Drag the application to your Applications folder.
Double-click the application.

Platform: MacOS X (Intel), version 10.4
File: treebankviewer-intel.zip (1199KB) Updated: 1/28/07

Note: There is no need to install additional software.
The application works with the standard 10.4 Aqua Tcl/Tk libraries supplied by Apple in /Library/Frameworks/ in PowerPC (Rosetta emulation) mode.

A universal binary (ActiveTcl) is available from http://www.activestate.com/Products/ActiveTcl/

Release Note: The 1/28/07 version adds slave mode operation.
The 1/21/07 version adds sentence number goto and a tree zoom control.

The application executable references /Library/Frameworks/Tk.framework/Versions/8.4/Tk
Running the file command on that library should report:
Tk: Mach-O universal binary with 2 architectures
Tk (for architecture ppc):  Mach-O dynamically linked shared library ppc
Tk (for architecture i386): Mach-O dynamically linked shared library i386

Install/Run:
(Unzip if necessary.)
Drag the application to your Applications folder.
Double-click the application.

Platform: Linux (Intel), Debian-based and RedHat-based
File: treebankviewer-linux.tar.gz (1161KB) Updated: 1/26/07

Note: The viewer was compiled on an Ubuntu 6.06 system. The major library dependency is Tcl/Tk. For further information, ldd viewer reports:

      linux-gate.so.1 =>  (0xffffe000)
      libtk8.4.so.0 => /usr/lib/libtk8.4.so.0 (0xb7efb000)
      libtcl8.4.so.0 => /usr/lib/libtcl8.4.so.0 (0xb7e4d000)
      libSM.so.6 => /usr/lib/libSM.so.6 (0xb7e45000)
      libICE.so.6 => /usr/lib/libICE.so.6 (0xb7e2d000)
      libX11.so.6 => /usr/lib/libX11.so.6 (0xb7d47000)
      libdl.so.2 => /lib/tls/i686/cmov/libdl.so.2 (0xb7d44000)
      libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xb7d22000)
      libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0xb7d0f000)
      libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7be0000)
      libXau.so.6 => /usr/lib/libXau.so.6 (0xb7bdd000)
      /lib/ld-linux.so.2 (0xb7fe1000)
      
For those on a RedHat-based system, substitute the following executable:

viewer.gz (1100KB, .gz file)

This executable was compiled on Red Hat Enterprise Linux AS release 4 (Nahant Update 4), where ldd viewer reports:

        libtk8.4.so => /usr/lib/libtk8.4.so (0x001ec000)
        libtcl8.4.so => /usr/lib/libtcl8.4.so (0x00142000)
        libSM.so.6 => /usr/X11R6/lib/libSM.so.6 (0x00101000)
        libICE.so.6 => /usr/X11R6/lib/libICE.so.6 (0x0010c000)
        libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x00ce7000)
        libdl.so.2 => /lib/libdl.so.2 (0x00ce1000)
        libm.so.6 => /lib/tls/libm.so.6 (0x00cbc000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0x00dea000)
        libc.so.6 => /lib/tls/libc.so.6 (0x00b8f000)
        /lib/ld-linux.so.2 (0x00b71000)
      
Install/Run:
(Gunzip and untar.)
Go into the extracted treebankviewer directory.
Run the program using ./viewer

Platform: Windows XP
File: treebankviewer-winxp.zip (968KB) Updated: 1/21/07
[This version was compiled with Visual C++ 2005 Express Edition without SP1. SP1 breaks the code.]
Release note: bug fix.

Note: This relies on ActiveTcl and Microsoft Visual C++ DLLs.

You need to install ActiveTcl for Windows XP. Download it from http://downloads.activestate.com/ActiveTcl/Windows/.

SICStus Prolog 3.12.7 was built against Tcl/Tk version 8.4.13. To avoid DLL hell, you should probably install this exact version (not the very latest release) from the above URL. In other words, download: 8.4.13/ActiveTcl8.4.13.0.261555-win32-ix86-threaded.exe

When installing ActiveTcl, choose the exact same directory used in the SICStus build. This means installing not in the default directory (C:\Tcl) but in C:\Tcl-8.4.13; see the configuration screen to the right.

Install/Run:
(Unzip the treebankviewer folder.)
Place the folder in C:\Program Files

To run, double-click the executable tbv.exe in C:\Program Files\treebankviewer.

[Do not double-click treeview.tcl. It will start the viewer but you will not be able to display any trees.]


Lasnik & Uriagereka TreeBank

This example treebank is a free download.

Download: lu.zip (.zip archive)
Sentence File: lu.lisp (POS-tagged sexps for Bikel-Collins)
Prolog TreeBank: lu.pl (Generated by Bikel-Collins)

Penn Treebank

This example treebank is a restricted download.

Download: wsj.zip (14.5MB, .zip archive)
Sentence File: wsj.txt (WSJ sentences, not tagged)
Prolog TreeBank: wsj.pl (WSJ PTB trees in Prolog format)


Last modified: Sun May 20 21:34:31 MST 2007