Inari Listenmaa

12 December 2019

Using GF grammars from an external program

This post shows how to use GF grammars from an external program, and how to manipulate GF trees from that program. The topic is introduced in Lesson 7 of the tutorial, and I will cover the parts that I find missing from it.

Not all things are missing from the tutorial per se, but they are explained in different places. In contrast, I aim to make this post as self-contained as possible. If you have already installed GF and the PGF library in the language of your choice, you can go directly to Embedding grammars.

It is also possible to embed GF grammars into C#, JavaScript/TypeScript and Java, but I will not cover them in this tutorial. This is a complex enough choose-your-adventure already.

GF ecosystem

The following sections explain the bits that are relevant for embedding GF grammars in other programming languages.

GF: programming language & executable

A GF grammar consists of an abstract syntax and a number of concrete syntaxes. They live in files that end in .gf. The executable is called gf, and you can use it in two ways:

1) Run your grammar in a GF shell

Assuming that you have a file called LangEng.gf in the same directory, you can run the following command.

$ gf LangEng.gf
Lang> p -cat=Cl "I am an apple"
PredVP (UsePron i_Pron) (UseComp (CompCN (UseN apple_N)))
Lang> help
<lots of helpful output>

2) Compile your grammar into one of the various formats

$ gf -make LangEng.gf
linking ... OK
Writing Lang.pgf...

If you don’t specify anything other than -make, you will get a .pgf file. (More on PGF in the next section.) You can get other formats using the flag -f:

$ gf -make -f haskell LangEng.gf
linking ... OK
Writing Lang.pgf...
Writing Lang.hs...

Lang.hs is a Haskell version of the abstract syntax. We’ll get back to it in the section about transforming GF trees.

You can find other arguments to -f if you run gf -h.

PGF: file format & library

A GF grammar is compiled into the Portable Grammar Format, PGF for short. If we want to use a GF grammar from another program, most often we need to compile it into PGF first. (Sometimes we can skip the PGF level: see the tutorial for compiling the grammar directly to JavaScript.)

PGF is also the name of a Haskell library, which contains functions for reading and manipulating PGF files.

How about if you don’t want to use Haskell? Not a problem, there’s another library called PGF, written in C. The recommended usage of the C library is through bindings to another programming language. Currently there are 4 options: Python, Java, C# and Haskell again (in which case it is called PGF2, to distinguish from the native Haskell library called PGF). In this post, we will use Python.

Installation of GF

If you haven’t installed GF yet, you most likely want to do it now. The options are: a) download a binary, b) install from Hackage, and c) compile from source.

I want to use Python

If you have Mac or Ubuntu, the easiest way is to download the binary. Python bindings are included in the binary.

If you don’t have Mac or Ubuntu, you can install GF in any way you like—see instructions on the download page—but it won’t include the Python bindings, so you will need to set them up separately.

I want to use Haskell

For any system, the easiest way is to install GF and the libraries from Hackage: type cabal install gf.

If you don’t (want to) have a system-wide GHC, you have two options:

  1. You want to only work with ready-made PGFs and never compile GF files yourself. Skip all the way to Embedding grammars, and run my tutorial using Stack, which downloads the PGF library for you! The instructions include compiling a GF file into a PGF file and a Haskell file, but I have cheated and put the compiled files under version control just so that you can run this tutorial.
  2. You want to compile GF files into PGF. Unfortunately GF isn’t in Stackage, but you can install GF the executable from source using Stack. Clone the gf-core repository and type stack install.

Installation of the libraries

Now follow installation instructions for the PGF library in Python and Haskell. To follow this tutorial, it is enough to choose only one.

Installation in Python

Using the binary: PGF library is already installed

If you downloaded the Mac or Ubuntu binary of GF, then you should have the Python bindings already.

To test if you have the Python bindings, open a Python shell and type import pgf:

$ python
<information about your python>
Type "help", "copyright", "credits" or "license" for more information.
>>> import pgf

If the import succeeds, you have the library, and you can skip all the way to Embedding grammars.
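If you would rather do this check from a script than from an interactive shell, here is a minimal sketch that uses only the Python standard library. The helper name has_module is hypothetical, made up for this example; the only thing taken from this post is that the bindings are imported under the name pgf.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("pgf"):
        print("pgf is available, you can skip to Embedding grammars")
    else:
        print("pgf was not found for this Python, see the PyPI section")
```

Run it with the same Python interpreter that you plan to use for the tutorial, because each interpreter has its own set of installed packages.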

Not using the binary: get PGF library from PyPI

When I wrote this blog post in 2019, this section used to be a multi-step hassle. I’m glad to inform you that since June 2020, the PGF library is in PyPI. So the installation step, if you’re not using the binary, is as follows:

pip install pgf

And that’s it. Make sure that you install it for the right Python: substitute pip3 install pgf, or whichever version you want to use. Now you can continue to Embedding grammars.

If you have trouble with installing the PGF library, please open an issue at GF’s GitHub describing your setup, what steps you took and the output.

Installation in Haskell

If you want to use Haskell, the first question is which library to use, PGF or PGF2? Remember, PGF is a native Haskell library, and PGF2 is Haskell bindings to a C library. For this post, I chose PGF for three reasons:

  1. It’s installed by default if you get GF from Hackage or compile it from source.
  2. The API is better documented than PGF2.
  3. It works smoothly with the abstract syntax compiled into Haskell.

0) Check if it’s already installed

If you installed GF from Hackage (typing cabal install gf) or compiled it from source, then you should have the PGF library.

Open your ghci and type import PGF. If it succeeds, you have successfully installed the PGF library, and you can skip all the way to Embedding grammars.

If you have installed GF by other means, or you don’t want to have a system-wide GHC, read further.

1a) Use Stack

In my repository gf-embedded-grammars-tutorial, you’ll find a Stack file, which downloads all relevant libraries for you in an isolated location. Clone the repository and skip to Embedding grammars, where one of your first tasks is to run stack build.

1b) Non-stack options

Let’s see. Are you sure you don’t want to use Stack?

I dunno, I’ve never used it and I’m overwhelmed with all this new information, but I could give it a try

Yes, I know what Stack is and don’t want to use it

You didn’t install GF from Hackage or compile it from source; you want to use Haskell, and you don’t want to use Stack. You have ended up in this branch of the choose-your-adventure if you followed one of the red routes in this flowchart.

Stack newbie

Let me quote docs.haskellstack.org:

Stack is a cross-platform program for developing Haskell projects. It is aimed at Haskellers both new and experienced.

It features:

  • Installing GHC automatically, in an isolated location.
  • Installing packages needed for your project.
  • Building your project.

When you install a program with Stack, it will not affect your previous Haskell ecosystem in any way. The downside is that it will download another version of GHC and libraries, which takes more space, but this is a trade-off for guaranteeing reproducible builds. If you use Stack just once for this project, you can still keep using Cabal only for all other projects in the past and future. So unless disk space is absolutely critical, I recommend this option.

First, install Stack. This is a simple process that involves running one command in your terminal. After that, the rest of the process involves one extra stack build and then typing stack run <program> instead of runghc <program>. If you want to run a ghci with the libraries that are installed locally, you need to write stack ghci instead of ghci. Those are pretty much all the noticeable differences that affect your daily life. If you want to learn more, you can read the documentation at docs.haskellstack.org.

If you decided to give Stack a try, you can skip to Embedding grammars. Otherwise, read on.

Seriously, no Stack please

If you haven’t installed GF: GOTO install GF and choose either from Hackage or compile from source.

If your current GF is the downloaded binary, you could do one of the following:

If something weird happens from having multiple GF installations, or anything else goes wrong, you can open an issue at GF’s GitHub.

Embedding grammars

From this point on, I assume that you have managed to install the PGF library for Python or Haskell. Again, you can choose to follow the instructions for Python or Haskell further in this post.

Python

Preliminaries

Static tutorial

The repository contains a Jupyter notebook named ReflTransfer.ipynb. It’s meant to be opened with Jupyter, but if you can’t install Jupyter on the machine you’re reading this on, you can still view the notebook on GitHub, where it just looks like a standard non-interactive tutorial. Here’s the link: ReflTransfer.ipynb on GitHub.

Interactive tutorial

If you have a chance to use Jupyter on your own computer, I recommend it: you can modify the code and add new features. If you haven’t used Jupyter notebooks before, here’s a tutorial and installation instructions.

Once you have installed Jupyter, go to the main directory of my repository (i.e. the one called gf-embedded-grammars-tutorial) and run the command jupyter notebook.

$ jupyter notebook

This will open your browser with the following view. Click the file ReflTransfer.ipynb.

Picture of Jupyter Notebook server, showing the file ReflTransfer.ipynb and others.

Now you can use the notebook as an interactive tutorial. You can modify anything in the cells or write new cells and run them.

The rest of this post will be about Haskell, so unless you want to learn how to embed grammars bilingually, you’re done now! Here’s the last jump in this post, to links.

Haskell

Preliminaries

The first steps are:

If you are not using Stack, you can ignore both the Stack and the Cabal files in the repository; running runghc ReflTransfer.hs will be enough later on.

PGF API

The PGF library is documented at Hackage. The standard GF tutorial lists some of the most important functions, if you want to see fewer things at once. I will explain the functions I use in my code, but once you’re familiar with the small examples from the GF tutorial and this tutorial, do browse the full API at Hackage!

Reading PGF files

Open a Haskell shell (e.g. ghci or stack ghci) and import the PGF library. Do this in the main directory of my repository, the same place where you compiled MiniLangEng.gf into PGF and Haskell files.

$ stack ghci
…
Ok, two modules loaded.
> import PGF

Now you can open MiniLang.pgf in the shell as follows.

PGF> gr <- readPGF "MiniLang.pgf"
PGF> :t gr
gr :: PGF
PGF> languages gr
[MiniLangEng]
PGF> categories gr
[A,AP,Adv,CN,Cl,Conj, … ,VP]

In order to parse or linearise, you need a concrete language as well. Here’s one way to do it:

PGF> let eng = head $ languages gr
PGF> parse gr eng (startCat gr) "I sleep"
[EApp (EFun UttS) (EApp (EApp (EFun UsePresCl) (EFun PPos)) (EApp (EApp (EFun PredVP) (EApp (EFun UsePron) (EFun i_Pron))) (EApp (EFun UseV) (EFun sleep_V))))]

If you want to see trees that look like the ones in the GF shell, you need to use showExpr from the PGF library, like this:

PGF> let trees = parse gr eng (startCat gr) "I sleep"
PGF> map (showExpr []) trees
["UttS (UsePresCl PPos (PredVP (UsePron i_Pron) (UseV sleep_V)))"]
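To see how the nested EApp and EFun output relates to the compact GF notation, here is a toy model in Python. It is only a sketch: the tuples and the helper names fun, app, unapply and show are made up for this example, and it is not how showExpr is implemented in the PGF library. The idea is that every EApp applies a function to exactly one argument, so a function with several arguments becomes a chain of EApps that has to be flattened before printing.

```python
def fun(name):
    """A function symbol, like EFun in the output above."""
    return ("EFun", name)


def app(function, argument):
    """A function applied to one argument, like EApp in the output above."""
    return ("EApp", function, argument)


def unapply(expr):
    """Split a chain of applications into (function name, argument list)."""
    args = []
    while expr[0] == "EApp":
        _, function, argument = expr
        args.append(argument)
        expr = function
    args.reverse()  # the outermost application holds the last argument
    return expr[1], args


def show(expr, nested=False):
    """Render a toy expression in GF notation, with parentheses where needed."""
    name, args = unapply(expr)
    if not args:
        return name
    text = " ".join([name] + [show(arg, nested=True) for arg in args])
    return "(" + text + ")" if nested else text


# The parse result of "I sleep" from above, rebuilt with the toy constructors.
tree = app(fun("UttS"),
           app(app(fun("UsePresCl"), fun("PPos")),
               app(app(fun("PredVP"), app(fun("UsePron"), fun("i_Pron"))),
                   app(fun("UseV"), fun("sleep_V")))))

print(show(tree))
# UttS (UsePresCl PPos (PredVP (UsePron i_Pron) (UseV sleep_V)))
```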

Syntactic transfer

Before we go further into the technologies, let us have a concrete goal to keep it interesting! We want to do semantics-preserving syntactic transfer.

I added a function called ReflV2 into the good old miniresource abstract syntax. The enhanced miniresource is found in the tutorial repository, MiniGrammar.

UseV      : V   -> VP ;             -- sleep
ComplV2   : V2  -> NP -> VP ;       -- love it
ReflV2    : V2 -> VP ;              -- use itself
UseAP     : AP  -> VP ;             -- be small
AdvVP     : VP -> Adv -> VP ;       -- sleep here

And the implementation is in MiniGrammarEng.

ReflV2 v2 = {
  verb = verb2gverb v2 ;
  compl = table {
    Agr Sg Per1 => "myself" ;
    Agr Sg Per2 => "yourself" ;
    Agr Sg Per3 => "itself" ; -- simplification, no human referent
    Agr Pl Per1 => "ourselves" ;
    Agr Pl Per2 => "yourselves" ;
    Agr Pl Per3 => "themselves" }
} ;

Here is what we want to do: transform every sentence whose subject and object are the same into a reflexive one, and leave all other sentences untouched. Some examples:

A program that does this modification is our goal. So far we have involved just the PGF library, for parsing and linearising. But the current goal involves more complex manipulation of the trees, and here we are going to introduce another way of interacting with the GF trees.

GF abstract syntax in Haskell

Remember the flag -f haskell when we compiled the GF grammar? It produced a file called MiniLang.hs, and now we are going to use that.

Why

So first of all, why do we do this? Our overall goal is to manipulate trees, and this is much simpler using pure Haskell datatypes than using the PGF functions. I’m not even going to bother showing how to do it in pure PGF expressions; check out the Python tutorial if you like your code awkward and type-unsafe.

Our goal is to go from PGF expressions to the GF abstract syntax in Haskell, do our transformations operating on the Haskell datatypes, and then go back to the PGF expressions.

What & How

Here’s (a sample of) what the Haskell module looks like:

data GCl = GPredVP GNP GVP

data GNP =
     GDetCN GDet GCN
   | GMassNP GCN
   | GUsePN GPN
   | GUsePron GPron

data GVP =
      GAdvVP GVP GAdv
    | GComplV2 GV2 GNP
    | GReflV2 GV2
    | GUseAP GAP
    | GUseV GV

And so on. If you’re familiar with the miniresource, you should recognise all these constructors—it’s a Haskell translation of the abstract syntax of MiniLang! Where in GF you had fun PredVP : NP -> VP -> Cl, in Haskell you have data GCl = GPredVP GNP GVP.

In addition, we have a way to relate these Haskell datatypes to the PGF of the same grammar that produced it. Here’s a type class Gf:

class Gf a where
  gf :: a -> Expr
  fg :: Expr -> a

The type Expr comes from the PGF library. In place of a, we will put the Haskell data types just defined, such as GAdv or GCl.

To make a datatype an instance of the type class Gf, we need to provide a translation to and from the PGF datatype Expr. (I will skip the details here; you can see them in the generated MiniLang.hs file if you are interested.) Thanks to the functions gf and fg, we can now have a workflow as follows:

  1. Parse a string into an Expr, using the PGF library.
  2. Turn the PGF expression into a Haskell expression, using fg.
  3. Transform the Haskell expression into a new Haskell expression, using some function that you wrote yourself.
  4. Turn the new Haskell expression back into a PGF expression, using gf.
  5. Linearise the transformed PGF expression into a string, using the PGF library.
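To make the roles of gf and fg in steps 2 and 4 more concrete, here is a toy round trip in Python. This is a sketch under heavy assumptions, not the generated MiniLang.hs and not the pgf API: the generic expression is modelled as a (function name, arguments) tuple, the typed side as three dataclasses named after constructors from the miniresource, and the Lex class and the CONSTRUCTORS table are made up for this example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Lex:
    """A lexical constant without arguments, such as i_Pron or sleep_V."""
    name: str


@dataclass(frozen=True)
class UsePron:
    pron: Lex


@dataclass(frozen=True)
class UseV:
    verb: Lex


@dataclass(frozen=True)
class PredVP:
    subj: UsePron
    vp: UseV


# Lookup table from function names to typed constructors (toy fragment only).
CONSTRUCTORS = {cls.__name__: cls for cls in (UsePron, UseV, PredVP)}


def gf(value):
    """Typed value -> generic (function name, arguments) expression."""
    if isinstance(value, Lex):
        return (value.name, ())
    args = tuple(gf(getattr(value, field.name)) for field in fields(value))
    return (type(value).__name__, args)


def fg(expr):
    """Generic expression -> typed value; unknown functions raise KeyError."""
    name, args = expr
    if not args:
        return Lex(name)
    return CONSTRUCTORS[name](*(fg(arg) for arg in args))


typed = PredVP(UsePron(Lex("i_Pron")), UseV(Lex("sleep_V")))
generic = gf(typed)
assert fg(generic) == typed  # the two directions are inverses on this fragment
```

The typed side is where you write your own transformation (step 3); the generic side is what a parser would produce and a lineariser would consume.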

The program

Now let’s get back to the goal! We want to transform sentences with the same subject and object into reflexive, for example I see me -> I see myself. The first sentence is parsed as follows in the miniresource:

Tree for I see me.

I have highlighted the two arguments that are identical, and will trigger the change into reflexive.

The identical argument in question is UsePron i_Pron, which occurs twice in the tree. The lowest common ancestor of the two occurrences is PredVP, which constructs a Cl. So we need to design a function that does the following:

  1. Pattern match a Cl:
    • Does it contain a ComplV2?
    • Is the ComplV2’s NP argument same as PredVP’s NP argument?
  2. If yes, change the ComplV2 into ReflV2.
  3. Return the new tree:

Tree for I see myself.

At this point, just go and see the actual Haskell program! The full code is found in ReflTransfer.hs, and the relevant parts are pasted below.

transfer :: Tree -> Tree
transfer = gf . toReflexive . fg

-- Wrapper for the more interesting transfer functions.
-- Need this because Utt is the start category;
-- the strings we input are parsed as Utt by default.
toReflexive :: GUtt -> GUtt
toReflexive (GUttNP x) = GUttNP x -- NPs can't be made reflexive
toReflexive (GUttS s) = GUttS (toReflexiveS s)

-- Another layer of wrapper
toReflexiveS :: GS -> GS
toReflexiveS s = case s of
  GCoordS conj s1 s2 -> GCoordS conj (toReflexiveS s1) (toReflexiveS s2)
  GUsePresCl pol cl -> GUsePresCl pol (toReflexiveCl cl)

-- The relevant transfer function is Cl -> Cl
toReflexiveCl :: GCl -> GCl
toReflexiveCl cl@(GPredVP subj vp) = -- PredVP is the only constructor for Cl in the mini resource
  case vp of
    GComplV2 v2 obj
      -> if show subj == show obj -- GNP has no Eq instance, need to compare string
          then GPredVP subj (GReflV2 v2)
          else cl
    _ -> cl -- Any other way to form VP: keep it unchanged
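For comparison, the same rewrite rule can be sketched in Python on plain nested tuples. This is an illustration only and not the code from the notebook: the (function name, arguments) representation and the names node and to_reflexive are made up for this example, and see_V2 is assumed to be the lexicon entry behind "I see me". Unlike the Haskell version, which walks through Utt, S and Cl explicitly, this sketch visits every node of the tree and rewrites any PredVP it finds.

```python
def node(name, *args):
    """Build a tree node as a (function name, arguments) tuple."""
    return (name, args)


def to_reflexive(tree):
    """Replace ComplV2 with ReflV2 wherever subject and object are identical."""
    name, args = tree
    # Rewrite the children first, so that coordinated sentences are covered.
    args = tuple(to_reflexive(arg) for arg in args)
    if name == "PredVP":
        subj, vp = args
        vp_name, vp_args = vp
        # Tuples compare structurally, so == checks that the two NPs match.
        if vp_name == "ComplV2" and vp_args[1] == subj:
            verb = vp_args[0]
            return node("PredVP", subj, node("ReflV2", verb))
    return (name, args)


i_np = node("UsePron", node("i_Pron"))
before = node("UttS", node("UsePresCl", node("PPos"),
              node("PredVP", i_np, node("ComplV2", node("see_V2"), i_np))))
after = node("UttS", node("UsePresCl", node("PPos"),
             node("PredVP", i_np, node("ReflV2", node("see_V2")))))

assert to_reflexive(before) == after
```

Nothing here stops you from building an ill-formed tree, such as a ReflV2 with two arguments. The Haskell datatypes rule that out at compile time, which is the main reason for going through the generated module.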

Run the program

Now you can run the program, either with runghc ReflTransfer.hs or with stack run ReflTransfer. This is what it should look like:

EITHER
  $ runghc ReflTransfer.hs
OR
  $ stack run ReflTransfer
Write your sentence here, I will transform it into reflexive, if it has the same subject and object.
Write quit to exit.
I see me
I see myself
a car
a car
John sleeps and the water drinks the water
John sleeps and the water drinks itself
quit
bye

If you are unable to repeat these steps, please let me know! This time it’s not a GF core issue, just an issue with my tutorial, so create an issue in the gf-embedded-grammars-tutorial repository or email me.

Links
