Inari Listenmaa

25 January 2024

Using GF grammars from Haskell: advanced options

This is a sequel to the 2019 post on using GF grammars from an external program, where I covered the basics.

In this post, I will only concentrate on Haskell. If anyone wants a post on more advanced things in Python, let me know and maybe I’ll write one in 5 years.

Lexical functions under single constructor per type

First, the simplest thing I didn’t explain in the first post. The following flags are added to the gf command:

--haskell=lexical --lexical=[GF categories]

A full command could look like this:

gf -make \
   -f haskell \
   --haskell=lexical \
   --lexical=A,N \
   YourGrammar.gf

Suppose that the original grammar has the following abstract syntax functions:

fun
  small_A : A ;
  big_A : A ;
  cat_N : N ;
  dog_N : N ;
  -- … and more lexicon of types A and N

Without lexical flags

The resulting Haskell module, compiled without the lexical flags, would look as follows. Every lexical function becomes a constructor of the data type:

data GA =
   Gsmall_A
 | Gbig_A
   -- … one constructor for each remaining function of type A
  deriving Show

data GN =
   Gcat_N
 | Gdog_N
   -- … one constructor for each remaining function of type N
  deriving Show

With lexical flags

Now compare with the Haskell module compiled with the lexical flags:

data GA =
   LexA String
  deriving Show

data GN =
   LexN String
  deriving Show

Now, for each GF category C specified in --lexical=[C], there is a single constructor LexC that takes a String as its argument.
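To make the benefit concrete, here is a minimal sketch of working with such a type. The GA declaration mirrors the generated code shown above; the antonyms table and the opposite function are hypothetical names made up for this illustration, not part of any generated module.

```haskell
module Main where

-- Same shape as the type generated with the lexical flags
data GA = LexA String
  deriving (Show, Eq)

-- Hypothetical antonym table, keyed by GF function name
antonyms :: [(String, String)]
antonyms = [("small_A", "big_A"), ("big_A", "small_A")]

-- Swap an adjective for its antonym; unknown adjectives stay as they are
opposite :: GA -> GA
opposite (LexA name) = LexA (maybe name id (lookup name antonyms))

main :: IO ()
main = do
  print (opposite (LexA "small_A")) -- LexA "big_A"
  print (opposite (LexA "green_A")) -- LexA "green_A"
```

Because the lexical item is just a String, a function like this keeps working when new adjectives are added to the grammar, without adding a new pattern match per constructor.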

If you tried to give a category that has no 0-place functions as an argument to --lexical=, it would just do nothing. And if a category has both 0-place and ≥1-place functions, then only the 0-place functions are replaced by the LexC constructor—see for instance Adv in the RGL:

data GAdv =
   GPrepNP GPrep GNP -- in the house
 | LexAdv String     -- tomorrow
  deriving Show

If you are working on the full RGL, the list of categories with 0-place functions can be quite large. I usually grep my whole computer for --haskell=lexical --lexical= and copy whatever list I find. Here’s one I just found:

N,N2,N3,A,A2,A3,V,V2,V3,VA,V2V,VV,VS,V2A,V2S,V2Q,Adv,AdV,AdA,AdN,ACard,CAdv,Conj,Interj,PN,Prep,Pron,Quant,Det,Card,Predet,Subj

The two Haskell libraries: PGF vs. PGF2

PGF is a native Haskell library, and PGF2 is Haskell bindings to a C library. This section summarizes the main differences between them.

PGF

The library called PGF is:

Pros:

Cons:

PGF2

The library called PGF2 is:

Pros:

Cons:

Installation of C runtime

The PGF2 library on Hackage doesn’t come with the C library, so you need to install the C library yourself. (The Python bindings manage this with a Python wheel, so in Python there’s no need to manually compile any C libraries.)

The C library is installed in the location /usr/local/lib, and that path may need to be specified in a few different places. I have put the following in the stack.yaml file in my repository:

extra-lib-dirs:
- /usr/local/lib
ghc-options:
  "$locals": -optl=-Wl,-rpath,/usr/local/lib

Depending on your system, you might need 0-2 of these options in your stack.yaml.

Manually install from source

The steps to do this are as follows:

$ git clone https://github.com/GrammaticalFramework/gf-core.git
$ cd gf-core/src/runtime/c
$ cat INSTALL
# … follow the instructions in that file for your system!

Here are the instructions if you want to look at them: github.com/GrammaticalFramework/gf-core/blob/master/src/runtime/c/INSTALL

Alternative: use Docker

I have prepared a Dockerfile which you can use to try out the example in my tutorial on GitHub. You can download the file anywhere on your computer, and follow the instructions below:

$ docker build -t pgf2 .

This will take a long time. Once it’s done, you should see output like the following:

 => [10/10] RUN stack build --extra-include-dirs=/usr/local/lib --extra-lib-dirs=/usr/local/lib
 => exporting to image
 => => exporting layers
 => => writing image sha256:81712ef079ace0d888e94a56d64517c65156db625f10bbc873d9432eacab87f9
 => => naming to docker.io/library/pgf2

Now you can run the demo in the Docker image:

$ docker run -it pgf2 bash
root@8472aab978a6:/app/gf-embedded-grammars-tutorial/advanced-pgf2# stack run

PGF2 with Haskell version of abstract syntax

To get the Haskell version of the abstract syntax to conform to PGF2, you need to use the flag --haskell=pgf2. A full example would look like this:

# You can pass as many --haskell=… flags as you want
gf -make \
   -f haskell \
   --haskell=pgf2 \
   --haskell=lexical --lexical=A,N,V,… \
   YourGrammar.gf

You can read about the differences in inariksit/gf-embedded-grammars-tutorial/tree/master/advanced-pgf2. This is also the example that is used in the Dockerfile above. I won’t repeat the information here, so just read the README and inspect the code, which you can run in Docker or without.

GADTs

The next flag that can be added to the usual command is --haskell=gadt. An example of a full command looks like this:

gf -make \
   -f haskell \
   --haskell=gadt \
   --haskell=lexical --lexical=A,Adv,N,V,… \
   YourGrammar.gf

Adding the flag --haskell=gadt creates a Haskell module where the full GF abstract syntax is represented under a single Haskell data type—a GADT. Earlier in this post, we saw how the GF types A and N were converted into two different Haskell types. Now everything is part of the same type:

data Tree :: * -> * where
  LexA    :: String -> Tree GA_
  GPositA :: GA -> Tree GAP_
  GPrepNP :: GPrep -> GNP -> Tree GAdv_
  LexAdv  :: String -> Tree GAdv_
  -- … more constructors
  GUseV   :: GV -> Tree GVP_
  GString :: String -> Tree GString_
  GInt    :: Int -> Tree GInt_
  GFloat  :: Double -> Tree GFloat_

The constructors of that data type can take any number of arguments, corresponding to the original GF function. As before, if the GF category C is specified as an argument to the --lexical= flag, then its constructor in the GADT is called LexC and it takes a string. Otherwise, the constructors take other GF categories as arguments (see footnote 1).
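The relation between names like GV and GV_ can be sketched in a few lines. The following is a hand-written miniature, assuming (as the footnote describes) that GV is a synonym for Tree GV_ and that GV_ is an empty index type; the useSleep and size definitions are hypothetical, and Data.Kind's Type is used here in place of the * kind.

```haskell
{-# LANGUAGE GADTs, KindSignatures #-}
module Main where

import Data.Kind (Type)

-- Empty index types, one per GF category (assumed shape)
data GV_
data GVP_

-- Convenience synonyms: GV is just Tree GV_
type GV  = Tree GV_
type GVP = Tree GVP_

data Tree :: Type -> Type where
  LexV  :: String -> Tree GV_
  GUseV :: GV -> Tree GVP_   -- same as Tree GV_ -> Tree GVP_

-- A value of category VP, written with the synonym
useSleep :: GVP
useSleep = GUseV (LexV "sleep_V")

-- One function that accepts a tree of any category
size :: Tree a -> Int
size tree = case tree of
  LexV _  -> 1
  GUseV v -> 1 + size v

main :: IO ()
main = print (size useSleep) -- 2
```

The index types never have values; they only tag which GF category a given Tree belongs to, so the type checker still rejects ill-formed trees.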

This design gives some major benefits when doing tree transformations. If you read the previous post, you saw the layers of wrappers we had to go through before reaching the subtree that we actually wanted to modify:

-- first layer of wrapper
toReflexive :: GUtt -> GUtt
toReflexive (GUttS s) = GUttS (toReflexiveS s)
-- … other cases leave the Utt intact

-- second layer of wrapper
toReflexiveS :: GS -> GS
toReflexiveS (GUsePresCl pol cl) = GUsePresCl pol (toReflexiveCl cl)
-- …

-- The relevant transfer function is Cl -> Cl
toReflexiveCl :: GCl -> GCl
-- here happens the actual transformation

The deeper the subtree we want to modify sits below the start category, the more awkward wrapper functions we need to write. Now contrast the previous functions with the GADT version:

-- Transform a subtree, keep rest of the tree intact
toReflexive :: forall a . Tree a -> Tree a
toReflexive tree = case tree of
  -- If argument tree matches, do the transformation
  GPredVP subj (GComplV2 v2 obj) ->
    if isSame subj obj
      then GPredVP subj (GReflV2 v2)
      else tree

  -- If argument tree doesn't match, apply toReflexive to all subtrees
  _ -> composOp toReflexive tree

Here we only needed to write a single function, with type signature Tree a -> Tree a, and only write code for the actual case we want to transform. Then that same function is applied to all the subtrees with composOp.

Transform a subtree within a larger tree: composOp

The function composOp is included in the Haskell module generated with --haskell=gadt. You can inspect its code in the generated module, but if that doesn’t say much to you, don’t worry. There are simple patterns that you can adopt in your own Haskell code without understanding the internals.

composOp is used for patterns where you want to modify an existing tree. So the pattern goes like this:

transformTree :: forall a . Tree a -> Tree a
transformTree tree = case tree of
  -- If argument tree matches, transform it
  SimpleSubtree -> AnotherSimpleSubtree
  ComplexSubtree a1 a2 -> ComplexSubtree a1 ConstantArg2
  Foo foo -> Bar Baz (computeResultFrom foo)
  -- … more cases

  -- Otherwise, try to apply transformTree to all subtrees
  _ -> composOp transformTree tree

And this works exactly because both the larger tree and its subtrees are of the same type: Tree a.

Note that you do need to transform the tree into the same type: if SimpleSubtree and AnotherSimpleSubtree are of different types, you will get an error. But SimpleSubtree, ComplexSubtree … … and Foo … can all be different types.
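To see the pattern run end to end without generating anything from GF, here is a self-contained miniature. The Tree type is a hand-picked handful of constructors, and composOp is a simplified, hand-written stand-in for the generated function (the real one is more general); sameNP is a hypothetical helper playing the role of isSame.

```haskell
{-# LANGUAGE GADTs, RankNTypes, StandaloneDeriving #-}
module Main where

-- Empty index types, one per category
data GS_
data GCl_
data GNP_
data GVP_
data GV2_

data Tree a where
  GUseCl   :: Tree GCl_ -> Tree GS_
  GPredVP  :: Tree GNP_ -> Tree GVP_ -> Tree GCl_
  GComplV2 :: Tree GV2_ -> Tree GNP_ -> Tree GVP_
  GReflV2  :: Tree GV2_ -> Tree GVP_
  LexPN    :: String -> Tree GNP_
  LexV2    :: String -> Tree GV2_

deriving instance Show (Tree a)

-- Simplified stand-in for the generated composOp:
-- apply f to every immediate subtree, leave leaves untouched
composOp :: (forall b. Tree b -> Tree b) -> Tree a -> Tree a
composOp f tree = case tree of
  GUseCl cl     -> GUseCl (f cl)
  GPredVP np vp -> GPredVP (f np) (f vp)
  GComplV2 v np -> GComplV2 (f v) (f np)
  GReflV2 v     -> GReflV2 (f v)
  _             -> tree

-- Hypothetical helper: are two noun phrases the same?
sameNP :: Tree GNP_ -> Tree GNP_ -> Bool
sameNP (LexPN a) (LexPN b) = a == b

toReflexive :: Tree a -> Tree a
toReflexive tree = case tree of
  GPredVP subj (GComplV2 v2 obj)
    | sameNP subj obj -> GPredVP subj (GReflV2 v2)
  -- No match (or different subject and object): recurse into subtrees
  _ -> composOp toReflexive tree

example :: Tree GS_
example =
  GUseCl (GPredVP (LexPN "john_PN")
                  (GComplV2 (LexV2 "see_V2") (LexPN "john_PN")))

main :: IO ()
main = print (toReflexive example)
-- GUseCl (GPredVP (LexPN "john_PN") (GReflV2 (LexV2 "see_V2")))
```

Note that toReflexive is applied to a tree of category S, yet it contains no case for GUseCl: composOp carries the function down to the clause where the match happens.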

Extract subtree(s) from a larger tree: composOpMonoid

Another useful pattern is to extract something from a tree. This is the example from my tutorial:

-- If argument is a lexical function, return the String in a list
-- If argument doesn't match, apply getLex to all subtrees
getLex :: forall a . Tree a -> [String]
getLex tree = case tree of
  LexA    s -> [s]
  LexDet  s -> [s]
  LexN    s -> [s]
  LexPN   s -> [s]
  LexPrep s -> [s]
  LexPron s -> [s]
  LexV    s -> [s]
  LexV2   s -> [s]
  x -> composOpMonoid getLex x

Let’s take an example tree:

-- the small cat sees a big dog
PredVP
    ( DetCN the_Det
        (  AdjCN (  PositA small_A  ) (  UseN cat_N  )  )
    )
    ( ComplV2 see_V2
        ( DetCN a_Det
            (  AdjCN (  PositA big_A  ) (  UseN dog_N  )  )
        )
    )

Applying getLex on that tree gives us a list of all the lexical functions as Strings:

["the_Det", "small_A", "cat_N", "see_V2", "a_Det", "big_A", "dog_N"]

The return type doesn’t have to be a list; it can be any monoid. Haskell just has to know how to <> together two values, since the function is applied to the whole tree, and there are potentially multiple subtrees that match the extraction condition.
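As a small illustration of using a monoid other than lists, the sketch below counts adjectives with Sum Int. Everything here is a hand-written miniature: the Tree type has only four constructors, composOpMonoid is a simplified stand-in for the generated function, and countAdjs and smallBigCat are hypothetical names.

```haskell
{-# LANGUAGE GADTs, RankNTypes #-}
module Main where

import Data.Monoid (Sum (..))

-- Empty index types, one per category
data GCN_
data GA_
data GN_

data Tree a where
  GAdjCN :: Tree GA_ -> Tree GCN_ -> Tree GCN_
  GUseN  :: Tree GN_ -> Tree GCN_
  LexA   :: String -> Tree GA_
  LexN   :: String -> Tree GN_

-- Simplified stand-in for the generated composOpMonoid:
-- apply f to every immediate subtree and combine the results with <>
composOpMonoid :: Monoid m => (forall b. Tree b -> m) -> Tree a -> m
composOpMonoid f tree = case tree of
  GAdjCN a cn -> f a <> f cn
  GUseN n     -> f n
  _           -> mempty

-- Collect lexical items in a list
getLex :: Tree a -> [String]
getLex tree = case tree of
  LexA s -> [s]
  LexN s -> [s]
  x      -> composOpMonoid getLex x

-- Count adjectives; Sum combines results by addition
countAdjs :: Tree a -> Sum Int
countAdjs tree = case tree of
  LexA _ -> Sum 1
  x      -> composOpMonoid countAdjs x

-- "small big cat"
smallBigCat :: Tree GCN_
smallBigCat =
  GAdjCN (LexA "small_A") (GAdjCN (LexA "big_A") (GUseN (LexN "cat_N")))

main :: IO ()
main = do
  print (getLex smallBigCat)             -- ["small_A","big_A","cat_N"]
  print (getSum (countAdjs smallBigCat)) -- 2
```

Only the return type and the matching cases differ between getLex and countAdjs; the traversal is shared.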

Read more

Footnotes

  1. The types like GAdv and GAdv_ are just dummy types generated automatically earlier in the Haskell module. You may use them as a convenience in your own functions, but under the hood, everything is just a single datatype Tree a. For instance, the type signature GV -> Tree GVP_ just desugars into Tree GV_ -> Tree GVP_.

tags: gf