This is a sequel to the 2019 post on using a GF grammar from an external program. In that post, I covered the basics.
In this post, I will only concentrate on Haskell. If anyone wants a post on more advanced things in Python, let me know and maybe I’ll write one in 5 years.
First, the simplest thing I didn’t explain in the first post. You can add the following flags to the gf command:
--haskell=lexical --lexical=[GF categories]
A full command could look like this:
gf -make \
-f haskell \
--haskell=lexical \
--lexical=A,N \
YourGrammar.gf
Suppose that the original grammar has the following abstract syntax functions:
fun
small_A : A ;
big_A : A ;
cat_N : N ;
dog_N : N ;
-- … and more lexicon of types A and N
The resulting Haskell module, compiled without the lexical flags, would look as follows. Every lexical function becomes a constructor of the data type:
data GA =
Gsmall_A
| Gbig_A
| …
deriving Show
data GN =
Gcat_N
| Gdog_N
| …
deriving Show
Now compare with the Haskell module compiled with the lexical flags:
data GA =
LexA String
deriving Show
data GN =
LexN String
deriving Show
Now, for each GF category C specified in --lexical=[C], there is a single constructor LexC that takes a String as its argument.
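To make the difference concrete, here is a small sketch. GA and GN are hand-written, simplified copies of what the generated module contains with --lexical=A,N (the real module has more, such as the Gf instances), and the helper mkA, with its assumption that adjectives are named word_A, is hypothetical. The point is that the String holds the name of the GF function, so lexical trees can be built from ordinary data:

```haskell
-- Simplified copies of the types generated with
-- --haskell=lexical --lexical=A,N (the real module contains more, e.g. Gf instances)
data GA = LexA String deriving Show
data GN = LexN String deriving Show

-- The String is the name of the GF function, so a lexicon can be
-- built from plain data instead of one constructor per word
adjectives :: [GA]
adjectives = map LexA ["small_A", "big_A"]

nouns :: [GN]
nouns = map LexN ["cat_N", "dog_N"]

-- Hypothetical helper, assuming the grammar names its adjectives word_A
mkA :: String -> GA
mkA word = LexA (word ++ "_A")

-- A single pattern match recovers the GF function name
funName :: GA -> String
funName (LexA f) = f
```

The trade-off is that the Haskell type checker no longer knows the lexicon: a typo like LexA "smal_A" compiles fine, and the mistake only surfaces at runtime, when GF processes the tree.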
If you give --lexical= a category that has no 0-place functions, it just does nothing. And if a category has both 0-place and ≥1-place functions, only the 0-place functions are replaced by the LexC constructor. See, for instance, Adv in the RGL:
data GAdv =
GPrepNP GPrep GNP -- in the house
| LexAdv String -- tomorrow
deriving Show
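As a consequence, a function on such a mixed category needs one case for LexAdv plus one for every remaining constructor. The sketch below uses simplified stand-ins for GPrep, GPN and GNP (in the generated module, GNP has many more constructors than GUsePN), and describeAdv is a hypothetical example function:

```haskell
-- Simplified stand-ins for generated types (assumption: in the real module,
-- GNP has many more constructors than GUsePN)
data GPrep = LexPrep String deriving Show
data GPN = LexPN String deriving Show
data GNP = GUsePN GPN deriving Show

-- Adv as in the post: the 2-place PrepNP keeps its own constructor,
-- while all 0-place Adv functions share LexAdv
data GAdv
  = GPrepNP GPrep GNP -- in the house
  | LexAdv String     -- tomorrow
  deriving Show

-- Hypothetical example: one case for LexAdv, one per remaining constructor
describeAdv :: GAdv -> String
describeAdv (LexAdv f) = "lexical adverb " ++ f
describeAdv (GPrepNP (LexPrep p) _) = "prepositional phrase with " ++ p
```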
If you are working on the full RGL, the list of categories with 0-place functions can be quite large. I usually grep my whole computer for --haskell=lexical --lexical= and copy whatever list I find. Here’s one I just found:
N,N2,N3,A,A2,A3,V,V2,V3,VA,V2V,VV,VS,V2A,V2S,V2Q,Adv,AdV,AdA,AdN,ACard,CAdv,Conj,Interj,PN,Prep,Pron,Quant,Det,Card,Predet,Subj
PGF is a native Haskell library, and PGF2 is a set of Haskell bindings to a C library. This section summarizes the main differences between them.
The library called PGF is:
Part of the gf-3.11 package, installed with cabal install gf.
If you use stack install to install GF from source, you will get the executable gf globally on your computer, but not the library. So with stack, you still need to specify the gf package in every project in both package.yaml and stack.yaml (because gf is not in Stackage).
Pros:
Cons:
To parse "Hello!", you need to preprocess it into " hello ! ".
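With the PGF library, such a preprocessing step could look like the following naive sketch. The function name preprocess is hypothetical; it simply lowercases the input and surrounds punctuation with spaces. A real preprocessor would need to be smarter about proper nouns (which should keep their capital letters) and characters like apostrophes:

```haskell
import Data.Char (isPunctuation, toLower)

-- Hypothetical preprocessing: lowercase every character and put spaces
-- around punctuation, so "Hello!" becomes " hello ! "
preprocess :: String -> String
preprocess s = ' ' : concatMap step s
  where
    step c
      | isPunctuation c = [' ', c, ' ']
      | otherwise       = [toLower c]
```

For example, preprocess "Hello, world!" splits into the tokens hello , world ! once you separate it on whitespace.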
The library called PGF2 is:
Pros:
If you use BIND/SOFT_BIND and CAPIT tokens in your grammar, you can parse "Hello!" and not have to preprocess it into " hello ! ".
Cons:
The PGF2 library in Hackage doesn’t come with the C library, so you need to install the C library yourself. (The Python bindings manage this with a Python wheel, so in Python there’s no need to manually compile any C libraries.)
The C library is installed in /usr/local/lib, and that path may need to be specified in a few different places. I have put the following in the stack.yaml file in my repository:
extra-lib-dirs:
- /usr/local/lib
ghc-options:
"$locals": -optl=-Wl,-rpath,/usr/local/lib
Depending on your system, you might need none, one or both of these options in your stack.yaml.
The steps to install the C library are as follows:
$ git clone https://github.com/GrammaticalFramework/gf-core.git
$ cd gf-core/src/runtime/c
$ cat INSTALL
# … follow the instructions in that file for your system!
Here are the instructions if you want to look at them: github.com/GrammaticalFramework/gf-core/blob/master/src/runtime/c/INSTALL
I have prepared a Dockerfile which you can use to try out the example in my tutorial on GitHub. You can download the file to any directory on your computer and follow the instructions below:
$ docker build -t pgf2 .
This will take a long time. Once it’s done, you should see output like the following.
=> [10/10] RUN stack build --extra-include-dirs=/usr/local/lib --extra-lib-dirs=/usr/local/lib
=> exporting to image
=> => exporting layers
=> => writing image sha256:81712ef079ace0d888e94a56d64517c65156db625f10bbc873d9432eacab87f9
=> => naming to docker.io/library/pgf2
Now you can run the demo in the Docker image:
$ docker run -it pgf2 bash
root@8472aab978a6:/app/gf-embedded-grammars-tutorial/advanced-pgf2# stack run
To get the Haskell version of the abstract syntax to conform to PGF2, you need to use the flag --haskell=pgf2. You can combine it with as many other --haskell=… flags as you want. A full example would look like this:
gf -make \
-f haskell \
--haskell=pgf2 \
--haskell=lexical --lexical=A,N,V,… \
YourGrammar.gf
You can read about the differences in inariksit/gf-embedded-grammars-tutorial/tree/master/advanced-pgf2. This is also the example that is used in the Dockerfile above. I won’t repeat the information here, so just read the README and inspect the code, which you can run with or without Docker.
The next flag that can be added to the usual command is --haskell=gadt. An example of a full command looks like this:
gf -make \
-f haskell \
--haskell=gadt \
--haskell=lexical --lexical=A,Adv,N,V,… \
YourGrammar.gf
Adding the flag --haskell=gadt creates a Haskell module where the full GF abstract syntax is represented under a single Haskell data type: a GADT. Earlier in this post, we saw how the GF types A and N were converted into two different Haskell types. Now everything is part of the same type:
data Tree :: * -> * where
LexA :: String -> Tree GA_
GPositA :: GA -> Tree GAP_
GPrepNP :: GPrep -> GNP -> Tree GAdv_
LexAdv :: String -> Tree GAdv_
…
GUseV :: GV -> Tree GVP_
GString :: String -> Tree GString_
GInt :: Int -> Tree GInt_
GFloat :: Double -> Tree GFloat_
The constructors of that data type can take any number of arguments, corresponding to the arguments of the original GF function. As before, if the GF category C is specified as an argument to the --lexical= flag, then its constructor in the GADT is called LexC and it takes a string. Otherwise, the constructors take other GF categories as arguments.[1]
This design has major benefits for tree transformations. If you read the previous post, you saw layers of wrappers that we needed before we could get to the subtree we actually wanted to modify:
-- first layer of wrapper
toReflexive :: GUtt -> GUtt
toReflexive (GUttS s) = GUttS (toReflexiveS s)
-- … other cases leave the Utt intact
-- second layer of wrapper
toReflexiveS :: GS -> GS
toReflexiveS (GUsePresCl pol cl) = GUsePresCl pol (toReflexiveCl cl)
-- …
-- The relevant transfer function is Cl -> Cl
toReflexiveCl :: GCl -> GCl
-- here happens the actual transformation
The deeper the subtree we want to modify sits inside the start category, the more awkward wrapper functions we need to write. Now contrast the previous function with the GADT version:
-- Transform a subtree, keep rest of the tree intact
toReflexive :: forall a . Tree a -> Tree a
toReflexive tree = case tree of
-- If argument tree matches, do the transformation
GPredVP subj (GComplV2 v2 obj) ->
if isSame subj obj
then GPredVP subj (GReflV2 v2)
else tree
-- If argument tree doesn't match, apply toReflexive to all subtrees
_ -> composOp toReflexive tree
Here we only needed to write a single function, with the type signature Tree a -> Tree a, and only write code for the actual case we want to transform. Then that same function is applied to all the subtrees with composOp.
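To see composOp at work without a real grammar, here is a self-contained sketch. The Tree GADT is a hand-written miniature of the generated one with only a handful of constructors (GUseCl is simplified compared to the RGL, and NP is treated as lexical for brevity), composOp is written out by hand instead of being generated, and isSame is a hypothetical helper that just compares trees with ==:

```haskell
{-# LANGUAGE GADTs, KindSignatures, RankNTypes, StandaloneDeriving #-}
import Data.Kind (Type)

-- Empty phantom types, one per GF category (like GCl_, GNP_ in the generated module)
data GUtt_
data GS_
data GCl_
data GNP_
data GVP_
data GV2_

-- Miniature stand-in for the generated GADT (assumption: only a few constructors)
data Tree :: Type -> Type where
  GUttS    :: Tree GS_ -> Tree GUtt_
  GUseCl   :: Tree GCl_ -> Tree GS_  -- simplified: no tense or polarity
  GPredVP  :: Tree GNP_ -> Tree GVP_ -> Tree GCl_
  GComplV2 :: Tree GV2_ -> Tree GNP_ -> Tree GVP_
  GReflV2  :: Tree GV2_ -> Tree GVP_
  LexNP    :: String -> Tree GNP_
  LexV2    :: String -> Tree GV2_

deriving instance Show (Tree a)
deriving instance Eq (Tree a)

-- Hand-written composOp: apply f to every immediate subtree, keep the node itself
composOp :: (forall a. Tree a -> Tree a) -> Tree c -> Tree c
composOp f t = case t of
  GUttS s        -> GUttS (f s)
  GUseCl cl      -> GUseCl (f cl)
  GPredVP np vp  -> GPredVP (f np) (f vp)
  GComplV2 v2 np -> GComplV2 (f v2) (f np)
  GReflV2 v2     -> GReflV2 (f v2)
  LexNP _        -> t
  LexV2 _        -> t

-- Hypothetical helper: two NPs count as the same if they are equal trees
isSame :: Tree GNP_ -> Tree GNP_ -> Bool
isSame = (==)

-- Same shape as in the post: only the interesting case is written out
toReflexive :: forall a . Tree a -> Tree a
toReflexive tree = case tree of
  GPredVP subj (GComplV2 v2 obj) ->
    if isSame subj obj
      then GPredVP subj (GReflV2 v2)
      else tree
  _ -> composOp toReflexive tree
```

With john = LexNP "john_NP" and see = LexV2 "see_V2", applying toReflexive to GUttS (GUseCl (GPredVP john (GComplV2 see john))) passes through the GUttS and GUseCl layers via composOp and returns GUttS (GUseCl (GPredVP john (GReflV2 see))). If the subject and object differ, the tree comes back unchanged.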
The function composOp is included in the Haskell module generated with --haskell=gadt. You can inspect its code in the generated module, but if that doesn’t say much to you, don’t worry. There are simple patterns that you can adopt in your own Haskell code without understanding the internals.
composOp is used for patterns where you want to modify an existing tree. So the pattern goes like this:
transformTree :: forall a . Tree a -> Tree a
transformTree tree = case tree of
-- If argument tree matches, transform it
SimpleSubtree -> AnotherSimpleSubtree
ComplexSubtree a1 a2 -> ComplexSubtree a1 ConstantArg2
Foo foo -> Bar Baz (computeResultFrom foo)
…
-- Otherwise, try to apply transformTree to all subtrees
_ -> composOp transformTree tree
And this works exactly because both the larger tree and its subtrees are of the same type: Tree a.
Note that you do need to transform the tree into the same type: if SimpleSubtree and AnotherSimpleSubtree are of different types, you will get an error. But SimpleSubtree, ComplexSubtree … … and Foo … can all be different types.
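Here is the pattern filled in with concrete cases, as a self-contained sketch on a hand-written miniature of the generated GADT that covers only a few NP constructors. The two cases match trees of different types (Tree GA_ and Tree GNP_), but each returns a tree of the same type it matched. The name transformTree comes from the pattern above, and the concrete rewrites are made up for illustration:

```haskell
{-# LANGUAGE GADTs, KindSignatures, RankNTypes, StandaloneDeriving #-}
import Data.Kind (Type)

data GNP_
data GCN_
data GAP_
data GDet_
data GA_
data GN_

-- Miniature stand-in for the generated GADT (assumption: NP fragment only)
data Tree :: Type -> Type where
  GDetCN  :: Tree GDet_ -> Tree GCN_ -> Tree GNP_
  GAdjCN  :: Tree GAP_ -> Tree GCN_ -> Tree GCN_
  GPositA :: Tree GA_ -> Tree GAP_
  GUseN   :: Tree GN_ -> Tree GCN_
  LexDet  :: String -> Tree GDet_
  LexA    :: String -> Tree GA_
  LexN    :: String -> Tree GN_

deriving instance Show (Tree a)
deriving instance Eq (Tree a)

-- Hand-written composOp: apply f to every immediate subtree
composOp :: (forall a. Tree a -> Tree a) -> Tree c -> Tree c
composOp f t = case t of
  GDetCN d cn  -> GDetCN (f d) (f cn)
  GAdjCN ap cn -> GAdjCN (f ap) (f cn)
  GPositA a    -> GPositA (f a)
  GUseN n      -> GUseN (f n)
  _            -> t  -- lexical leaves have no subtrees

transformTree :: forall a . Tree a -> Tree a
transformTree tree = case tree of
  -- Tree GA_ in, Tree GA_ out
  LexA "small_A" -> LexA "big_A"
  -- Tree GNP_ in, Tree GNP_ out; recurse into cn explicitly
  GDetCN (LexDet "the_Det") cn -> GDetCN (LexDet "a_Det") (transformTree cn)
  -- LexA "small_A" -> LexN "dog_N" would be a type error: Tree GN_ is not Tree GA_
  _ -> composOp transformTree tree
```

One detail worth knowing: when a case matches, composOp is not called for that node, so its subtrees are left alone unless the case recurses explicitly, as the GDetCN case does with transformTree cn. Without that call, "the small cat" would become "a small cat" instead of "a big cat".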
Another useful pattern is to extract something from a tree. This is the example from my tutorial:
-- If argument is a lexical function, return the String in a list
-- If argument doesn't match, apply getLex to all subtrees
getLex :: forall a . Tree a -> [String]
getLex tree = case tree of
LexA s -> [s]
LexDet s -> [s]
LexN s -> [s]
LexPN s -> [s]
LexPrep s -> [s]
LexPron s -> [s]
LexV s -> [s]
LexV2 s -> [s]
x -> composOpMonoid getLex x
Let’s take an example tree:
-- the small cat sees a big dog
PredVP
( DetCN the_Det
( AdjCN ( PositA small_A ) ( UseN cat_N ) )
)
( ComplV2 see_V2
( DetCN a_Det
( AdjCN ( PositA big_A ) ( UseN dog_N ) )
)
)
Applying getLex to that tree gives us a list of all the lexical functions as Strings:
["the_Det", "small_A", "cat_N", "see_V2", "a_Det", "big_A", "dog_N"]
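To check that output without a real grammar, here is a self-contained sketch with a hand-written miniature GADT containing just the constructors of the example tree. The composOpMonoid here is written out by hand (the generated module provides its own), and the lexical cases of getLex are trimmed to the categories that occur in the example:

```haskell
{-# LANGUAGE GADTs, KindSignatures, RankNTypes #-}
import Data.Kind (Type)

data GCl_
data GNP_
data GVP_
data GCN_
data GAP_
data GDet_
data GA_
data GN_
data GV2_

-- Miniature stand-in for the generated GADT (assumption: only what the example needs)
data Tree :: Type -> Type where
  GPredVP  :: Tree GNP_ -> Tree GVP_ -> Tree GCl_
  GComplV2 :: Tree GV2_ -> Tree GNP_ -> Tree GVP_
  GDetCN   :: Tree GDet_ -> Tree GCN_ -> Tree GNP_
  GAdjCN   :: Tree GAP_ -> Tree GCN_ -> Tree GCN_
  GPositA  :: Tree GA_ -> Tree GAP_
  GUseN    :: Tree GN_ -> Tree GCN_
  LexDet   :: String -> Tree GDet_
  LexA     :: String -> Tree GA_
  LexN     :: String -> Tree GN_
  LexV2    :: String -> Tree GV2_

-- Hand-written composOpMonoid: combine f's results on the immediate subtrees with <>
composOpMonoid :: Monoid m => (forall a. Tree a -> m) -> Tree c -> m
composOpMonoid f t = case t of
  GPredVP np vp  -> f np <> f vp
  GComplV2 v2 np -> f v2 <> f np
  GDetCN d cn    -> f d <> f cn
  GAdjCN ap cn   -> f ap <> f cn
  GPositA a      -> f a
  GUseN n        -> f n
  _              -> mempty  -- lexical leaves

getLex :: forall a . Tree a -> [String]
getLex tree = case tree of
  LexA s   -> [s]
  LexDet s -> [s]
  LexN s   -> [s]
  LexV2 s  -> [s]
  x        -> composOpMonoid getLex x

-- the small cat sees a big dog
example :: Tree GCl_
example =
  GPredVP
    (GDetCN (LexDet "the_Det") (GAdjCN (GPositA (LexA "small_A")) (GUseN (LexN "cat_N"))))
    (GComplV2 (LexV2 "see_V2")
      (GDetCN (LexDet "a_Det") (GAdjCN (GPositA (LexA "big_A")) (GUseN (LexN "dog_N")))))
```

Because composOpMonoid visits the subtrees left to right, the strings come out in the same order as the words of the sentence.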
The return type doesn’t have to be a list; it can be any monoid. Haskell just has to know how to combine two values with <>, since the function is applied to the whole tree, and there are potentially multiple subtrees that match the extraction condition.
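As an illustration of a different monoid, the hypothetical function countA below counts the adjectives in a tree by returning Sum Int from Data.Monoid: leaves contribute mempty (Sum 0), and the results from several subtrees are added with <>. As before, the GADT and composOpMonoid are hand-written miniatures of the generated code, limited to a few NP constructors:

```haskell
{-# LANGUAGE GADTs, KindSignatures, RankNTypes #-}
import Data.Kind (Type)
import Data.Monoid (Sum (..))

data GNP_
data GCN_
data GAP_
data GDet_
data GA_
data GN_

-- Miniature stand-in for the generated GADT (assumption: NP fragment only)
data Tree :: Type -> Type where
  GDetCN  :: Tree GDet_ -> Tree GCN_ -> Tree GNP_
  GAdjCN  :: Tree GAP_ -> Tree GCN_ -> Tree GCN_
  GPositA :: Tree GA_ -> Tree GAP_
  GUseN   :: Tree GN_ -> Tree GCN_
  LexDet  :: String -> Tree GDet_
  LexA    :: String -> Tree GA_
  LexN    :: String -> Tree GN_

-- Hand-written composOpMonoid: leaves give mempty, subtrees are combined with <>
composOpMonoid :: Monoid m => (forall a. Tree a -> m) -> Tree c -> m
composOpMonoid f t = case t of
  GDetCN d cn  -> f d <> f cn
  GAdjCN ap cn -> f ap <> f cn
  GPositA a    -> f a
  GUseN n      -> f n
  _            -> mempty

-- Hypothetical: count adjectives by returning Sum Int instead of a list
countA :: forall a . Tree a -> Sum Int
countA tree = case tree of
  LexA _ -> Sum 1
  x      -> composOpMonoid countA x

-- "the big small cat": two adjectives
bigSmallCat :: Tree GNP_
bigSmallCat =
  GDetCN (LexDet "the_Det")
    (GAdjCN (GPositA (LexA "big_A"))
      (GAdjCN (GPositA (LexA "small_A")) (GUseN (LexN "cat_N"))))
```

Here getSum (countA bigSmallCat) gives 2. Only the case for LexA changes compared to a list-returning extractor; the traversal itself stays the same.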
Bringert and Ranta (2008). A pattern for almost compositional functions. This was the original paper that inspired the GADT and compos* design. There is also composOpFold and more, but I have never found use for them in my day-to-day GF tree transformation needs.
Blog post by some random person on the internet: Defeating return type polymorphism. When you work with the GADT abstract syntax and you would like to have composOpMonoid return a potentially heterogeneous list, you can take inspiration from this post and make your own newtype wrapper.
[1] The types like GAdv and GAdv_ are just dummy types generated automatically earlier in the Haskell module. You may use them for convenience in your own functions, but under the hood, everything is just a single data type Tree a. For instance, the type signature GV -> Tree GVP_ just desugars into Tree GV_ -> Tree GVP_.