Inari Listenmaa



4 July 2018

Use case of gftest: Fixing (some of) the Dutch grammar

Did you ever wonder just how exciting the life of a GF grammarian is? Ever wished that someone wore a helmet camera all day while creating new inflection tables and wondering about the scope of reflexive pronouns? Then this post is just for you!

Experimental setup

We describe a collaboration between the Grammarian, who is an expert in GF, and the Tester, a native Dutch speaker. Using gftest, we generated a set of test sentences for each function in the Dutch grammar, with English translations, and gave them to the native tester to read. The tester replied with a list of sentences that were wrong, along with suggestions for improvement. The grammarian and the tester communicated via email.

Types of bugs

We can classify the bugs along two dimensions: how easy it is to understand what the problem is, and how easy it is to fix the grammar. Ease of understanding is relative to the grammarian: a trained linguist who is fluent in Dutch would have an easy time pinpointing the error from the generated test cases, having both intuition and technical names for things. Ease of fixing is relative to the grammar: a given grammatical phenomenon can be implemented in a variety of ways, some of which are harder to understand. Say that relative clauses are implemented as terrible spaghetti code in a Dutch grammar, but very elegantly in a German grammar, and both have a bug that results in similar ungrammatical sentences. In such a case, the problem would be equally easy to understand in both languages, but fixing the bug would be easier in the German grammar.

In more concrete terms, easy to fix means making only local changes in a single function. In contrast, bugs that are hard to fix usually involve modifying several functions, restructuring the code or adding new parameters.

Easy to understand, easy to fix

Perhaps the easiest bug to fix is a wrong lexical choice. Below is an example of feedback from the tester.

“opschakelen” is not the right translation of “switch on”. “aanzetten” or “aandoen” is better.

Other examples include wrong inflection or agreement, e.g. the polite second person pronoun should take the third person singular verb form, but was mistakenly taking second person forms.

Typically, bugs that stem from an almost complete implementation are easy to fix. For instance, particle verbs were missing the particle in the future tense. Looking at the generated sentences, we could see the particle in the right place in all tenses except the future. There was a single function that constructed all the tenses, and its source code had the line ++ verb.particle in every tense except the future. In such a case, fixing the bug is fairly trivial.
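To make the shape of this bug concrete, here is a toy Python model of a single function that builds all the tenses. It is not the GF code: the class ParticleVerb, the function clause and the buggy flag are hypothetical, and the model only handles a first person singular subject (hence the fixed auxiliaries heb and zal).

```python
from dataclasses import dataclass


@dataclass
class ParticleVerb:
    particle: str    # separable particle, e.g. "aan"
    finite: str      # first person singular present, e.g. "zet"
    participle: str  # past participle without the particle, e.g. "gezet"
    infinitive: str  # infinitive without the particle, e.g. "zetten"


def clause(subj: str, verb: ParticleVerb, obj: str, tense: str,
           buggy: bool = False) -> str:
    """Build a first person singular main clause in one of three tenses."""
    if tense == "present":
        # The particle is stranded at the end of the main clause.
        words = [subj, verb.finite, obj, verb.particle]
    elif tense == "perfect":
        # The particle is prefixed to the participle: aan + gezet.
        words = [subj, "heb", obj, verb.particle + verb.participle]
    elif tense == "future":
        # The bug: this branch used to leave out verb.particle.
        particle = "" if buggy else verb.particle
        words = [subj, "zal", obj, particle + verb.infinitive]
    else:
        raise ValueError(f"unknown tense: {tense}")
    return " ".join(words)


aanzetten = ParticleVerb("aan", "zet", "gezet", "zetten")
for tense in ["present", "perfect", "future"]:
    print(clause("ik", aanzetten, "de lamp", tense))
print(clause("ik", aanzetten, "de lamp", "future", buggy=True))
```

With buggy=True, the future comes out as ik zal de lamp zetten, which stands out immediately next to the correct present and perfect forms.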

Easy to understand, hard to fix

Dutch negation uses two strategies: the clausal negation particle niet ‘not’, and the noun phrase negation geen ‘no’. There are some subtleties in their usage; the following quote comes from the tester:

In any case, one can never say “eet niet wormen” (don’t eat worms, literally).
That should always be “eet geen wormen” (don’t eat worms, correctly translated)

After that, we sent three more sentences as a follow-up and got the following answer:

eet niet deze wormen - maybe OK?, feels strange
eet deze wormen niet - definitely OK
eet niet 5 wormen - definitely OK

In the brain of a computational linguist, those two pieces of feedback translated into “clauses with indefinite noun phrases (worms) use noun phrase negation, but if the noun phrase is quantified (these worms, five worms), then clausal negation is okay”. This also makes sense semantically (if you’re the kind of person who reads this blog): the negation of “eat 5 worms” is not “eat no worms”, since you can still eat 4 worms or 400.
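The rule fits in a small decision function. This is a hypothetical Python sketch, not the grammar's code: it covers only the kinds of objects the tester judged (bare plurals, demonstratives and numerals), plus the indefinite article een, which geen also replaces. The function name and the placement of niet are assumptions read off the judgments above.

```python
from typing import Optional


def negate_imperative(verb: str, det: Optional[str], noun: str) -> str:
    """Negate an imperative with one object, following the tester's judgments."""
    if det is None or det == "een":
        # Indefinite object: noun phrase negation, geen replaces the article.
        return f"{verb} geen {noun}"
    if det.isdigit():
        # Numeral object: clausal niet before the noun phrase was judged OK.
        return f"{verb} niet {det} {noun}"
    # Definite or demonstrative object: clausal niet after the noun phrase.
    return f"{verb} {det} {noun} niet"


print(negate_imperative("eet", None, "wormen"))
print(negate_imperative("eet", "deze", "wormen"))
print(negate_imperative("eet", "5", "wormen"))
```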

In the grammar, this fix required changes to 13 categories. Not all categories had to be changed manually, but, for example, a change in NP changes all categories that depend on it, such as Comp and VP. Depending on how modularly the grammar is implemented, this means that some functions that operate on VP or Comp need to be changed too when NP changes.
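The change spreads because negation is decided at the clause level, long after the object has been inserted into the VP. The hypothetical Python sketch below (not the RGL's actual types) shows a minimal version of this: NP gets a new indefinite field, and VP has to keep that information so that the negated clause can choose between geen and niet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NP:
    det: str          # determiner string, "" for bare plurals
    noun: str
    indefinite: bool  # new field: can the negation be realised as geen?

    def s(self) -> str:
        return f"{self.det} {self.noun}".strip()


@dataclass
class VP:
    verb: str
    obj: Optional[NP] = None  # the VP keeps the whole NP, not just its string


def compl_v2(verb: str, obj: NP) -> VP:
    """Insert an object into a VP, carrying the new field along."""
    return VP(verb, obj)


def neg_clause(subj: str, vp: VP) -> str:
    """Negated main clause; possible only because VP remembers obj.indefinite."""
    if vp.obj is None:
        return f"{subj} {vp.verb} niet"
    if vp.obj.indefinite:
        return f"{subj} {vp.verb} geen {vp.obj.noun}"
    return f"{subj} {vp.verb} {vp.obj.s()} niet"


print(neg_clause("ik", compl_v2("eet", NP("", "wormen", True))))
print(neg_clause("ik", compl_v2("eet", NP("deze", "wormen", False))))
```

If VP stored only the object's string, neg_clause would have no way to tell wormen from deze wormen.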

Hard to understand, easy to fix

The following two sentences were generated by the same function, which turns superlative adjectives and ordinal numbers into complements. The tester reported problems with both of them, as follows:

ik wil roodst worden –> ik wil het roodst worden (‘I want to become reddest’)
ik wil tiend worden –> ik wil tiende worden (‘I want to become tenth’)

We gave some more sentences to the tester, and got the following feedback:

ik wil linker worden –> ik wil de linker worden (‘I want to become left’)
ik wil 224e worden = OK (‘I want to become 224th’)

This small example revealed at least three different ways of using these complements: for numerals, no article and -e at the end (tiende ‘tenth’); for superlative adjectives, the article het and no -e at the end of the adjective (het roodst ‘the reddest’); and for a class of adjectives like left and right (possibly only those two), the article de (de linker ‘the left one’).
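The three strategies amount to a small lookup by word class. This Python sketch is hypothetical: the kind labels are made up for illustration and do not correspond to the RGL's category names, and it reproduces only the forms the tester confirmed.

```python
def predicative_complement(kind: str, stem: str) -> str:
    """Predicative form of an ordinal or adjective, chosen by word class."""
    if kind == "ordinal":
        return stem + "e"     # tiend -> tiende, 224 -> 224e
    if kind == "superlative":
        return "het " + stem  # roodst -> het roodst
    if kind == "left_right":
        return "de " + stem   # linker -> de linker
    raise ValueError(f"unknown kind: {kind}")


def want_to_become(kind: str, stem: str) -> str:
    return f"ik wil {predicative_complement(kind, stem)} worden"


for kind, stem in [("ordinal", "tiend"), ("ordinal", "224"),
                   ("superlative", "roodst"), ("left_right", "linker")]:
    print(want_to_become(kind, stem))
```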

In addition, the grammar has a separate construction for combining a numeral and a superlative adjective, e.g. “tenth best”. Since the tests were generated per function, the main tester didn’t read those sentences at the same time. After noticing the additional function, we asked another informant how to say Nth best, and got an alternative construction, op (N-1) na best. Eventually, we got an answer that the strategy used for superlative adjectives, i.e. the article het and no -e on the number, is acceptable.

Once it was clear to the grammarian how to proceed, fixing the bug was easy. There was already a parameter for the adjective form: two attributive forms (strong and weak) and one predicative, and the different classes of adjectives corresponded to the abstract syntax of the GF RGL. Thus it was easy to modify the predicative form differently for different adjective types. Earlier, the predicative was just identical to one of the attributive forms, but now the AP type actually contains 3 different strings for superlatives: beste and best for attributive and het best for predicative. Adjectives in the positive or comparative degree don’t get the article: good is just goede, goed and goed (not ✱het goed).
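Here is a hypothetical Python rendering of the three-slot table described above. The slot names attr_e, attr_bare and predicative are invented for this sketch; the point is only that the predicative slot, previously a copy of an attributive form, now depends on the degree.

```python
from typing import Dict


def adjective_forms(e_form: str, bare_form: str, degree: str) -> Dict[str, str]:
    """Two attributive forms and one predicative form for an adjective."""
    if degree == "superlative":
        # After the fix: superlatives get the article het predicatively.
        predicative = "het " + bare_form
    else:
        # Positive and comparative: the predicative is the bare form.
        predicative = bare_form
    return {"attr_e": e_form, "attr_bare": bare_form, "predicative": predicative}


print(adjective_forms("beste", "best", "superlative"))
print(adjective_forms("goede", "goed", "positive"))
```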

If there hadn’t already been a parameter for different adjective forms, or if the classes of words with different behaviours hadn’t corresponded to the RGL categories, this bug would have required more work to fix.

Hard to understand, hard to fix

As an example of a problem that was hard to understand and hard to fix, we take the agreement of a reflexive construction in conjunction with a verbal complement. (Just the description sounds hard to understand!)

More concretely, consider the following sentences:

These seem like reasonable choices: if the object of liking was I in the second example, it wouldn’t be myself but me: “I help you like me”.

In the GF grammar, these sentences are constructed in a series of steps:

    PredVP (UsePron i_Pron)
           (ComplSlash (SlashV2V help_V2V
                          (ReflVP (SlashV2a like_V2)))
                       (UsePron they_Pron))

The innermost subtree is SlashV2a like_V2: the transitive verb like is converted into a VPSlash (i.e. VP\NP). Right after, the function ReflVP fills the NP slot and creates a VP. However, no concrete string for the object is chosen yet, because the reflexive object depends on the subject. At the stage ReflVP (SlashV2a like_V2), the VP looks as follows:

    s     = "like" ;
    ncomp = table { I => "myself" ; You => "yourself" ; … } ;
    vcomp = [] ;

If we added a subject at that point, the subject would choose the appropriate agreement: I like myself, you like yourself. But instead, we add another slash-making construction, SlashV2V help_V2V. Now the new verb help_V2V, which takes both a direct object and a verbal complement, becomes the main verb. The old verb like becomes a verbal complement.

    s     = "help" ;
    ncomp = table { I => "myself" ; You => "yourself" ; … } ;
    vcomp = "like" ;

The next stage is to add the NP complement they_Pron, using the function ComplSlash. The standard behaviour of ComplSlash is to insert its NP argument into the ncomp table, taking the vcomp field along.

In the old buggy version, ComplSlash just concatenated the new object and the vcomp with the reflexive that was already in the ncomp table. But the scope of the reflexive was wrong: when an object is added to a VPSlash that has a verbal complement, the object should complete the verbal complement and pick the agreement. It is not in the scope of the subject.

This was the old behaviour:

    s     = "help" ;
    ncomp = table { I => "them like myself" ; You => "them like yourself" ; … } ;
    vcomp = [] ;

And this is after fixing the bug:

    s     = "help" ;
    ncomp = table { _ => "them like themselves" } ;
    vcomp = [] ;
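The difference between the old and the fixed ComplSlash can be replayed in a few lines of Python. This is a toy model, not the RGL code: one class stands in for both VP and VPSlash, the ncomp table becomes a function from agreement to string, agreement values are plain strings, and refl_vp, slash_v2v, compl_slash, pred_vp and the buggy flag are hypothetical stand-ins for ReflVP, SlashV2V, ComplSlash and PredVP.

```python
from dataclasses import dataclass
from typing import Callable

REFL = {"I": "myself", "You": "yourself", "They": "themselves"}


@dataclass
class VP:
    s: str                       # main verb
    ncomp: Callable[[str], str]  # agreement -> object string
    vcomp: str = ""              # verbal complement


def refl_vp(verb: str) -> VP:
    # ReflVP (SlashV2a verb): the reflexive waits for an agreement value.
    return VP(verb, lambda agr: REFL[agr])


def slash_v2v(verb: str, comp: VP) -> VP:
    # SlashV2V: the new verb is the head, the old verb becomes vcomp.
    return VP(verb, comp.ncomp, comp.s)


def compl_slash(vps: VP, obj_s: str, obj_agr: str, buggy: bool = False) -> VP:
    if buggy:
        # Old: the reflexive still agrees with whatever subject comes later.
        return VP(vps.s, lambda agr: f"{obj_s} {vps.vcomp} {vps.ncomp(agr)}")
    # Fixed: the new object completes the verbal complement and picks agreement.
    fixed = f"{obj_s} {vps.vcomp} {vps.ncomp(obj_agr)}"
    return VP(vps.s, lambda _agr: fixed)


def pred_vp(subj_s: str, subj_agr: str, vp: VP) -> str:
    return f"{subj_s} {vp.s} {vp.ncomp(subj_agr)}"


vps = slash_v2v("help", refl_vp("like"))
print(pred_vp("I", "I", compl_slash(vps, "them", "They", buggy=True)))
print(pred_vp("I", "I", compl_slash(vps, "them", "They")))
```

The buggy version prints I help them like myself; the fixed one prints I help them like themselves, because the agreement is frozen as soon as the object arrives.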

But this turned out not to be a perfect solution. The exception is when the VPSlash is formed by VPSlashPrep : VP -> Prep -> VPSlash. With the changes to ComplSlash, we suddenly got sentences such as “[I like ourselves] without us”. This would be a valid linearisation for a tree where [ourselves without us] is a constituent (such a tree is formed by another set of functions and was linearised correctly), but in this case, the order of the constructors is as follows:

    PredVP (UsePron i_Pron)
           (ComplSlash (VPSlashPrep (ReflVP (SlashV2a like_V2)) without_Prep)
                       (UsePron we_Pron))

To fix this problem, we added another parameter to the category VPSlash. All VPSlashes constructed by VPSlashPrep now have the parameter missingAdv set to True: this indicates that the VPSlash is not missing a core argument, so its object shouldn’t affect the agreement. With the new parameter, ComplSlash can now distinguish when to choose the agreement from the NP argument and when to leave it open for the subject.
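Extending the same kind of toy model with the new flag shows how ComplSlash uses it. Again, this is a hypothetical Python sketch rather than the RGL code: missing_adv stands in for the missingAdv parameter, prep for the preposition stored by VPSlashPrep, and respect_missing_adv=False reproduces the intermediate version that produced “I like ourselves without us”.

```python
from dataclasses import dataclass
from typing import Callable

REFL = {"I": "myself", "They": "themselves", "We": "ourselves"}


@dataclass
class VP:
    s: str
    ncomp: Callable[[str], str]  # agreement -> object string


@dataclass
class VPSlash:
    s: str
    ncomp: Callable[[str], str]
    vcomp: str = ""
    prep: str = ""             # set only by vp_slash_prep
    missing_adv: bool = False  # True: the open slot is not a core argument


def refl_vp(verb: str) -> VP:
    return VP(verb, lambda agr: REFL[agr])


def slash_v2v(verb: str, comp: VP) -> VPSlash:
    return VPSlash(verb, comp.ncomp, vcomp=comp.s)


def vp_slash_prep(vp: VP, prep: str) -> VPSlash:
    return VPSlash(vp.s, vp.ncomp, prep=prep, missing_adv=True)


def compl_slash(vps: VPSlash, obj_s: str, obj_agr: str,
                respect_missing_adv: bool = True) -> VP:
    if respect_missing_adv and vps.missing_adv:
        refl = vps.ncomp  # leave the agreement open for the subject
    else:
        fixed = vps.ncomp(obj_agr)
        refl = lambda _agr: fixed  # the new object picks the agreement
    if vps.prep:
        # VPSlashPrep: the object follows the preposition.
        return VP(vps.s, lambda agr: f"{refl(agr)} {vps.prep} {obj_s}")
    # SlashV2V: the object precedes the verbal complement.
    return VP(vps.s, lambda agr: f"{obj_s} {vps.vcomp} {refl(agr)}")


def pred_vp(subj_s: str, subj_agr: str, vp: VP) -> str:
    return f"{subj_s} {vp.s} {vp.ncomp(subj_agr)}"


like_without = vp_slash_prep(refl_vp("like"), "without")
help_like = slash_v2v("help", refl_vp("like"))
print(pred_vp("I", "I", compl_slash(like_without, "us", "We",
                                    respect_missing_adv=False)))
print(pred_vp("I", "I", compl_slash(like_without, "us", "We")))
print(pred_vp("I", "I", compl_slash(help_like, "them", "They")))
```

With the flag respected, the reflexive in the VPSlashPrep case agrees with the subject (I like myself without us), while the SlashV2V case keeps the earlier fix (I help them like themselves).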

The same bug was found in several languages, and we fixed it for Dutch, English and German using the same strategy.


Excited, aren’t you! You thought they had already fixed ComplSlash, but then came VPSlashPrep and revealed that all of this was part of its plan. While trying to help, they actually planted more bugs into the grammar. Will the hero grammarian make it in time, before a critical application outputs a wrong translation and loses the customer a million SEK?

Anyway. After all these personal tales, you might want to know how many bugs were fixed and how many were of which kind. Let’s skip the ease-of-understanding dimension; it’s subjective anyway, so here’s just how easy the bugs were to fix. I’ve probably forgotten a bunch of bugs here.

Easy to fix:

  1. Several lexical changes.
  2. Several inflection fixes.
  3. youPol_Pron had agreement of Sg P2, changed it to Sg P3 so that a correct reflexive pronoun is chosen.
  4. Choose always stressed forms of personal pronouns.
  5. Change agreement in conjunctions
  6. Extra prefix in prefix verbs for perfect tense
  7. Missing particle in future tense
  8. Plural imperatives
  9. Two bugs in postmodifier APs: placement and the adjective form. “een getrouwde worm” is correct, but a heavier AP should become a postmodifier, and in that case the adjective form should be without the -e at the end.
  10. Superlatives and ordinals
  11. DetQuant (and DetQuantOrd) combine a Quant and a Num; when the Num is an actual digit, both the Quant and the Num contributed a string, resulting in een 1 huis ‘a 1 house’.
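Item 11 is small enough to sketch directly. This hypothetical Python function assumes that the fix is to let a digit replace the indefinite article; the actual change to DetQuant may differ, and the function name and buggy flag are made up for illustration.

```python
def det_quant(quant: str, num: str, buggy: bool = False) -> str:
    """Determiner string from a Quant string and a Num string ("" if no number)."""
    if not buggy and quant == "een" and num.isdigit():
        # Assumed fix: a digit replaces the indefinite article.
        return num
    # Otherwise both parts contribute their strings.
    return " ".join(w for w in [quant, num] if w)


print(det_quant("een", "1", buggy=True) + " huis")
print(det_quant("een", "1") + " huis")
```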

Hard to fix:

  1. Add missing inflected forms for past participles, and add the missing linearisation for the function PastPartAP
  2. Preposition contraction
  3. Negation patterns (niet and geen)
  4. Various word order problems in verbal complements, which affected several functions and had to be fixed in each of them
  5. Scope of ReflVP with VPSlash

Time spent by testers + how many sentences they read

Tester 1 has probably read hundreds or thousands of sentences by now. We wonder if he’s still sane.

Tester 2 has been used as a backup when Tester 1 was not available and the Grammarian wanted quick feedback. She has read on the order of tens of sentences.

The end

If you’ve read this far, maybe you appreciate the kind of sentences we torture our testers with. Not to be confused with useful life advice.

tags: gf, research