So you’re a grammar writer, and you’ve inherited an established
resource grammar. Say that someone decided to give all verb
subcategories the same lincat^{1}, so now your
intransitive verbs are cluttered with fields for direct, indirect,
sentential, adverbial and predicative complements and it just hurts
your sense of aesthetics.

You remember that GF lincats compile to multiple concrete categories, for each combination of parameters. Thus you wonder: which combinations are actually in use?

We could solve this problem in many ways (and honestly, in simpler ways than I’m going to show). But bear with me–this is a lesson in thinking in fixpoints, not a lesson in “does this make sense to model as a fixpoint computation”.

That said, let’s get started!

We can see GF categories as mutually recursive datatypes. This is actual GF abstract syntax:

```
cat
V, V2, VA, VV, VS, VQ, ..., VP, AP, NP, QS, S ;
fun
UseV : V -> VP ;
ComplV2 : V2 -> NP -> VP ;
ComplVA : VA -> AP -> VP ;
ComplVS : VS -> S -> VP ;
ComplVV : VV -> VP -> VP ;
ComplVQ : VQ -> QS -> VP ;
```

Let’s change the syntax of the functions slightly: put the result category to the left and remove arrows.

```
VP ::= useV V
VP ::= complV2 V2 NP
VP ::= complVA VA AP
VP ::= complVV VV VP
VP ::= complVS VS S
VP ::= complVQ VQ QS
```

If you squint a bit, it kind of looks like a system of equations.

```
x1 = … x3 … x17 …
x2 = … x1 …
x3 = … x3 … x5 … x23 …
…
x30 = … x24 … x27 …
```

Now, let’s move from the GF abstract syntax to the PMCFG concrete categories. I’ve labeled the different concrete categories with superscript numbers; we don’t need to care about their actual content just now. The example is simplified in two ways.

- We only show the functions
`UseV`

and`ComplVV`

. In reality, all others are in there too. - Apart from the first line, we show only one function for each concrete category. In reality, all of the result categories may have several functions that produce them.

```
VP¹ ::= useV¹ V² | complVV¹ VV⁹ VP⁵ | …
VP³ ::= complVV³ VV⁴ VP⁵
...
VP⁵ ::= useV⁵ V⁶
...
VP⁷ ::= complVV⁷ VV⁸ VP¹
```

Before, at the GF level, we could just look at the lexicon and check
whether there is a `V`

. But now we are not sure anymore whether we have
both `V²`

and `V⁶`

: maybe one of the concrete categories is
linguistically impossible.

The intuition here is: “Compute if a category is empty, given the emptiness of its argument categories.”

So we need some initial guesses for the values. Ultimately we want to
just know if a given category has a member or no, so we can deal with
Booleans. We give an initial value of `False`

to all variables: that
is, “all categories are empty”. Let’s mark that with red text.

```
VP¹ ::= useV¹ V² | complVV¹ VV⁹ VP⁵ | …
VP³ ::= complVV³ VV⁴ VP⁵
...
VP⁵ ::= useV⁵ V⁶
...
VP⁷ ::= complVV⁷ VV⁸ VP¹
```

In fact, now that we’re reduced to Booleans, we don’t even need to
care about what the original GF functions *do* with their arguments.
So we can replace all of the GF functions with just a `&&`

. In case of multiple functions that have the same concrete category as a result, we replace the `|`

with `||`

.

```
VP¹ ::= V² || (VV⁹ && VP⁵)
VP³ ::= VV⁴ && VP⁵
...
VP⁵ ::= V⁶
...
VP⁷ ::= VV⁸ && VP¹
```

How can a variable for a category turn into `True`

? We find a member
of that category! On the first round, we’ve got to deal with the
lexical categories: `V²`

and `V⁶`

for intransitive verbs, `VV⁴`

,
`VV⁸`

and `VV⁹`

for verbs with a `VP`

complement. Say that `V²`

and `VV⁸`

actually have values:

```
VP¹ ::= V² || (VV⁹ && VP⁵)
VP³ ::= VV⁴ && VP⁵
...
VP⁵ ::= V⁶
...
VP⁷ ::= VV⁸ && VP¹
```

Now that some variables have updated, we see if we can update other variables:

```
VP¹ ::= V² || (VV⁹ && VP⁵)
VP³ ::= VV⁴ && VP⁵
...
VP⁵ ::= V⁶
...
VP⁷ ::= VV⁸ && VP¹
```

Yes we can: when `V²`

was confirmed non-empty, next round `VP¹`

gets confirmed too.

Moving on–can we update yet more variables?

```
VP¹ ::= V² || (VV⁹ && VP⁵)
VP³ ::= VV⁴ && VP⁵
...
VP⁵ ::= V⁶
...
VP⁷ ::= VV⁸ && VP¹
```

Another change happened: that `VP¹`

just made `VP⁷`

non-empty.

But after this step, nothing is going to change anymore, no matter how much we recompute. In other words, we’ve found a fixed point!

So as I said, we’ve found *a* fixed point. There are several of
them–why is ours a good one?

Pretend for a moment that the set of variables `VP¹`

–`VV⁹`

has
nothing to do with the original GF grammar. Then, we could easily
conceive of other assignments to the variables that are fixpoints:
say, everything is False; only `VP⁵`

and `V⁶`

are True; everything is
True. (In contrast, “only `V⁶`

is True” is not a fixpoint, because if
we iterated one more time, `VP⁵`

would change from False to True).

Of course, given that these variables come from an actual GF grammar,
we don’t want to give an answer “everything is True” if some
categories actually correspond to abominations such as intransitive
verb with 5 obligatory arguments. We don’t want to be too pessimistic
either: why claim that a perfectly good category is empty? Instead,
we want the most conservative solution that respects reality–the
*least* fixpoint, given the actual GF grammar.

So how is *least* defined? We need to define an *ordering* on the
domain of possible solutions. In the case of Booleans, we just say
`False ≤ True`

, and hence a solution with more False is smaller.

This was quite a trivial problem for two reasons: the values are not changing after they get confirmed as True, and we had absolutely no use for the actual GF functions (or the PMCFG versions of them).

Let’s take another fairly trivial problem: **Is a category reachable
from the start category?**

- Start value: False (all categories are non-reachable from the start category)
- Confirm that the start category is reachable from the start category.
- For each function whose
*result category*is confirmed reachable, confirm its*argument categories*reachable.

We still don’t need the details of the function, just knowledge which arguments it takes. The values are still Booleans, and don’t change once they’ve become True.

Notice that the first problem was bottom-up: we started from several terminal categories, and moved up all the way to the start category. This time we start from the top, and confirm more and more lower-level categories reachable. But the iterative process is the same in both cases.

Another one! **Are some fields always empty?**

- Start (least) value: all fields in all categories are empty. Now the value
is not a Boolean, it is instead a set of fields that are non-empty–which
in the beginning is an empty set. The ordering is the
subset relation:
`A ≤ B`

if`A ⊆ B`

. - A field
`s`

in a terminal category`T`

is non-empty, if there is some actual word with a non-empty string in`s`

. When such a word is found, add that field to the initial set of non-empty fields. - A field
`s'`

in a non-terminal category`N`

is non-empty, if there is some function`f : T¹ -> … -> Tⁿ -> N`

, that uses at least one non-empty field from some`Tⁱ`

, or alternatively,*introduces new strings*into`s'`

.

Here we need the actual GF function, because we need to find out which
fields of its arguments it is using–it’s not enough to know that the
argument *has* some fields with some properties. In fact, since
functions may not only use their arguments but also introduce new
strings, we definitely need all three pieces of information: the emptiness
status of the arguments’ fields, which fields are used by `f`

, and
whether `f`

introduces new strings.

In the next post, we model a harder problem in `gftest`

as a fixpoint
computation: namely, context generation. Later we might also talk
about ways to make the actual computation smarter.

1: “I skipped so much redundant code due to subtyping–look at these ultra-generic functions!”, said the original grammarian when confronted about this decision.

tags: