Paraphrases and Lexical Decomposition

posted by Darryl on 20 Dec 2014

In this post, I want to tackle an important issue in designing a lexicon for an app that uses Language Engine: how do you cope with the many seemingly different ways people can express the same, or nearly the same, idea? For example, consider the following interaction you could imagine taking place in an Instagram-like app:

User: Take a selfie!
App: Ok! *shutter* How's that?
User: Oh, it's awful!
App: What about with one of these filters?

This conversation, especially line 3, where the user says "it's awful", can be paraphrased in many different ways. For instance, the user could have used a different adjective and said "it's horrible", or they could have combined a positive adjective with negation by saying "it's not good". Alternatively, they could have used a verbal construction such as "I dislike it" or "I hate it", or a negated verbal construction, as in "I don't like it". On a straightforward event-based analysis, each of these gets its own meaning:

[[it's awful]] = exists e. awful(e) & experiencer(e,selfie)

[[it's horrible]] = exists e. horrible(e) & experiencer(e,selfie)

[[it's not good]] = exists e. good(e) & polarity(e,neg)
                            & experiencer(e,selfie)

[[I dislike it]] = exists e. dislike(e) & disliker(e,me)
                           & dislikee(e,selfie)

[[I hate it]] = exists e. hate(e) & hater(e,me) & hatee(e,selfie)

[[I don't like it]] = exists e. like(e) & polarity(e,neg) & liker(e,me)
                              & likee(e,selfie)

And yet these all express basically the same attitude towards the picture. A simple approach to meaning gives us a different meaning for each sentence, six in all, so the app would have to recognize and handle each one separately, which would be pretty bad. Can we do better? Yes!
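To make the problem concrete, here is a small Python sketch. It assumes a toy encoding (not Language Engine's own) in which each meaning is a set of atoms and the existentially bound event is written as the constant "e". Since a conjunction doesn't depend on the order of its conjuncts, a frozenset is a natural fit, and comparing the six naive meanings shows that none of them coincide:

```python
# Toy encoding (an assumption, not Language Engine's format): a meaning is a
# frozenset of atoms (predicate, arg, ...), and the existentially bound event
# is written as the constant "e". Conjunct order is irrelevant in a set.
NAIVE = {
    "it's awful": frozenset({("awful", "e"), ("experiencer", "e", "selfie")}),
    "it's horrible": frozenset({("horrible", "e"), ("experiencer", "e", "selfie")}),
    "it's not good": frozenset({("good", "e"), ("polarity", "e", "neg"),
                                ("experiencer", "e", "selfie")}),
    "I dislike it": frozenset({("dislike", "e"), ("disliker", "e", "me"),
                               ("dislikee", "e", "selfie")}),
    "I hate it": frozenset({("hate", "e"), ("hater", "e", "me"),
                            ("hatee", "e", "selfie")}),
    "I don't like it": frozenset({("like", "e"), ("polarity", "e", "neg"),
                                  ("liker", "e", "me"), ("likee", "e", "selfie")}),
}

# Identical meanings would collapse into one set element; none do.
distinct = set(NAIVE.values())
print(len(NAIVE), "sentences,", len(distinct), "distinct meanings")
# 6 sentences, 6 distinct meanings
```

Anything keyed on meanings, such as a table of app responses, would therefore need six entries for what is really one reaction.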

The simplest approach we can use to tackle this problem is called lexical decomposition. On the simple approach to meaning, we essentially assign one predicate to each word, so that words and predicates correspond one to one. The adjective "good" says of an event that it's an event of being good and that it has an experiencer; the verb "like" says of an event that it's an event of liking and that it has a liker and a likee; and so on. Different words get different event predicates and different argument relations. But there's no a priori reason why we need to do this.

To move to a more abstract level, consider first that these are all ways of expressing an attitude that the speaker has towards the selfie. To say you don't like something, or to proclaim that it's awful, is to express some judgment you've made. Moreover, the attitudes expressed this way stand in opposition to other attitudes: it's not awful, it's good, I like it, etc.

The approach we can take, then, is as follows:

- Express this general class of meanings with a single predicate, such as qualJudgment
- Use polarity to express what sort of judgment is being made (positive or negative)
- Uniformly use an argument relation such as judgee for the object being judged
- For transitive verbs, uniformly use an argument relation such as judger for the person judging
- Optionally, silently include the speaker as judger for the adjectival cases

The last, optional, part serves merely to further unify the different ways of expressing the attitudes.

If we take this approach, then all of these paraphrases would share a single meaning:

exists e. qualJudgment(e) & polarity(e,neg) & judger(e,me)
        & judgee(e,selfie)
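Here is one way the decomposed lexicon might look in code, as a sketch under assumptions: the names LEXICON, FLIP, and judge are hypothetical, and so is the choice to treat negation as flipping a word's base polarity, which is what makes "it's not good" land on polarity(e,neg) as in the unified formula above. Each evaluative word contributes only a polarity; the predicate and the argument relations are shared:

```python
# Hypothetical decompositional lexicon: each evaluative word maps only to a
# base polarity; all of them share the predicate qualJudgment.
LEXICON = {
    "awful": "neg", "horrible": "neg", "good": "pos",
    "dislike": "neg", "hate": "neg", "like": "pos",
}
FLIP = {"pos": "neg", "neg": "pos"}


def judge(word: str, judgee: str, judger: str = "me", negated: bool = False) -> frozenset:
    """Decomposed meaning of an evaluative sentence, as a set of atoms.

    judger defaults to the speaker ("me"): for adjectives this is the optional
    silent judger; for verbs, the subject is passed explicitly. Negation flips
    the base polarity (an assumption of this sketch).
    """
    polarity = FLIP[LEXICON[word]] if negated else LEXICON[word]
    return frozenset({
        ("qualJudgment", "e"),
        ("polarity", "e", polarity),
        ("judger", "e", judger),
        ("judgee", "e", judgee),
    })


paraphrases = {
    "it's awful": judge("awful", "selfie"),
    "it's horrible": judge("horrible", "selfie"),
    "it's not good": judge("good", "selfie", negated=True),
    "I dislike it": judge("dislike", "selfie", judger="me"),
    "I hate it": judge("hate", "selfie", judger="me"),
    "I don't like it": judge("like", "selfie", judger="me", negated=True),
}
print(len(set(paraphrases.values())))  # 1
```

Flipping polarity also makes "it's not awful" come out positive, which matches the grouping of opposing attitudes given earlier, though a fuller treatment might want "not awful" to stay weaker than "good".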

This decompositional approach can be taken quite far. For example, verbs that involve an actor, agent, or causer can be decomposed into predicates over distinct events. Rather than assigning "Vir killed the Emperor" the meaning

exists e. kill(e) & killer(e,vir) & killee(e,emperor)

we might instead take "kill" to really just mean "cause-to-die", and assign a different meaning:

exists e0 e1. cause(e0) & causer(e0,vir) & causee(e0,e1)
            & die(e1) & dier(e1,emperor)

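Under such a meaning assignment, the relationship between "Vir killed the Emperor" and "The Emperor died" would be transparent: the conjuncts die(e1) & dier(e1,emperor) that make up the meaning of the second sentence appear directly inside the meaning of the first.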
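One way to check this transparency mechanically, assuming the same set-of-atoms encoding with event variables now marked by a leading "?" (a convention of this sketch only): say that one existential conjunction entails another if some substitution for the second's variables turns all of its atoms into atoms of the first. This is the homomorphism test from conjunctive-query containment; it ignores any background rules, so it cannot relate kill to die on its own.

```python
from itertools import product

# Toy encoding (an assumption): a formula is a frozenset of atoms, each a
# tuple (predicate, arg, ...); terms starting with "?" are event variables.
Atom = tuple[str, ...]


def variables(formula: frozenset[Atom]) -> list[str]:
    """Event variables occurring in a formula, in a stable order."""
    return sorted({t for atom in formula for t in atom[1:] if t.startswith("?")})


def entails(a: frozenset[Atom], b: frozenset[Atom]) -> bool:
    """True if some substitution for b's variables maps every atom of b into a."""
    b_vars = variables(b)
    terms = sorted({t for atom in a for t in atom[1:]})
    # Brute force over all assignments of b's variables to terms of a.
    for values in product(terms, repeat=len(b_vars)):
        sub = dict(zip(b_vars, values))
        image = {(atom[0], *(sub.get(t, t) for t in atom[1:])) for atom in b}
        if image <= a:
            return True
    return False


killed_atomic = frozenset({
    ("kill", "?e"), ("killer", "?e", "vir"), ("killee", "?e", "emperor"),
})
killed_decomposed = frozenset({
    ("cause", "?e0"), ("causer", "?e0", "vir"), ("causee", "?e0", "?e1"),
    ("die", "?e1"), ("dier", "?e1", "emperor"),
})
died = frozenset({("die", "?e"), ("dier", "?e", "emperor")})

print(entails(killed_atomic, died))      # False
print(entails(killed_decomposed, died))  # True
```

With the atomic kill predicate the check fails, since nothing in the formula mentions dying, and an app would need a separate rule linking kill to die. With the decomposed meaning, the entailment falls out of the formula itself.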

A classic example of extreme lexical decomposition, going all the way back to the Generative Semantics literature of the mid-1960s, is the decomposition of the verb "break". Not content with decomposing just once, the Generative Semanticists decomposed as far as they could, pulling apart the action, the causation, the change of state, and the brokenness. In the logical formalism we're using, the extremely decomposed meaning for "Delenn broke the glass" would be

exists e0 e1 e2 e3. act(e0) & actor(e0,delenn) & actee(e0,e1)
                  & cause(e1) & causer(e1,delenn) & causee(e1,e2)
                  & become(e2) & becomer(e2,e3)
                  & broken(e3) & experiencer(e3,glass)

And on reflection, this really is roughly what the sentence means: Delenn acted in some way that constituted her causing it to come about that the glass is broken.

This kind of extreme decomposition may not be necessary, and it has been criticized over the years (especially in the syntactic, rather than semantic, form that the Generative Semanticists favored). But it may actually turn out to be very useful in an AI context, because it gives structure to the events that must occur. By designating some of the events as acts, for instance, we can clearly distinguish the events that the host app must perform as part of a command from those that simply must have occurred (and which the host app therefore may not have to perform).
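To illustrate how designating acts could help a host app, here's a sketch that takes the decomposed formula for a hypothetical command "Break the glass!" addressed to the app, and separates the events the app must carry out from those that merely have to come about as a result. The constant "app" as actor, ACT_PREDICATES, and split_events are all assumptions for illustration:

```python
# Toy encoding (an assumption): a formula is a frozenset of atoms, and event
# variables start with "?". This is the extreme decomposition of "break",
# with the addressee of a hypothetical command as the actor.
BREAK_CMD = frozenset({
    ("act", "?e0"), ("actor", "?e0", "app"), ("actee", "?e0", "?e1"),
    ("cause", "?e1"), ("causer", "?e1", "app"), ("causee", "?e1", "?e2"),
    ("become", "?e2"), ("becomer", "?e2", "?e3"),
    ("broken", "?e3"), ("experiencer", "?e3", "glass"),
})

# Hypothetical: event predicates that mark something an agent must do.
ACT_PREDICATES = {"act"}


def split_events(formula, agent):
    """Partition events into those `agent` must perform and those that must come about."""
    # One-place atoms name each event's type: act, cause, become, broken, ...
    types = {atom[1]: atom[0] for atom in formula if len(atom) == 2}
    actors = {atom[1]: atom[2] for atom in formula if atom[0] == "actor"}
    perform, outcomes = [], []
    for event in sorted(types):
        if types[event] in ACT_PREDICATES and actors.get(event) == agent:
            perform.append((event, types[event]))
        else:
            outcomes.append((event, types[event]))
    return perform, outcomes


perform, outcomes = split_events(BREAK_CMD, agent="app")
print("perform:", perform)
print("must come about:", outcomes)
# perform: [('?e0', 'act')]
# must come about: [('?e1', 'cause'), ('?e2', 'become'), ('?e3', 'broken')]
```

Sorting by variable name follows the causal chain here only because the variables happen to be numbered that way; a more careful version would follow the actee, causee, and becomer links instead.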

The amount of decomposition that's desirable isn't fixed ahead of time, and may not be the same in different problem domains, but it's an important tool for developers to know about.

If you have comments or questions, get in touch. I'm @psygnisfive on Twitter, augur on freenode (in #languagengine and #haskell). Here's the HN thread if you prefer that mode, and also the Reddit threads.