Playing around with natural language processing has given me the confidence to attempt some claim language modelling. This may be used as a claim drafting tool or to process patent publication data. Here is a short post describing the work in progress.

Background Reading:
- RDF (Resource Description Framework)
- Entity-attribute-value models
- Semantic Networks
- Part of Speech tagging using the Natural Language Toolkit (NLTK) – http://www.nltk.org/book/ch05.html and http://textminingonline.com/dive-into-nltk-part-iii-part-of-speech-tagging-and-pos-tagger
- Chunking
- WIPO Patent Drafting Manual
Here, a caveat: this modelling will be imperfect. There will be claims that cannot be modelled. However, our aim is not a “perfect” model but a model whose utility outweighs its failings. For example, a model may be used to present suggestions to a human being. If useful output is provided 70% of the time, then this may prove beneficial to the user.
To start, we will keep it simple and look only at system or apparatus claims. As an example, we can take a claim for Square’s payment dongle:
1. A decoding system, comprising:
Let’s say a claim consists of “entities”. These are roughly the subjects of claim clauses, i.e. the things in our claim. They may appear as noun phrases, where the head word of the phrase is modelled as the core “entity”. They may be thought of as “objects” from an object-oriented perspective, or “nodes” in a graph-based approach.
- “a decoding system”
- “a decoding engine”
- “a transaction engine”
An entity may have “properties” (i.e. “is” something) or may have other entities (i.e. “have” something).
In our example, the “decoding system” has the “decoding engine” and the “transaction engine” as child entities. Or put another way, the “decoding engine” and the “transaction engine” have the “decoding system” as a parent entity.
The “decoding engine” has the following properties:
- “running on a mobile device”
- “in operation decoding signals produced from a read of a buyer’s financial transaction card”
- “in operation accepting and initializing incoming signals from the read of the buyer’s financial transaction card until the signals reach a steady state”
- “detecting the read of the buyer’s financial transaction card once the incoming signals are in a steady state”
- “identifying peaks in the incoming signals and digitizing the identified peaks in the incoming signals into bits”
These properties in turn refer to further entities:
- “mobile device”
- “read”
- “buyer’s financial transaction card”
- “signals”
- “peaks”
- “bits”
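The entity model described above can be sketched as a small data structure. This is only an illustration of the idea, not a finished design: the `Entity` class and its `add_child` helper are hypothetical names made up for this sketch, and only two of the listed properties are attached.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """A 'thing' in the claim, named by the noun phrase that introduces it."""
    name: str
    # What the entity "is": free-text properties taken from the claim.
    properties: List[str] = field(default_factory=list)
    # What the entity "has": child entities.
    children: List["Entity"] = field(default_factory=list)
    # Back-reference to the parent; excluded from repr/eq to avoid cycles.
    parent: Optional["Entity"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Entity") -> "Entity":
        """Attach a child entity and record this entity as its parent."""
        child.parent = self
        self.children.append(child)
        return child


# Build the example from the claim.
system = Entity("decoding system")
decoder = system.add_child(Entity("decoding engine"))
system.add_child(Entity("transaction engine"))

decoder.properties.append("running on a mobile device")
decoder.properties.append(
    "identifying peaks in the incoming signals and digitizing "
    "the identified peaks in the incoming signals into bits"
)

print([child.name for child in system.children])
print(decoder.parent.name)
```

The same structure maps naturally onto a graph: each `Entity` is a node, and the parent/child links are edges.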
[When modelling, the part-of-speech tagger gets most of the way there, but its output will probably require human tweaking and confirmation.]
Mapping to Natural Language Processing
To extract noun phrases, we need the following processing pipeline:
claim_text > [1. Word Tokenisation] > list_of_words > [2. Part of Speech Tagging] > labelled_words > [3. Chunking] > tree_of_noun_phrases
NLTK provides default functions for 1) and 2). For 3) we have the option of a RegexpParser, for which we need to supply noun phrase patterns, or classifier-based chunkers. Both need a little extra work, but there are tutorials online.
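To show what step 3 does, here is a toy noun phrase chunker that runs a regular expression over part-of-speech tags. It is a dependency-free stand-in written for this post, not NLTK code: in practice `nltk.word_tokenize` and `nltk.pos_tag` would cover steps 1 and 2, and `nltk.RegexpParser` would take the noun phrase pattern. The tags below are written by hand in the Penn Treebank style (an assumption; a real tagger may well label "decoding" differently), and the pattern is just one plausible choice.

```python
import re
from typing import List, Tuple

TaggedWord = Tuple[str, str]

# Hand-tagged words (assumed tags, standing in for the output of steps 1 and 2).
tagged_words: List[TaggedWord] = [
    ("a", "DT"), ("decoding", "NN"), ("engine", "NN"),
    ("running", "VBG"), ("on", "IN"),
    ("a", "DT"), ("mobile", "JJ"), ("device", "NN"),
]

# Noun phrase pattern: optional determiner, any adjectives, one or more nouns.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN[A-Z]*>)+")


def chunk_noun_phrases(tagged: List[TaggedWord]) -> List[str]:
    """Return the word sequences whose tags match the noun phrase pattern."""
    # Encode the tag sequence as "<DT><NN>..." so a regex can run over it.
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    phrases = []
    for match in NP_PATTERN.finditer(tag_string):
        # Convert character offsets back to word indices by counting "<".
        start = tag_string.count("<", 0, match.start())
        end = start + match.group().count("<")
        phrases.append(" ".join(word for word, _ in tagged[start:end]))
    return phrases


print(chunk_noun_phrases(tagged_words))
# prints ['a decoding engine', 'a mobile device']
```

A flat list of phrases is enough for a first pass; a real chunker would return a tree so that nested phrases keep their position in the sentence.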
Noun phrases should be used consistently throughout claim sentences: the same entity should be referred to by the same noun phrase each time. This consistency can be used to resolve ambiguity.
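One way this consistency could be exploited is to group mentions of the same entity by stripping the leading determiner, so that “a buyer’s financial transaction card” and “the buyer’s financial transaction card” resolve to one entity. The sketch below assumes the noun phrases have already been extracted; the function names and the determiner list are illustrative choices, not a complete treatment of claim language.

```python
import re
from typing import Dict, List

# Leading determiners to ignore when comparing noun phrases (assumed list).
DETERMINERS = re.compile(r"^(?:a|an|the|said)\s+", re.IGNORECASE)


def normalise(noun_phrase: str) -> str:
    """Strip a leading determiner and lower-case, so 'a X' and 'the X' match."""
    return DETERMINERS.sub("", noun_phrase.strip()).lower()


def group_mentions(noun_phrases: List[str]) -> Dict[str, List[int]]:
    """Map each normalised noun phrase to the positions where it is mentioned."""
    mentions: Dict[str, List[int]] = {}
    for position, phrase in enumerate(noun_phrases):
        mentions.setdefault(normalise(phrase), []).append(position)
    return mentions


phrases = [
    "a buyer's financial transaction card",
    "incoming signals",
    "the buyer's financial transaction card",
    "the incoming signals",
]
print(group_mentions(phrases))
# prints {"buyer's financial transaction card": [0, 2], 'incoming signals': [1, 3]}
```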