Modelling Claim Language

Playing around with natural language processing has given me the confidence to attempt some claim language modelling. This may be used as a claim drafting tool or to process patent publication data. Here is a short post describing the work in progress.


Background Reading:

Here, a caveat: this modelling will be imperfect. There will be claims that cannot be modelled. However, our aim is not a “perfect” model but a model whose utility outweighs its failings. For example, a model may be used to present suggestions to a human being. If useful output is provided 70% of the time, then this may prove beneficial to the user.

To start, we will keep it simple and look only at system or apparatus claims. As an example, we can take a claim for Square’s payment dongle:

1. A decoding system, comprising:

a decoding engine running on a mobile device, the decoding engine in operation decoding signals produced from a read of a buyer’s financial transaction card, the decoding engine in operation accepting and initializing incoming signals from the read of the buyer’s financial transaction card until the signals reach a steady state, detecting the read of the buyer’s financial transaction card once the incoming signals are in a steady state, identifying peaks in the incoming signals and digitizing the identified peaks in the incoming signals into bits;
and
a transaction engine running on the mobile device and coupled to the decoding engine, the transaction engine in operation receiving as its input decoded buyer’s financial transaction card information from the decoding engine and serving as an intermediary between the buyer and a merchant, so that the buyer does not have to share his/her financial transaction card information with the merchant.

Let’s say a claim consists of “entities”. These are roughly the subjects of claim clauses, i.e. the things in our claim. They may appear as noun phrases, where the head word of the phrase is modelled as the core “entity”. They may be thought of as “objects” from an object-oriented perspective, or “nodes” in a graph-based approach.

In the above claim, we have core entities of:
  • “a decoding system”
  • “a decoding engine”
  • “a transaction engine”

An entity may have “properties” (i.e. it “is” something) or may have other entities (i.e. it “has” something).

In our example, the “decoding system” has the “decoding engine” and the “transaction engine” as child entities. Or put another way, the “decoding engine” and the “transaction engine” have the “decoding system” as a parent entity.
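As a rough sketch of the “is”/“has” model, the entities could be held in a small tree structure. The class name `Entity`, its field names and the `add_child` helper are my own illustrative choices, not part of any existing library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """A claim entity: a noun phrase with "is" properties and "has" children."""
    name: str
    properties: List[str] = field(default_factory=list)      # "is" relations
    children: List["Entity"] = field(default_factory=list)   # "has" relations
    # The parent link is excluded from repr and comparison to avoid recursion.
    parent: Optional["Entity"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Entity") -> "Entity":
        """Attach a child entity and record this entity as its parent."""
        child.parent = self
        self.children.append(child)
        return child


# The example claim: one parent entity with two child entities.
system = Entity("decoding system")
decoder = system.add_child(
    Entity("decoding engine", ["running on a mobile device"])
)
transactor = system.add_child(
    Entity("transaction engine", ["running on the mobile device"])
)
```

Here `system.children` holds the two engines, and each engine points back to the “decoding system” through `parent`, which matches the parent/child wording above.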

In the example, the properties of the entities are more complex. The “decoding system” does not have any. It just has the child entities. The “decoding engine” “is”:
    • “running on a mobile device”
    • “in operation decoding signals produced from a read of a buyer’s financial transaction card”
    • “in operation accepting and initializing incoming signals from the read of the buyer’s financial transaction card until the signals reach a steady state”
    • “detecting the read of the buyer’s financial transaction card once the incoming signals are in a steady state”
    • “identifying peaks in the incoming signals and digitizing the identified peaks in the incoming signals into bits”
 
In these “is” properties, we have a number of implicit entities. These are not elements of the claimed system but are referred to by the claim. They are basically the other nouns in our claim. They include:
    • “mobile device”
    • “read”
    • “buyer’s financial transaction card”
    • “signals”
    • “peaks”
    • “bits”

[When modelling, the part of speech tagger gets most of the way there, but its output will probably require human tweaking and confirmation.]

Mapping to Natural Language Processing

To extract noun phrases, we need the following processing pipeline:

claim_text > [1. Word Tokenisation] > list_of_words > [2. Part of Speech Tagging] > labelled_words > [3. Chunking] > tree_of_noun_phrases

Now, the NLTK toolkit provides default functions for 1) and 2). For 3) we have the option of a RegexpParser, for which we need to supply noun phrase patterns, or classifier-based chunkers. Both need a little extra work but there are tutorials on the Net.
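To give a feel for step 3, here is a minimal sketch of pattern-based chunking in plain Python. It is not NLTK itself: it only imitates the idea of matching a noun phrase pattern over part-of-speech tags. In NLTK, steps 1 and 2 would come from `nltk.word_tokenize` and `nltk.pos_tag`; here the (word, tag) pairs are tagged by hand, which is an assumption, and a real tagger may well label “decoding” differently. The names `NP_PATTERN` and `chunk_noun_phrases` are made up for illustration.

```python
import re
from typing import List, Tuple

# Noun phrase pattern over Penn Treebank style tags:
# an optional determiner, any adjectives, then one or more nouns.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ[A-Z]*>)*(?:<NN[A-Z]*>)+")


def chunk_noun_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the noun phrases found in a list of (word, tag) pairs."""
    # Encode the tag sequence as one string, e.g. "<DT><NN><NN>".
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    phrases = []
    for match in NP_PATTERN.finditer(tag_string):
        # Each "<" marks one token, so counting them maps back to word indices.
        start = tag_string.count("<", 0, match.start())
        length = match.group().count("<")
        words = [word for word, _ in tagged[start:start + length]]
        phrases.append(" ".join(words))
    return phrases


# Hand-tagged clause from the example claim (tags are assumed, not tagger output).
tagged_clause = [
    ("a", "DT"), ("decoding", "NN"), ("engine", "NN"), ("running", "VBG"),
    ("on", "IN"), ("a", "DT"), ("mobile", "JJ"), ("device", "NN"),
]
noun_phrases = chunk_noun_phrases(tagged_clause)
```

With these hand-assigned tags the clause yields “a decoding engine” and “a mobile device”. A flat list is returned rather than a tree, which keeps the sketch short.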

Noun phrases should be used consistently throughout claim sentences – this can be used to resolve ambiguity.
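One possible reading of this consistency rule is that a phrase introduced with “a” or “an” should be referred back to with “the” or “said” followed by the same words. The sketch below flags referring phrases that have no earlier introduction. The function name, the determiner sets and the sample list are hypothetical, and the check is deliberately naive (exact string matching only).

```python
from typing import List

INTRODUCING = {"a", "an"}      # determiners that introduce an entity
REFERRING = {"the", "said"}    # determiners that refer back to one


def unresolved_references(noun_phrases: List[str]) -> List[str]:
    """Return referring phrases that were never introduced earlier in the list."""
    introduced = set()
    unresolved = []
    for phrase in noun_phrases:
        # Split off the first word and treat it as the determiner.
        determiner, _, rest = phrase.lower().partition(" ")
        if determiner in INTRODUCING:
            introduced.add(rest)
        elif determiner in REFERRING and rest not in introduced:
            unresolved.append(phrase)
    return unresolved


# Noun phrases in claim order; "the decoder" is a deliberate inconsistency.
phrases_in_order = [
    "a decoding engine",
    "a mobile device",
    "the decoding engine",
    "the decoder",
    "the mobile device",
]
problems = unresolved_references(phrases_in_order)
```

A flagged phrase such as “the decoder” could then be shown to the user as a candidate for rewording or for linking to an existing entity.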

Automated Law: Simple Claim Breakdown Function

Patent attorneys: we care about the independent claims. An independent claim is a paragraph of text that defines an invention. Each invention has a number of discrete features. Can I build a function to split a claim into its component features?

The answer is possibly. Here is one way I could go about doing it.

First I would start with a JavaScript file: claimAnalysis.js. I would link this to an HTML page: claimAnalysis.html. This HTML page would have a large text box to copy and paste the text of an independent claim.

On a keyup() or onchange() event I would then run the following algorithm:

  • Get the text from the text box as a string.
  • Set a character placemarker to 0.
  • From the placemarker, find the next character from the set of characters [“,”, “:”, “;”, “-” or new line].
  • Store the characters from the placemarker to the found character index as a string in an array, then move the placemarker to just after the found character.
  • Repeat the last two steps until “.” or the end of the text.
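The steps above can be sketched as follows. The post envisages JavaScript running in the browser; I use Python here purely to show the logic. The function name `split_claim` is made up, and keeping the delimiter at the end of each portion is an assumption on my part.

```python
from typing import List

DELIMITERS = {",", ":", ";", "-", "\n"}


def split_claim(text: str) -> List[str]:
    """Split claim text into rough feature strings at delimiter characters."""
    features = []
    placemarker = 0
    for index, char in enumerate(text):
        if char in DELIMITERS or char == ".":
            # Keep the delimiter with the portion it ends (an assumption).
            portion = text[placemarker:index + 1].strip()
            if portion:
                features.append(portion)
            placemarker = index + 1
            if char == ".":
                break  # a full stop ends the claim
    else:
        # No full stop was found: keep whatever text is left over.
        tail = text[placemarker:].strip()
        if tail:
            features.append(tail)
    return features


claim_text = (
    "A decoding system, comprising: a decoding engine running on a mobile "
    "device; and a transaction engine coupled to the decoding engine."
)
features = split_claim(claim_text)
```

Two limitations are worth noting: a leading claim number such as “1.” would stop the loop at its full stop, so it needs stripping first, and hyphenated words will be cut in two, which is where the manual join step comes in.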

From this we should have a rough breakdown of a claim into an array of feature strings. It will not be perfect but it would make a good start.

We can then show each located string portion in the array to a user. For example, with JavaScript we can add a table within a form containing input text boxes in rows. Each text box can contain a string portion. We can also add a checkbox to each portion or table row.

The user can then be offered “split” or “join” option buttons.

  • “Split” requires only one selection.
  • The user is told to place the cursor/select text in the box where they want the split to occur (using selectionStart property?).
  • Two features are then created based on the cursor position or selected text.
  • “Join” requires more than one feature to be selected via the checkboxes.
  • All selected features are combined into one string portion in one text box which replaces the previous text boxes (possibly by redrawing the table).
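Leaving the form handling aside, the two operations might look something like this on a plain list of feature strings. Again this is Python rather than the JavaScript the page would use, and the function names and the `cursor` and `selected` parameters are hypothetical stand-ins for the cursor position and the ticked checkboxes.

```python
from typing import List


def split_feature(features: List[str], index: int, cursor: int) -> List[str]:
    """Split features[index] in two at the character offset `cursor`."""
    text = features[index]
    left, right = text[:cursor].strip(), text[cursor:].strip()
    # Drop an empty half, e.g. when the cursor sits at the very start or end.
    parts = [part for part in (left, right) if part]
    return features[:index] + parts + features[index + 1:]


def join_features(features: List[str], selected: List[int]) -> List[str]:
    """Combine the selected features into one, placed at the first selected row."""
    chosen = sorted(set(selected))
    if len(chosen) < 2:
        raise ValueError("join needs more than one selected feature")
    joined = " ".join(features[i] for i in chosen)
    result = []
    for i, feature in enumerate(features):
        if i == chosen[0]:
            result.append(joined)
        elif i not in chosen:
            result.append(feature)
    return result


rows = ["A decoding system,", "comprising:", "a decoding engine"]
joined_rows = join_features(rows, [0, 1])
# Splitting the joined row just after the comma restores the original rows.
split_rows = split_feature(joined_rows, 0, len("A decoding system,"))
```

Both functions return a new list instead of editing in place, which would make redrawing the table from the result straightforward.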

Once any splitting or joining is complete the user can confirm the features. A confirm button could use the POST method to input the features to a PHP script that saves them as XML on the server.

<claim><number>1</number><feature id="1">A method for doing something comprising:</feature>...</claim>
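For illustration, the XML above could be built from a list of confirmed features like this. The post suggests a PHP script on the server; this sketch swaps in Python’s standard `xml.etree.ElementTree` module to show the shape of the output, and the function name `features_to_xml` is made up.

```python
import xml.etree.ElementTree as ET
from typing import List


def features_to_xml(claim_number: int, features: List[str]) -> str:
    """Serialise confirmed features as a <claim> element string."""
    claim = ET.Element("claim")
    ET.SubElement(claim, "number").text = str(claim_number)
    # Feature ids start at 1, following the example format.
    for i, text in enumerate(features, start=1):
        feature = ET.SubElement(claim, "feature", id=str(i))
        feature.text = text
    return ET.tostring(claim, encoding="unicode")


xml_text = features_to_xml(1, ["A method for doing something comprising:"])
```

Using a library rather than string concatenation means characters such as “&” or “<” in the claim text are escaped automatically.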