Can Long-Context Large Language Models Do Your Job?

In this post we test the abilities of long-context large language models for performing patent analysis. How do they compare with a patent partner charging £400-600 an hour?

Or have I cannibalised my job yet?

Or do we still need Retrieval Augmented Generation?

  1. What is a Long-Context Large Language Model?
    1. Large Language Models (LLMs)
    2. Long Context
  2. What Patent Task Shall We Test?
  3. Top Models – March 2024 Edition
  4. Can I use a Local Model?
  5. How much?
    1. GPT4 Turbo
    2. Claude 3
  6. First Run
    1. Prompts
    2. Getting the Text
    3. Simple Client Wrappers
    4. Results
      1. D1 – GPT4-Turbo
      2. D1 – Claude 3 Opus
      3. D1 – First Round Winner?
      4. D2 – GPT4-Turbo
      5. D2 – Claude 3 Opus
      6. D2 – First Round Winner?
  7. Repeatability
  8. Working on the Prompt
  9. Does Temperature Make a Difference?
    1. GPT4-Turbo and D1
      1. Temperature = 0.7
      2. Temperature = 0.1
    2. Claude 3 and D1
      1. Temperature = 0.7
      2. Temperature = 0.1
    3. GPT4-Turbo and D2
      1. Temperature = 0.7
      2. Temperature = 0.1
    4. Claude 3 and D2
      1. Temperature = 0.7
      2. Temperature = 0.1
  10. Failure Cases
    1. Missing or Modified Claim Features
    2. Making Up Claim Features
    3. Confusing Claim Features
  11. Conclusions and Observations
    1. How do the models compare with a patent partner charging £400-600 an hour?
    2. Have I cannibalised my job yet?
    3. Do we still need Retrieval Augmented Generation?
    4. What might be behind the variability?
    5. Model Comparison
  12. Further Work
    1. Vision
    2. Agent Personalities
    3. Whole File Wrapper Analysis
    4. “Harder” Technology

What is a Long-Context Large Language Model?

Large Language Models (LLMs)

Large Language Models (LLMs) are neural network architectures. They are normally based on a Transformer architecture that applies self-attention over many layers (often dozens in the larger models). The more capable models have billions, if not trillions, of parameters (mostly weights in the neural networks). The most efficient way to access these models is through a web Application Programming Interface (API).

Long Context

LLMs have what is called a “context window”. This is the number of tokens that can be ingested by the LLM in order to produce an output. Tokens map roughly onto words (the Byte-Pair Encoding – BPE – tokeniser that is preferred by most models is described here – tokens are often beginnings of words, word bodies, and word endings).
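As a quick illustration, OpenAI’s tiktoken library lets you see how text maps onto BPE tokens – the example text here is just a snippet of the claim we use later:

import tiktoken

# Inspect BPE tokenisation; "cl100k_base" is the encoding used by the GPT-4 era models
enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("A hair care appliance comprising a body")
print(len(token_ids))                        # number of tokens
print([enc.decode([t]) for t in token_ids])  # the word pieces behind each token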

Early LLMs had a context of ~512 tokens. This quickly grew to between 2000 and 4000 tokens for commercially available models in 2023. Context is restricted because the Transformer architecture performs its matrix computations over the context; the size of the context thus fixes the size of certain matrix computations – the longer the context, the larger the matrices involved (the self-attention computation grows quadratically with context length).
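For reference, the core self-attention computation from the original Transformer paper is:

$$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

For a context of $n$ tokens, $Q$ and $K$ are $n \times d_k$ matrices, so the product $QK^{\top}$ is an $n \times n$ matrix – compute and memory grow quadratically with context length.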

In late 2023/early 2024, a number of models with long context emerged. The context window for GPT3.5 quickly extended to 8k and then 16k. This was followed in 2023 by a longer 32k context for the more capable GPT4 model, before a 128k context window was launched in November 2023 for the GPT4-Turbo model.

(Note: I’ve often found a lag between the “release” of models and their accessibility to Joe Public via the API – often a month or so.)

In January 2024, we saw research papers documenting input contexts of up to a million tokens. These appear to implement an approach called ring attention, which was described in a paper in October 2023. Anthropic released Claude 3 Opus in March 2024, a model that appeared comparable to GPT4 and had a stable long context of 200k tokens.

We thus seem to be entering a “long context” era, where whole documents (or sets of documents) can be ingested.

What Patent Task Shall We Test?

Let’s have a look at a staple of patent prosecution: novelty with respect to the prior art.

Let’s start reasonably easy with a mechanical-style invention. I’ve randomly picked WO2015/044644 A1 from the bucket of patent publications. It’s a Dyson application for a hair dryer (my tween/teenage girls are into hair these days). The prior art citations are pretty short.

  1. A hair care appliance comprising a body having an outer wall, a duct extending
    at least partially along the body within the outer wall, an interior passage
    extending about the duct for receiving a primary fluid flow, a primary fluid
    outlet for emitting the primary fluid flow from the body, wherein the primary
    fluid outlet is defined by the duct and an inner wall of the body, wherein at least
    one spacer is provided between the inner wall and the duct.
Claim 1

In the International phase we have three citations:

D1 and D2 are used to support a lack of novelty, so we’ll look at them.

Note: we will not be looking at whether the original claim is or is not novel from a legal perspective. I have purposely not looked into anything in detail, nor applied a legal analysis. Rather we are looking at how the language models compare with a European Examiner or Patent Attorney. The European Examiner may also be incorrect in their mapping. As we know, LLMs can also “hallucinate” (read: confabulate!).

Top Models – March 2024 Edition

There are two:

  • GPT4-turbo; and
  • Claude 3 Opus.

These are the “top” models from each of OpenAI and Anthropic. I have a fair bit of experience with GPT3.5-Turbo, and I’ve found anything less than the “top” model is not suitable for legal applications. It’s just too rubbish.

For the last year (since April 2023), GPT4 has been the king/queen, regularly coming 10-20% above other models in evaluations. Nothing has come close to beating it.

GPT4-Turbo performs slightly worse than GPT4, but it’s the only model with a 128k token context. It is cheaper and quicker than GPT4. I’ve found it good at producing structured outputs (e.g., nice markdown headings etc.) and at following orders.

Claude 3 Opus has a 200k token context and is the new kid on the block. The Opus model is allegedly (from the metrics) at the level of GPT4.

It’s worth noting we are looking at the relatively bleeding edge of progress here.

  • GPT4-turbo was only released on 6 November 2023. On release it had certain issues that were only resolved with the 25 January 2024 update. We will use the 25 January 2024 version of the model. I’ve noticed this January model is better than the initially released model.

Can I use a Local Model?

Short answer: no.

Longer answer: not yet.

There are a couple of 1 million token models available. See here if you are interested. I tried to run one locally.

It needed 8.8TB of RAM. (My beefy laptop has 64GB RAM and 8GB VRAM – only 8728GB short.)

Progress though is super quick in the amateur LLM hacking sphere (it’s only big matrix multiplication in an implementation). So we might have an optimised large context model by the end of the year.

Also, I’ve found the performance of the “best” open-source 7B parameter models (those that I can realistically run on my beefy computers) is still a long way from GPT4 – more GPT3.5-Turbo level, which I have found “not good enough” for any kind of legal analysis. I’ve also found open-source models trickier to control to get appropriate output (e.g., doing what you ask, keeping to task etc.).

How much?

You have to pay for API access to GPT4-Turbo and Claude 3. It’s not a lot though, being counted in pence for each query. I’ve found it’s worth paying £5-10 a month to do some experiments on the top models.

Here are some costings based on the patent example above, that has two short prior art documents.

The claim is around 100 tokens. The prior art documents (D1 and D2) are around 3000 and 6000 tokens. Throw in a bundle of tokens for the input prompts and you have around 9200 tokens input for two prior art documents.

On the output side, a useful table comparing a claim with the prior art is around 1500 tokens.

GPT4 Turbo

GPT4-Turbo has a current pricing of $10/1M tokens on the input and $30/1M tokens on the output. So we have about 10 cents ($0.092) on the input and about 5 cents on the output ($0.045). Around 15 cents in total (~12p). Or around 1s (!!!) of chargeable patent partner time.

Claude 3

The pricing for Claude is similar but a little more expensive – $15/1M on the input and $75/1M on the output (reflecting its allegedly GPT4-level, rather than GPT4-Turbo-level, capability).

So we have about 15 cents ($0.138) on the input and about 11 cents on the output ($0.1125). Around 25 cents in total (~20p). Or around 2s (!!!) of chargeable patent partner time.
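The sums above are easy to sanity-check in a few lines of Python (prices in USD per million tokens, as of March 2024):

# Rough per-query cost check for the token counts estimated above
PRICES = {
    "gpt-4-turbo": {"input": 10, "output": 30},
    "claude-3-opus": {"input": 15, "output": 75},
}

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one query at the prices above."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

print(query_cost("gpt-4-turbo", 9200, 1500))    # ~$0.14
print(query_cost("claude-3-opus", 9200, 1500))  # ~$0.25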

These costs are peanuts compared to the amounts charged by attorneys and law firms. It opens up the possibility of statistical analysis, e.g. multiple iterations or passes through the same material.

First Run

For our experiments we will try to keep things as simple as possible, to observe behaviour “out-of-the-box”.

Prompts

For a system prompt I will use:

You are a patent law assistant.

You will help a patent attorney with patent prosecution.

Take an European Patent Law perspective (EP).

As our analysis prompt scaffold I will use:

Here is an independent patent claim for a patent application we are prosecuting:    
---
{}
---

Here is the text from a prior art document:
---
{}
---

Is the claim anticipated by the prior art document?
* Return your result with a markdown table with a feature mapping
* Cite paragraph numbers, sentence location, and/or page/line number to support your position
* Cite snippets of the text to demonstrate any mapping

The patent claim gets inserted in the first set of curly brackets and the prior art text gets inserted in the second set of curly brackets.

We will use the same prompts for both models. We will let the model choose the columns and arrangement of the table.

Getting the Text

To obtain the prior art text, you can use a PDF Reader to OCR the text then save as text files. I did this for both prior art publication PDFs as downloaded from EspaceNet.

  • You can also set up Tesseract via a Python library, but it needs system packages so can be fiddly and needs Linux (so I sometimes create a Docker container wrapper).
  • Python PDF readers are a little patchy in my experience. There are about four competing libraries, with projects folding and being forked all over the place. They can struggle on more complex PDFs. I think I use pyPDF. I say “I think” because you used to have to use pyPDF2, a fork of pyPDF, but then the projects were remerged, so pyPDF (v4) is a developed version of pyPDF2. Simples, no? (A minimal extraction sketch follows this list.)
  • You can also use EPO OPS to get the text data. But this is also a bit tricky to set up and parse.
  • It’s worth noting that the OCRed text is often very “noisy” – it’s not nicely formatted in any way, often has missing or misread characters, and the whitespace is all over the place. I’ve traditionally struggled with this prior to the LLM era.
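As a rough sketch, pulling the text out of a PDF with pypdf looks something like the following (assuming the PDF already has a text layer – scanned images still need OCR; the file name is illustrative):

# Minimal text extraction with pypdf (v4)
from pypdf import PdfReader

def pdf_to_text(path: str) -> str:
    """Concatenate the extracted text of every page in the PDF."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

d1_text = pdf_to_text("d1.pdf")  # e.g. a prior art publication downloaded from EspaceNet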

The claim text I just copied and pasted from Google patents (correctness not guaranteed).

Simple Client Wrappers

Nothing fancy to get the results, just some short wrappers around the OpenAI and Anthropic Python clients:

# Assumes SYSTEM_PROMPT, PROMPT_SCAFFOLD, OPENAI_MODEL and ANTHROPIC_MODEL are defined as above
from openai import OpenAI
import anthropic

openai_client = OpenAI()                  # reads OPENAI_API_KEY from the environment
anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def compare_claim_with_prior_art_open_ai(claim: str, prior_art: str, system_msg: str = SYSTEM_PROMPT, model: str = OPENAI_MODEL) -> str:
    """Ask the OpenAI model to compare a claim with prior art text."""
    completion = openai_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": PROMPT_SCAFFOLD.format(claim, prior_art)},
        ],
        temperature=0.3,
    )
    return completion.choices[0].message.content


def compare_claim_with_prior_art_anthropic(claim: str, prior_art: str, system_msg: str = SYSTEM_PROMPT, model: str = ANTHROPIC_MODEL) -> str:
    """Ask the Anthropic model to compare a claim with prior art text."""
    message = anthropic_client.with_options(max_retries=5).messages.create(
        model=model,
        max_tokens=4000,
        temperature=0.3,
        system=system_msg,  # honour the system_msg parameter
        messages=[
            {"role": "user", "content": PROMPT_SCAFFOLD.format(claim, prior_art)},
        ],
    )
    return message.content[0].text
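Usage is then just (where claim_text and d1_text are the strings loaded above – the variable names are illustrative):

gpt4_mapping = compare_claim_with_prior_art_open_ai(claim_text, d1_text)
claude_mapping = compare_claim_with_prior_art_anthropic(claim_text, d1_text)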

Results

(In the analysis below, click on the images if you need to make the text bigger. Tables in WordPress HTML don’t work as well.)

D1 – GPT4-Turbo

Here’s GPT4-Turbo first off the blocks with D1:

Let’s compare again with the EP Examiner:

Successes:

  • “hair care appliance” – yes, gets this and cites the same objects as the EP Examiner (actually does a better job of referencing but hey ho).
  • “spacer” – while GPT4-Turbo says this is not “explicitly mentioned”, it does cite the “struts 24”, which are the same features cited by the EP Examiner.

Differences:

  • “outer wall” – deemed to be not explicitly present – doesn’t make the jump made by the EP Examiner to find this feature implicit in the structure of the “hair dryer 2”.
  • “duct…within the outer wall” – GPT4-Turbo decides to cite an inner hot air passageway formed by the fan 3 and heater 4 – on a brief look this seems possibly valid in isolation. However, there is an argument that it’s the outer passageway 12 that better extends within the outer wall.
  • “interior passage” – GPT4-Turbo can’t find this explicitly mentioned. Interestingly, the EP Examiner doesn’t cite anything directly to anticipate this feature, so we can maybe assume it is meant to be implicit?
  • “primary fluid flow outlet” – GPT4-Turbo cites the “blower opening 7”, which is a fluid outlet.
  • “primary fluid flow outlet defined by the duct and an inner wall of the body” – GPT4-Turbo says this is implicit, saying it is defined by “inner structures”. It’s not the most convincing, but looking at the picture in Figure 1, it could be argued. I do think the EP Examiner’s “cold air nozzle” is a bit of a better fit. But you could possibly argue both?

We will discuss this in more detail in the next section, but for now let’s also look at Claude 3…

D1 – Claude 3 Opus

Now let’s see how new kid, Opus, performs:

Successes:

  • “hair care appliance” and “outer wall” – yes, gets these and cites the same objects as the EP Examiner (actually does a better job of referencing but hey ho).
  • “primary fluid outlet” – hedges its bets by referring to both the hot and cold air streams but slightly better matches the EP Examiner’s citation.

Differences:

  • “duct…within the outer wall” – Claude 3’s a bit more bullish than GPT4-Turbo, announcing this is not disclosed. I’d warrant that there’s more evidence for it being disclosed than not disclosed, so would side more with the EP Examiner than Claude.
  • “interior passage” – Again, whereas GPT4-Turbo was a little more tentative, Claude 3 appears more confident in saying this is not disclosed. I don’t necessarily trust its confidence, but as before the EP Examiner is silent on what explicitly anticipates this feature.
  • “primary fluid flow outlet defined by the duct and an inner wall of the body” – Claude 3 here says it is not disclosed, but I don’t think this is entirely right.
  • “spacer” – Claude 3 says this isn’t disclosed and doesn’t mention the “struts 24”.

D1 – First Round Winner?

I’d say GPT4-Turbo won that round for D1.

It didn’t entirely match the EP Examiner’s mapping, but was pretty close.

Both models were roughly aligned and there was overlap in cited features.

I’d still say the EP Examiner did a better job.

Let’s move onto D2.

D2 – GPT4-Turbo

Here’s what the EP Examiner said about D2:

Helpful. Here’s the search report:

Also helpful.

Here’s Figure 1:

And here’s the results:

Successes:

  • “body having an outer wall” – yes, in isolation this can be argued.
  • “duct” – does say this appears to be present but does indicate the word “duct” is not explicitly used (CTRL-F says: “correct”).
  • “interior passage” – GPT4-Turbo cites the flow “through the casing to the grille”, where the casing is 12 in Figure 1 and the grille is 24 (using those reference numerals would help GPT4-Turbo!). This I think can be argued in isolation.
  • “primary fluid outlet” – lines 50 to 60 of column 2 do refer to a “blow opening” as quoted, and the “primary fluid flow” does go from the grille to the “blow opening”. Good work here.

Differences / Failures:

  • “A hair care appliance” has gone walkabout from the claim features.
  • “…defined by the duct and an inner wall” – GPT4-Turbo says this is not explicitly disclosed but does take a guess that it is implicitly disclosed. I would like some more detailed reasoning about what features could stand in for the duct and inner wall. But I’d also say GPT4-Turbo is not necessarily wrong. In Figure 1, there is an “air flow passage 33” between the “back shell 20” and the “reflector-shield 28”, which could be mapped to a “duct” and an “inner wall”?
  • “spacer” – GPT4-Turbo can’t find this. If you mapped the “air flow passage 33” to the “duct”, “spacers” may be implicit? A discussion on this and its merits would be useful. Checking D2, I see there is explicit disclosure of “spacer means” in line 55 of column 3. I’m surprised this is absent.

D2 – Claude 3 Opus

Successes:

  • “hair care appliance” and “outer wall” – yes, although I think GPT4-Turbo’s “back shell 20” is better.
  • “primary fluid outlet” – yes, I think element 36 and the front “grille” can be argued in isolation to be a “primary fluid outlet”.

Differences / Failures:

  • “duct” – Claude 3 does say this is present but the cited text isn’t amazingly useful, despite being from the document. It’s not clear what is meant to be the “duct”. However, it is true you could argue something within the back and front shell is a duct.
  • “interior passage” – similar to “duct” above. Claude 3 says it is present but the text passage provided, while from the document, doesn’t seem entirely relevant to the claim feature.
  • definition of “primary fluid outlet” – Claude 3’s reasoning here seems appropriate if you have the molded “multiple purpose element 36” as the “primary fluid outlet”, but there is maybe room to argue the “periphery openings 42” help define the “element 36”? Definitely room for a discussion about whether this feature is present.
  • “spacer” – as per GPT4-Turbo, Claude 3 says this is not present despite there being “spacer means” in line 55 of column 3.

D2 – First Round Winner?

GPT4-Turbo and Claude 3 both do a little less well on the twice-as-long D2.

They do have the disadvantage of not being able to use the figures (*yet*).

Their lack of discussion of the “air flow passage 33” formed from “openings 42” is a little worrying. As is their ignorance of the “spacer means” in line 55 of column 3.

Patent attorney and EP Examiner win here.

Repeatability

As I was running some tests (coding is iterative: you fail, then correct, then fail, then correct until it works), I noticed that there was a fair bit of variation in the mapping tables I was getting back from both models. This is interesting, as a human being would expect a mapping to be relatively stable – the claim features are either anticipated, or they are not.

Here’s GPT4-Turbo again on D1:

Here’s the previous run:

We can see the following issues:

  • In the first analysis GPT4-Turbo thought the “outer wall” was disclosed. In the second run, it said it was not explicitly mentioned.
  • Also note how we have slightly different “features” for each run, and differing columns and formats.
  • The mapping for the “duct” is also different, with differing levels of “confidence” on the presence and the possible implicit features.
  • On the first run, GPT4-Turbo thought the “interior passage” was “not explicitly mentioned” but on the second run thought it was implied by structures and provided paragraph references.
  • Different features are mapped to the “primary fluid outlet”.
  • It locates the “struts 24” on both runs but on the first run thinks they are “functionally similar”, while on the second run finds them to “serve a different purpose”.

Uh oh. We have quite a different mapping each time we perform the run.

Let’s look at running Claude 3 again:

As compared to the previous run:

Claude 3 seems slightly more consistent between runs. We can see that the columns have shifted around, and I don’t necessarily agree with the mapping content, but the mapping detail seems mostly conserved.

Let’s look at another run for Claude 3 on D2:

Here Claude 3 does much better than the first run. The citation column appears more relevant. And party-time, it’s found and mentioned the “spacer means”. The “interior passage” mapping is better in my opinion, and is more reflective of what I would cite in isolation on a brief run through.

Working on the Prompt

Maybe we can overcome some of these variability problems by working on the prompt.

It may be that the term “anticipated” is nudging the analysis in a certain, more US-centric, direction. Let’s try explicitly referencing Article 54 EPC, which is more consistent with us setting a “European Patent Law perspective” in the system prompt.

Also, let’s try shaping the mapping table: we can specify the columns we want filled in.

Here’s then a revised prompt:

Here is an independent patent claim for a patent application we are prosecuting:
---
{}
---

Here is the text from a prior art document:
---
{}
---

Is the claim novel under Art.54 EPC when compared with the prior art document?
* Return your result with a markdown table with a feature mapping
* Cite paragraph numbers, sentence location, and/or page/line number to support your position
* Cite snippets of the text to demonstrate any mapping

Here is the start of the table:
| # | Feature Text | In prior art? Y/N | Where in prior art? | Any implicit disclosure? | Comments |
|---| --- | --- | --- | --- | --- |

Does that help?

In short – not really!

GPT4-Turbo seems to do a little worse with this new prompt. It appears more certain about the mapping – e.g. the “duct” is deemed not in the prior art (“N”), with no implicit disclosure and simply a statement that “The prior art does not explicitly describe a duct within the outer wall of the body”. This can be compared to the first run where this was deemed present “indirectly”.

GPT4-Turbo also introduces an error into the claim mapping, which we discuss later below.

Even though we specify more columns, the amount of text generated appears roughly the same. This means that for both models the reasoning is a bit shorter, and the models tend towards more fixed statements of presence or, more often, non-presence.

Also, although our “In prior art? Y/N” column provides a nice single letter output we can parse into a structured “True” or “False”, it does seem to nudge the models into a more binary conclusion. For example, the comments tend to confirm the presence conclusion without additional detail, whereas when the model was able to pick the columns, there was a longer, more useful discussion of potentially relevant features.

I had hoped that the “Any implicit disclosure” column would be a (sub) prompt for considering implicit disclosures a bit more creatively. This doesn’t seem to be the case for either model. Only Claude 3 uses it once, in the D2 mapping (although it does use it there in the way I was hoping). I think we will ditch that column for now.

This little experiment suggests that keeping any mapping table as simple as possible helps improve performance. It also shows that LLM-wrangling is often as much of an art as a science.

Does Temperature Make a Difference?

Temperature is a hyperparameter that scales the logits output by the model prior to sampling the probabilities. (This is a nice explanation.) Or in English: it controls how “deterministic” or “random” the model output is. Values of around 0.1/0.2 should give pretty consistent output without much variation; values around and above 1 will be a lot more “creative”.
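As a minimal sketch (in numpy, ignoring top-p and other sampling tweaks), temperature scaling looks like this:

import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Scale logits by 1/temperature, softmax, then sample a token index."""
    scaled = logits / max(temperature, 1e-6)  # low temperature sharpens the distribution
    probs = np.exp(scaled - scaled.max())     # subtract max for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))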

I generally use a temperature of somewhere between 0.3 and 0.7. I have found that higher temperatures (around 0.7) are sometimes better for logical analysis where a bit of “thinking outside the obvious” is required.

Let’s go back to a three column table with “Claim Feature”, “Prior Art Mapping”, and “Cited Portions”. Let’s then run a round with a temperature of 0.7 and a temperature of 0.1. We will at least keep the prompt the same in both cases.

From the experiments above, it may be difficult to determine the effect of temperature over and above the variability inherent in the generation of responses, but let’s have a look anyway.

(Those with a proper science degree look away.)

GPT4-Turbo and D1

Temperature = 0.7

Temperature = 0.1

There doesn’t actually seem to be that much difference between the mappings here, apart from that underlying variability discussed before.

It may be that with the temperature = 0.7 run, the model is freer to diverge from a binary “yes/no” mapping.

In the temperature = 0.1 run, GPT4-Turbo has actually done pretty well, matching the EP Examiner’s conclusions on all features apart from the last feature (but at least indicating what could be mapped).

Claude 3 and D1

Temperature = 0.7

Temperature = 0.1

Here we can again see that Claude 3 seems more consistent between runs. While there are some small differences, the two runs are very similar, with often word-for-word matches.

Claude 3 does well here, pretty much matching the EP Examiner’s objection in both cases.

GPT4-Turbo and D2

Temperature = 0.7

Temperature = 0.1

Here we can see the variation of GPT4-Turbo. With one mapping, all the features are found in the prior art; with the other mapping, nearly all the features are not found in the prior art. Which to believe?!

Claude 3 and D2

Temperature = 0.7

Temperature = 0.1

Again Claude 3 seems much more consistent across runs. But I’m not that impressed with the reasoning – e.g. compare these to the “good” GPT4-Turbo run above.

So, in conclusion, temperature doesn’t seem to make a load of difference here. It is not a silver bullet, transforming “bad” mappings into “good”. The issues with performance and consistency appear to be model-based, rather than hyperparameter-based.

Failure Cases

Missing or Modified Claim Features

With GPT4-Turbo there was a case where the claim features did not entirely sync up with the supplied claim text:

Here “[a] hair care appliance comprising” has gone walk-about.

Also GPT4-Turbo seems to paraphrase some of the claim features in the table.

The feature “an interior passage extending about the duct for receiving a primary fluid flow” becomes “interior passage for receiving a primary fluid flow”. Such paraphrasing by a trainee would give senior patent attorneys the heebie-jeebies.

Making Up Claim Features

This is an interesting failure case from GPT4-Turbo. It appears to get carried away and adds three extra claim features to the table. The claim doesn’t mention infra-red radiation anywhere…

As with the case below, it seems to be getting confused between the “claim 1” we are comparing and the “claim 1” of the prior art. It is interesting to note this occurs for the longer prior art document. It is a nice example of long document “drift”. I note how RAG offers a solution to this below.

I found similar behaviour from GPT3.5-turbo, which was a lot worse: it often just made up the claim features, or took them from the comparison text instead of the claim – similar to giving the exercise to a six-year-old.

Confusing Claim Features

Here Claude 3 does what at first sight looks like a good job – until you realise the LLM is mapping what appears to be a claim from the prior art document onto the prior art document itself.

This is an issue I thought we might see in long-context models. In the prompt, we put the claim we are comparing first. But then we have 6000 tokens from D2. It looks like this might cause the model to “forget” the specific text of the claim but “remember” that we are mapping some kind of “claim”, and so pick the nearest “claim” – claim 1 of D2.

Looking at the claim 1 of D2 this does appear to be the case:

In a hand held hair dryer having means for directing air flow toward hair to be dried, the improvement comprising, in combination:
a casing having a forward grille-like support member adapted to be faced toward the hair to be dried;
an infra-red, ring-shaped, radiator in said casing spaced rearwardly of the grille-like member;
a motor carried on the grille-like member and extending rearwardly thereof centrally of said ring shaped radiator;
shield means between the ring-shaped radiator and the motor for protecting the motor from the infrared radiation; radiation reflector means, including a portion spaced rearwardly of the ring-shaped radiator for directing reflected radiation toward and through the grille-like member;
a flat air propeller operatively associated with and driven by the motor and located spaced axially formed of the rearward portion of the reflector and rearward of the ring-shaped radiator, the propeller being operative to direct only a gentle flow of air through said grille toward the hair to be dried, to avoid destruction and disarray of the hairdo, but to move the top layers of hair sufficiently to permit radiation drying of the hair mass; and
means for introducing cooling air into the casing to cool portions of the casing and the motor.

Claim 1 of D2

It’s interesting to note that this was also a problem we found with GPT3.5-Turbo.

Conclusions and Observations

What have we found out?

  • Results are at a possibly-wrong, average-ability, science-graduate level.
  • Prompt crafting is an art – you can only learn by doing.
  • Temperature doesn’t matter that much.
  • Variability is a problem with GPT4-Turbo.
  • LLMs can get confused on longer material.

At the start we had three questions:

  1. How do the models compare with a patent partner charging £400-600 an hour?
  2. Have I cannibalised my job yet?
  3. Do we still need Retrieval Augmented Generation?

Let’s see if we can partially answer them.

How do the models compare with a patent partner charging £400-600 an hour?

Ignoring cost, we are not there yet. Patent attorneys can sigh in relief for maybe another year.

But considering the models cost the same as 1-2s of patent partner time, they didn’t do too badly at all.

One big problem is consistency.

GPT4-Turbo has some runs and some feature mappings that I am fairly happy with. The problem is I perform a further run with the same prompt and the same parameters and I get a quite different mapping. It is thus difficult to “trust” the results.

Another big problem is apparent confidence.

Both models frequently made quite confident statements on feature disclosure: “This feature is not disclosed”. However, on the next mapping run, or after tweaking the prompt, the feature was found to be disclosed. So the confident statements are more an artefact of the output than a measure of certainty. The models don’t seem to do accurate shades of confidence out-of-the-box.

If you are a skeptical person like myself, you might not believe what you are told by human or machine (watch the slide into cynicism though). In which case, you’d want to see and review the evidence for any statement yourself before agreeing. If you treat LLMs in this manner, like a brand new graduate trainee, sometimes helpful, sometimes off, then you are well placed.

If you are a nice trusting human being that thinks that both human beings and machines are right in what they say, you will come to harm using LLMs. LLMs are particularly slippery because they provide the most likely, not the most factually correct, output. While the two are correlated, correlation does not necessarily equal truth (see: science).

Often, a clear binary mapping (“Yes – the feature is disclosed”) leads the model to later justify that sampling (“The feature is disclosed because the feature is disclosed”) rather than provide useful analysis. We had better performance when we were less explicit in requiring a binary mapping. However, this then leads to problems in parsing the results – is the feature disclosed or not?

Have I cannibalised my job yet?

Not quite.

But if I needed to quickly brainstorm mappings for knocking out a claim (e.g., in an opposition), I might run several iterations of this method and look at the results.

Or if I was drafting a claim, I could “stress test” novelty against known prior art by iterating (e.g. 10-20 times?) and looking at the probabilities of feature mappings.

If neither model can map a feature, then I would be more confident in that feature’s robustness in examination. These would be the features worth providing inventive step arguments for in the specification. But I would want to do a human review of everything as well.
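A sketch of that stress-test loop, using the wrapper from earlier and a hypothetical parse_table helper that turns the markdown table into feature → disclosed flags:

from collections import Counter

def feature_mapping_rates(claim: str, prior_art: str, runs: int = 20) -> Counter:
    """Tally how often each claim feature is mapped as disclosed across repeated runs."""
    tally = Counter()
    for _ in range(runs):
        result = compare_claim_with_prior_art_open_ai(claim, prior_art)
        for feature, disclosed in parse_table(result).items():  # parse_table is hypothetical
            tally[feature] += int(disclosed)
    return tally  # features with a count near zero are candidates for inventive step arguments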

While I do often disagree with many of the mappings, they tend not to be completely “wrong”. Rather, they are often just poorly argued or evidenced, miss something I would pick up on, or are inconsistent across the whole claim. So they are at the level of a “quick and dirty opposition”, or a “frustrated examiner getting the case off their desk”.

If models do map a feature, even if I don’t agree with the mappings, they give me insight into possible arguments against my claim features. This might enable me to tweak the claim language to break these mappings.

Do we still need Retrieval Augmented Generation?

Surprisingly, I would say “yes”.

The issues with the claim feature extraction, and the poorer performance on the longer document, indicate that prompt length does make a difference even for long-context models. Sometimes the model just gets distracted or goes off on one. Quite human-like.

Also I wasn’t amazingly impressed with the prior art citations. The variability in passages cited, the irrelevance of some passages, and the lack of citation of some obvious features reduced my confidence that the models were actually finding the “best”, most representative disclosure. The “black box” nature of a large single prompt makes it difficult to work out why a model has indicated a particular mapping.

RAG, in the most basic form as some kind of vector comparison, provides improved control and explainability. You can see that the embeddings indicate “similarity” (whether this is true “semantic similarity” is an open question – but all the examples I have run show there is some form of “common sense” relevance in the rankings). So you can understand that one passage is cited because it has a high similarity. I find this helps reduce the variability and gives me more confidence in the results.
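A minimal sketch of that basic vector comparison, assuming the claim feature and the prior art passages have already been embedded (e.g. via an embeddings API):

import numpy as np

def rank_passages(query_vec: np.ndarray, passage_vecs: np.ndarray) -> np.ndarray:
    """Return passage indices sorted by cosine similarity to the query, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    return np.argsort(p @ q)[::-1]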

You can also get better focus from RAG approaches. If you can identify a subset of relevant passages first, it then becomes easier to ask the models to map the contents of those passages. The models are less likely to get distracted. This though comes at the cost of holistic consistency.

RAG would also allow you to use GPT4 rather than GPT4-Turbo, by reducing the context length. GPT4 is still a little better in my experience.

What might be behind the variability?

The variability in the mappings, and in the features that are mapped, even in this relatively simple mechanical case, might hint at a deeper truth about patent work: maybe there is no “right” answer.

Don’t tell the engineers and scientists, but maybe law is a social technology, where what matters is: does someone else (e.g., a figure in authority) believe your arguments?

Of course, you need something that cannot be easily argued to be “incorrect”. But LLMs seem to be good enough that they don’t suggest wildly wrong or incorrect mappings. At worst, they believe something is not there and assert that confidently, whereas a human might say, “I’m not sure”.

But.

There may just be an inherent ambiguity in mapping a description of one thing to another thing. Especially if the words are different, the product is different, the person writing it is different, the time is different, the breakfast is different. There might be several different ways of mapping something, with different correspondences having differing strengths and weaknesses in differing areas. Why else would you need to pay clever people to argue for you?

I have seen this sometimes in trainees. If you come from a position of having completed a lot of past papers for the patent exams, but worked on few real-world cases, you are more likely to think there is a clearly “right” answer. The feature *is* disclosed, or the feature is *not* disclosed. Binary fact. Bosh.

However, do lots of real-world cases and you often think the exams are trying to trick you. “What, there is a clearly defined feature that is clearly different?” 80-90% of cases often have at least one feature that is borderline disclosed – it is there if you interpret all these things this way, but it isn’t there if you take this interpretation. Real-life is more like the UK P6 exam. You need to pick a side and commit to it, but have emergency plans B-H if plan A fails. Most of the time for a Rule 161 EPC communication, you recommend just arguing your side on the interpretation. The Examiner 90% of the time won’t budge, but that doesn’t say that what you say is wrong, or that a court or another jurisdiction will always agree with the Examiner.

This offers up the interesting possibility that LLMs might be better at patent exams than the exercise above…

Model Comparison

I was impressed by Claude 3 Opus. While I think GPT4-Turbo still has the edge, especially at half the price, Claude 3 Opus gave it a run for its money. There wasn’t a big difference in quality.

Claude 3 Opus also had some properties that stood out over GPT4-Turbo:

  • It seemed more reliable on repeated runs. There was less variability between runs.
  • It has a longer token context (200k vs 128k). You could stick in all the prior art documents cited on a case.

Interestingly both Claude 3 and GPT4-Turbo tended to fall down in similar ways. They would both miss pertinent features, or sometimes get distracted in long prompts.

Based on these experiments, I’d definitely look at setting up my systems to modularly use LLMs, so I could evaluate both GPT4-Turbo and Claude 3.

Setting up billing and API access for Anthropic was also super easy, OpenAI-level. I have also tried to access Azure and Google models. They are horrendously and needlessly complicated. Life is too short.

Further Work

Vision

I didn’t look at the vision capabilities in this test. But both GPT4-Turbo and Claude 3 Opus offer vision capabilities (using a Vision Transformer to tokenise the image). One issue is that GPT4-Turbo doesn’t offer vision with long context – it’s still limited to a small context prompt (or it was last time I looked at the vision API). The vision API also has strong “alpha” vibes that I’d like to settle down.

But because you are all cool, here’s a sneak peek of GPT4-Turbo working just with the claim 1 text and Figure 1:

| Claim Feature | Figure 1 (D1) | Reference Numeral in D1 | Match (Yes/No) | Notes |
| --- | --- | --- | --- | --- |
| Hair care appliance | Hair care appliance (likely a hair dryer) | – | Yes | The figure depicts a hair care appliance. |
| Body having an outer wall | Visible outer wall | 1 | Yes | The body of the appliance with an outer wall is clearly shown. |
| Duct extending within the outer wall | Duct present | 4 | Yes | There is a duct extending along the body within the outer wall. |
| Interior passage for receiving fluid flow | Space around the duct | – | Yes | There appears to be an interior passage for airflow around the duct. |
| Primary fluid outlet | Outlet for emitting fluid flow | 7, 9, 13 | Yes | The end of the appliance acts as the fluid outlet. |
| Outlet defined by the duct and an inner wall | Defined by duct and inner wall | 13, 14 | Yes | The primary fluid outlet seems to be defined by the duct and the inner wall. |
| At least one spacer between the inner wall and the duct | Presence of spacer(s) | ? | No | It is unclear if spacers are present as they are not clearly depicted or labeled. |

Pretty good!

A very similar analysis to the text, just from the image.

It’s definitely worth looking at integrating vision and text models. But how to do so is not obvious, especially how to efficiently combine vision and long-context input (there are some engineering challenges to getting the figures from a PDF, involving finding the TIFFs or chopping pages into JPEGs, that are boring and fiddly but essential).

Agent Personalities

We used fairly simple prompts in our example.

But we also commented on how often the law was a social language game.

Does your analysis of a claim differ if you are an examiner versus an attorney? Or if you are a judge versus an inventor? Or a patent manager versus a CEO?

It’s an open question. My first thought is: “yes, of course it does”. Which suggests that there may be mileage in performing our analysis from different perspectives and then integrating the results. With LLMs this is often as easy as stating in the user or system prompt – “YOU ARE A PATENT EXAMINER” – this nudges the context in a particular direction. It would be interesting to see whether that makes a material difference to the mapping output.
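A sketch of how that might look with the wrapper from earlier – the persona prompts and variable names are illustrative:

# Hypothetical persona prompts - run the same mapping from different perspectives
PERSONAS = {
    "examiner": "You are a European Patent Examiner assessing novelty under Art. 54 EPC.",
    "attorney": "You are a European Patent Attorney arguing that the claim is novel.",
}

mappings = {
    name: compare_claim_with_prior_art_open_ai(claim_text, d1_text, system_msg=persona)
    for name, persona in PERSONAS.items()
}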

Whole File Wrapper Analysis

In our analysis with two prior art documents, we had around 10,000 tokens. These were short prior art documents, and we saw there was some degradation with the longer document. But we are still at only 5-10% of the available prompt context.

It is technically possible to stick in all the citations from the search report (Xs, Ys, As) and go “ANALYSE!”. Whether you’d get anything useful or trustworthy is still an open question based on the present experiments. You could also get the text from the EPO prosecution ZIP or from the US File Wrapper.

I’d imagine this is where the commercial providers will go first as it’s the easiest to implement. The work is mainly in the infrastructure of getting the PDFs, extracting the text from the PDFs, then feeding it into a prompt. A team of developers at a Document Management company could build this in a week or so (I can do it in that timespan and I’m a self-taught coder). It would cost though – on my calculations around £10-15 on the API per query, so 10x+ that in charges to customers. If your query is rubbish (which it often is for the first 10 or so attempts), you’ve spent £10-15 on nothing. This is less of a no-brainer than 15p.

Looking at the results here, and from reading accounts on the web, I’d say there is a large risk of confusion in a whole file wrapper prompt, or “all the prior arts”. What happens when you have a claim 1 at the start, then 10 other claim 1s?

Most long-context models are tested using a rather hacky “needle in a haystack” metric. This involves inserting some text (often incongruous, inserted at random; machine learning engineers, proper scientists, and linguists, weep now) and seeing whether the query spots it and reports accordingly. GPT4-Turbo and Claude 3 Opus seem to pass this test. But finding something is an easier task than reasoning over large text portions (finding just involves configuring the attention to locate it over the whole input space, which is easy-ish; “reasoning” requires attention computations over multiple separated portions).

So I predict you’ll see a lot of expensive “solutions” from those that already manage data, but these may be ineffective unless you are clever. They would maybe work for simple questions, like “where is a spacer between a duct and an inner wall possibly described?”, but it would be difficult to trust the output without checking, or to know what exactly the black box was doing. I still feel RAG offers the better solution from an explainability perspective. Maybe there is a way to leverage the strengths of both?

“Harder” Technology

Actually, my experience is that there is not a big drop-off with the perceived human difficulty of the subject matter.

My experiments with hardcore A/V coding, cryptography, and gene editing all show a similar performance to the mechanical example above – not perfect, but also not completely wrong. This is surprising to us because we are used to seeing human performance degrade as the subject matter gets harder. But it turns out words are words: train yourself to spin magic in them, and one area of words is just as easy as another area of words.

What is a patent? Asking Again in the Age of Machine Learning

Large Language Models (LLMs) and other Machine Learning (ML) approaches have made huge progress over the last 5-6 years. They are leading to existential questioning in professions that pride themselves on a mastery of language. This includes the field of patent law.

When new technologies arrive, they also allow us a different perspective. Let’s look.

  1. What is a patent?
    1. Claims
      1. Why claims?
    2. Description and Figures
  2. How do computers “see” claims?
    1. A Very Brief History of Patent Information
    2. From Physical to Digital
    3. What does this all mean for my claims?
  3. Comparing claims using computers
    1. Traditional Patent Searching
    2. How Patent Attorneys Compare Claims
      1. Construe the claim
      2. Split the Claim into Features
        1. A Short Aside on Segmentation
        2. Things or Events as Features
      3. Match Features
        1. What does it mean for a feature to match?
      4. Look at the number of matched features
    3. Can we automate?
      1. Fuzzy matching
      2. word2vec
      3. Transformers
  4. Settlers in a Brave New World

What is a patent?

At its heart, a patent is a description of a thing or a process.

It is made up primarily of two portions:

  • claims – these define the scope of legal protection.
  • description and figures – these provide the detailed background that supports and explains the features of the claims.

Claims

These are a set of numbered paragraphs. Each claim is a single sentence. A claimset is typically arranged in a hierarchy:

  • independent claims
    • These claims stand alone and do not refer to other claims.
    • They represent the broadest scope of protection.
    • They come in different types representing different types of protection. These relate to different infringing acts.
  • dependent claims
    • These claims refer to one or more other claims.
    • They ultimately depend on one of the other independent claims.
    • They offer additional limitations that act as fallback positions – if an independent claim is found to lack novelty or an inventive step, a combination of that same independent claim and one or more dependent claims may be found to provide novelty and an inventive step.

Why claims?

An independent claim seeks to provide a specification of a thing or a process so that a legal authority can decide whether an act infringes upon the claim. This typically means that another thing or process is deemed to fall within the specification of the thing or process in the claim.

Patents arose from legal decrees on monopolies. They started to become a legal concept in the 15th and 16th centuries. At first, the legal authority was a monarch or guild. So you can think of them as an attempt 500-odd years ago to describe a thing or process for some form of human negotiation.

A key point is that claims are inherently linguistic. The specification of a thing or a process is provided in a written form, in whatever language is used by the patent jurisdiction in question. So we are using words to specify a thing or a process in a way that allows for comparison with other things or processes.

Normally we want the specification to be as broad as possible – to cover as many different things or processes as possible so as to maximise a monopoly. But there is a tension with the requirements that a claim be novel and inventive (non-obvious). There is a dialectic process (examination) that refines the language. I want a monopoly for “a thing” (“1. A thing.”) but there are pre-existing “things” that are a problem for novelty.

So claims are not only compared with other things and processes when determining infringement; they are also compared with things and processes that were somehow available to the public prior to the filing of a patent application containing the claims.

Description and Figures

In a patent application there is also a written description and normally one or more figures. These are “extras” that help understanding and building up a context for any comparison of the claims.

If we are examining claims for novelty and inventive step, we are often comparing them with the description and figures of existing patent publications. This is because claims are typically more abstract than the written description, and the written description contains a lot more information. We are using the principle that the specific anticipates the general.

Figures are traditionally line diagrams. They started as engineering drawings and since extended to more abstract diagrams, like flowcharts for processes and system diagrams for complicated information technology equipment.

How do computers “see” claims?

A Very Brief History of Patent Information

If we want to help ourselves compare claims, either for infringement or examination, it would be good to automate some of the process. Computers are a good tool for this job.

Patent applications used to be handwritten (as were all documents). If copies were to be made, these would also be handwritten.

Later, they were printed using mechanical printing presses. The process for this used to be the arrangement of the letters and characters in a frame to form pages of text, which were then inked and pressed onto paper. Illustrations were typically originally hand-drawn, and then reproduced using etchings or lithography.

As typewriters became common in the 20th century, patent specifications were typed from handwritten versions or as a patent attorney dictated. When I started in the profession in 2005, there were still “secretaries” that typed up letters and patent specifications.

Computers came rather late to the patent profession. It was only in the 1990s they started entering into the office and it was only in the 21st century that word processors finally replaced physical type and paper.

We still refer to “patent publications” and there is a well-trodden legal process for publication. This was because it used to take a lot of work to publish a patent specification. This seems strange in an age when anyone can publish anything in seconds at the click of a button.

From Physical to Digital

Computers are actually closer to their analogue cousins than we normally realise.

At a basic level, a text document of a set of patent claims comprises a sequence of character encodings. Each character encoding is a sequence of bits (values of 0 and 1). A character is selected from a set that includes lower case letters, upper case letters, and numbers. Normally there is a big dictionary of numbers associated with each character. You can think of a character as anything that is either printed or controls the printing. In the past, characters would be printed by selecting a particular block with a carving or engraving of the two-dimensional visual pattern that represents the character. If you imagine a box of blocks, where each block is numbered, that’s pretty much how character encoding works in a computer.

For example, the patent claim – “1. A thing.” is 049 046 032 065 032 116 104 105 110 103 046 in a decimal representation of the ASCII encoding. This can then be converted into its binary equivalent, e.g. 00110001 00101110 00100000 01000001 00100000 01110100 01101000 01101001 01101110 01100111 00101110. In an actual character sequence, there is typically no delimiting character (“space” is still just a character), so what you have is 0011000100101110001000000100000100100000011101000110100001101001011011100110011100101110. What bits relate to which character is determined based on fixed-length partitioning.
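You can reproduce that encoding in a couple of lines of Python:

# Reproducing the "1. A thing." encoding example
claim = "1. A thing."
print([ord(c) for c in claim])                             # [49, 46, 32, 65, 32, 116, 104, 105, 110, 103, 46]
print("".join(f"{b:08b}" for b in claim.encode("ascii")))  # the concatenated binary string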

Another hangover from mechanical printing and typing is that many of the control and spacing characters are digital versions of old mechanical commands. For example, “carriage return” doesn’t really make sense inside a computer – a computer doesn’t have a carriage. However, a typewriter has a carriage that pings forwards and backwards. Similarly, the “tab” character is a shortcut for those on typewriters having to type tables. Any actual text thus contains not only the references to the letters used to form the words, but also the control characters that dictate the whitespace and file structure.

A sequence of character encodings is typically referred to as a “string” (from the mental image of beads on a string). This may be stored or transmitted. Word processors store character encodings in a more complex digital wrapper. Microsoft Word rather silently shifted a decade ago from a proprietary wrapper to a more open extended mark-up (XML) format (which is why you have all those different options for saving Office files). A modern Word file is actually a zip file of XML files.

Things get more confusing when we consider the digital replacement for physical prints – PDF files. PDF files are different beasts from word processing files. They are concerned with defining the layout of elements within a displayed document. While both word processing documents and PDF files store strings of text somewhere underneath the wrapping, the wrapping is quite different.

What does this all mean for my claims?

It means that much of the linguistic structure we perceive in a written patent claim exists in our heads rather than in the digital medium.

The digital medium just stores sequences of character encodings. A digital representation of a patent claim does not even contain a machine representation of “words”.

This still confuses many people. They assume that “words” and even sometimes the semantic meaning exist “somewhere” in the computer. They assume that the computer has a concept of “words” and so can compute with “words”. This was false…until a few years ago.

Comparing claims using computers

Traditional Patent Searching

Patent searching can be thought of as a way of comparing a patent claim with a body of prior publication documents. You can see the limitations of traditional computer representations of text when you consider patent searching.

Most digital patent searching, at least that developed prior to 2020ish, is based on key word matching. This works because it does not need the computer to understand language. All it consists of is character sequence matching.

For example, if you are looking for a “thing”, you type in “thing”. This gets converted into a binary sequence of bits. The computer then searches through portions of encoded text looking for a matching binary sequence of bits. It’s a simple seek exercise. It’s also slow and fragile – “entity” or “widget” can pretty much have the same meaning but will not be located.

Now there are some tricks to speed up keyword matching on large documents. You can do a simple form of tokenisation by splitting character sequences on whitespace characters (e.g., a defined list of character encodings that define spaces, full stops, or line returns). These represent words 80-90% of the time but there are lots of issues (compare splitting on ” ” and “.” for “A thing.” and “This is 3.3 ml.”). The resulting character sequences following the split can then be counted. This is called “indexing” the text. This then has the power of reducing the text to a “bag of words” – the “index”. It turns out that lots of words are repeatedly used (e.g., “it”, “the”, “a” etc.). The bag of words, represented as a set of unique character sequences, thus has far fewer entries than the complete text. You can also ignore words that don’t help (called “stopwords”; they are normally chosen to exclude “a”, “the”, “there” and other high-frequency words). The “index” can thus be much more quickly searched for character sequence matches. (This ignores most of the very clever real-world optimisations for keyword searching in large databases but is roughly still how things work, so stay with me.)
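A toy version of this indexing, which also shows the splitting problem with “3.3” mentioned above (the stopword list is illustrative):

# A toy "bag of words" index - split, lowercase, drop stopwords, count
from collections import Counter
import re

STOPWORDS = {"a", "the", "is", "it", "there", "this"}

def index_text(text: str) -> Counter:
    """Split on non-word characters and count the remaining tokens."""
    tokens = re.split(r"\W+", text.lower())
    return Counter(t for t in tokens if t and t not in STOPWORDS)

print(index_text("A thing. This is 3.3 ml."))  # Counter({'3': 2, 'thing': 1, 'ml': 1})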

Now, key word searching is only a rough proxy for a claim comparison.

If you try to search the complete character sequence of the claim against all the patents ever published, it is very likely you will not find a match. This is because the longer the sequence of characters, the more unique it is. You would only find a match in Borges’ library. The Google PageRank claim is around 600 characters. You would need to find a string with 600 characters arranged identically. And you would not match against semantically identical descriptions in prior publications that just used a different punctuation character encoding somewhere amongst those 600 characters (don’t get me started on hyphens).

Multiple term key word searching typically involves picking multiple key words from the claim we wish to compare and doing a big AND query, looking for all those words to have matches with a body of text. Even more complex approaches such as “within 5” typically just perform a normal character match then look for another match in a subset of character encodings either side of the character match.

How Patent Attorneys Compare Claims

Patent attorneys learn their skill through repeatedly working with patents over many years, typically decades. It’s a rather unique and niche skill. But often it’s one that cannot be easily explained or formalised.

That’s why it’s always a useful exercise to imagine explaining what you do to a layperson – your gran or a five-year-old.

When I was first training as a patent attorney, coming from a science and engineering background, I did think there was a “right” way of comparing patent claims and that it was just a matter of learning this and applying it. In law, you quickly realise that this isn’t how things work. Training typically consisted of working with a skilled qualified attorney, watching how they did things, and then seeking to rationalise those things into a general scheme. It’s much more of a dark art. After working with many different attorneys, you realise there is lots of stylistic variation. You realise the courts often have an intuitive feel for what is right, and this is used to guide a rationalised logic process within the bounds of previous attempts. The rationalised logic is what you end up with (the court report), while the intuitive feeling often hides in plain sight.

Anyway, claim comparison is typically split into the following process:

  • Construe the claim
  • Split the claim into features
  • Match features
  • Look at the number of matched features

If all the features match in a way that is agreed by everyone then the claim covers an infringement or the claim is anticipated by prior art.

Construe the claim

“Construing” a claim is shorthand for interpreting the terms within the claim. Typically, it concentrates on areas of the claim that may be unclear, or are open to different interpretations. For example, a claim could have a typo or error, or a term might have multiple meanings that cover different groups of things.

Construing the claim is typically performed early on as it allows multiple parties to have a consistent interpretation of the text. It is thus needed before any matching is performed. It is often presented as an exercise that is “independent” of the later stages of the comparison. However, in practice, construction is performed with an eye on the comparison – if the infringement or prior art revolves around whether a particular feature is present (e.g., does it have a “wheel”?) then the terms that describe that feature have greater weight when construing (e.g., what is a “wheel”?).

Claim construction is something that is hard to translate to an automated analysis. It involves having parties to a disagreement agree on a set of premises or context. It thus naturally involves mental models within the minds of multiple groups of people, people that have a vested interest in an interpretation one way or another.

Where there is disagreement, the description and figures are typically used as an information source for resolving error and ambiguity. For example, if the description and figures clearly state that “tracked propulsion” is “not a wheel”, then it would be hard for a party to argue that “wheel” covers “tracked propulsion”. Similarly, if the claim refers to a “winjet” and the description consistently describes a “widget”, then it seems clear “winjet” is a typo and what was meant was “widget”.

Claim construction can also be seen as making the implicit, explicit. Certain terms in a claim may be deemed to have a minimum number of properties or attributes. These may be based on the “common general knowledge” as represented by sources such as textbooks or dictionaries. These can be taken as a “baseline” that is then modified by any explicit specification in the claim or description and figures. Again, if the parties agree that both objects of comparison have an X, there is little reason to go into this level of detail. It is mainly terms on which the comparison turns that undergo this analysis. These terms are typically those where there is the greatest difference between the parties and the strongest arguments. One of the roles of the courts or the patent examination bodies is to shape the argument so that points of agreement can quickly be admitted, and the points of difference are kept to a manageable number. (If there are lots of differences, and many of these, on the face of it, are supported, it is difficult to bring a case or find agreement within the authority; if there are no contested differences, the case is typically easy to bring to summary judgement.)

When construing the claim, prior decisions of the courts can also be brought to bear. If a higher court rules that using the phrase “X” has a particular interpretation, this can be applied in the present case.

Split the Claim into Features

What are claim “features”?

Here we can go back to our original split between “things” and “processes”.

“Things” are deemed to have a static instantiation (whether physical or digital). Things are deemed to be composed of other things: systems have different system components, physical mechanical devices have different parts, and chemical compositions have different molecular and/or atomic constituents.

“Processes” are a set of events that unfold in time, typically sequentially. They are often methods that involve a set of steps. Each step may be seen as a different action and/or configured state of things and matter.

When we are looking at claim “features”, we are looking to segment the text of the claim into sub-portions that we can consider individually. Psychologically, we are looking to “chunk” the content of the claim. We chunk because our working memories are limited. When comparing we need to hold one chunk in the working memory, and compare it with one or more other chunks. Our brains can hold a sequence of about three or four “chunks” in working memory at any one time, or hold two items for comparison. We decompose the claim into features as a way to work out if a match exists – we can say the whole matches if each of the parts match.

Now, we only need to break a claim into features because it is complex. If the claim was “1. A bicycle.”, we could likely hold the whole claim in our working memories and compare it with other entities. In this case, we might need to use the previous step of claim construction to determine what the minimum properties of a “bicycle” were. (Two wheels? Is a tricycle, a motorcycle, or a unicycle a “bicycle”?). Here we see that the definition of claim features can be a recursive process, where the depth of recursion into both explicit and implicit features depends on the level of disagreement between parties (and likelihood of collective agreement between different parties within an authority, such as between a primary examiner and senior examiner). Recursion can also be used to “zoom in” on a particular feature comparison, while then concluding on a match at a “zoomed out” level of the feature (e.g., this does match a bicycle because X is a first wheel and Y is a second wheel).

A Short Aside on Segmentation

Claim feature extraction is a form of semantic segmentation.

Segmentation in images made a huge leap in 2023 with the launch of Meta’s Segment Anything model. In images, segmentation is often an act of determining a context-dependent pixel boundary in two-dimensions.

For the sequence of characters that form a patent claim, we have a one-dimensional problem. We need to determine the “feature” breakpoints in the sequence of characters.

It turns out patent attorneys provide clues as to this semantic segmentation via the use of whitespace. Patent attorneys will often add whitespace such that the claim is partitioned into pseudo-features by way of the two-dimensional layout.

In the example above we see that commas, semi-colons, and new lines break the patent claim into five natural “features”.

It turns out there are a number of problems with the reliability of automated segmentation based on whitespace:

  • The text is often transformed when it is loaded into different systems, meaning that original whitespace may be lost or omitted. Fairly often new lines are stripped out, or stripped out then manually replaced.
  • There are many different encodings of many different forms of whitespace – there are multiple versions of the newline character, for example.
  • Real-world patent claims often have a multi-tier nested structure that requires more advanced recursive segmentation.
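
To make this concrete, here is a rough first-pass segmenter that splits on semi-colons and new lines (using the WIPO book-binding claim discussed in the next section), subject to all the caveats just listed:

```python
import re

def naive_segment(claim: str) -> list[str]:
    """Split a claim into pseudo-features on semi-colons and new lines.

    A first pass only - it fails once whitespace has been stripped or
    the claim has a nested, multi-tier structure (see the list above).
    """
    parts = re.split(r";\s*|,?\s*\n+", claim)
    return [p.strip() for p in parts if p.strip()]

claim = """An apparatus, comprising:
a plurality of printed pages;
a binding configured to hold the printed pages together;
a cover attached to the binding,
characterized in that, the cover is detachable from the binding."""

naive_segment(claim)  # five pseudo-features, one per line of the claim
```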

Things or Events as Features

Those familiar with patent law will realise that when someone refers to “claim features”, they are normally referring to portions of text within the claim that are indicated as separate sections by the author’s use of whitespace. Claim charts are tables that often have 5-10 rows, where each row is a feature that is a different portion of the claim text determined in this way. Claim charts are normally structured to fill up one page of A4, so we can easily get an idea of the feature matches.

However, we can ask a deeper question – what are those different whitespace-separated portions of the claim actually representing?

Or put another way – what do we mean by semantic segmentation of the claim text?

Let’s have a look at the simple WIPO claim example above. Using new lines we can split that into the following features:

  • [a]n apparatus (, comprising:)
  • a plurality of printed pages;
  • a binding configured to hold the printed pages together;
  • a cover attached to the binding,
  • characterized in that, the cover is detachable from the binding.

Looking closely, we see that actually those text portions are centred on different things. The claim defines an “apparatus”, which forms the top line. This apparatus has a number of components: pages, a binding, and a cover. We see that the middle three segments are based around definitions of each of these components. The last section then defines a characteristic of the apparatus in terms of the cover and binding components.

So for a “thing” claim, we see that our semantic focus for segmentation is “sub-things”. “Things” are made of interconnected “sub-things” and this pattern repeats recursively. We can look at different “things” or “sub-things” in isolation from their connections to focus on their individual properties. Things at each level are defined by the interconnection and inter-configuration of sub-things at a lower level.

Now in English grammar, we have a term for “things”: nouns. Nouns and noun-phrases are the terms we use to classify the location of “things” in text. So when we semantically segment a claim, we are doing this based on the noun content of the claim.
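
As a sketch, an NLP library such as spaCy can locate those noun phrases for us (the output shown is indicative only):

```python
# pip install spacy; python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(
    "An apparatus comprising a plurality of printed pages, "
    "a binding configured to hold the printed pages together, "
    "and a cover attached to the binding."
)
# Noun chunks give a first approximation of the claimed "things"
for chunk in doc.noun_chunks:
    print(chunk.text)
# roughly: "An apparatus", "a plurality", "printed pages", "a binding", ...
```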

Method claims are slightly different. We no longer have a subdivision by static structural “things”. Rather we have a partition by time, or more precisely different sequences of actions within time. Take another claim example from WIPO:

If you were to ask a patent attorney to split that claim into “features”, they would likely choose each step – i.e. each clause starting on a new line and ending with a semi-colon and new line:

  • [a] process for producing fried rice (, comprising the steps of:)
  • turning the heat source on;
  • cooking rice in water over the heat source for a predetermined period;
  • placing a predetermined amount of oil in a pan;
  • cooking other ingredients and seasoning in the pan over the heat source;
  • placing the cooked rice in the pan; (and)
  • stirring consistently the rice and the other ingredients for a predetermined length of time over the heat source.

These steps are different actions in time, where time runs sequentially across the steps.

Now we can see that method claims also share certain aspects of the “thing” claims. We have several “things” that are acted on in the method, including: “fried rice”, “heat source”, “rice”, “oil”, “pan”, “ingredients”, “seasoning”, “cooked rice”, and “length of time”. We can also see that some of those “things” are actually different states of the same thing – we start with “rice”, which then becomes “cooked rice”, which is output by the method as “fried rice”.

Even though a method consisting of: “turning”, “cooking”, “placing”, “cooking”, “placing”, and “stirring” would be a valid patent claim, it would likely lack novelty. For example, the quite different method of cooking a chicken dinner below would fall within that method:

  • turning a chicken breast in flour;
  • cooking a set of potatoes in water;
  • placing the chicken breast and cooked potatoes on baking trays;
  • cooking the chicken breast and potatoes in the oven;
  • placing the cooked chicken breast and potatoes on a plate; and
  • stirring gravy to pour over the plate.

So we see that it is the things that are involved in each step that define the (sub) features of the step.

Match Features

Once we have identified features in the claim the next step is comparing each of those features. For infringement, we are comparing with a possibly infringing thing or process. For examination, we are comparing with a prior publication.

Splitting a claim into features lessens the cognitive load of the comparison. It also allows agreement on uncontentious aspects, focusing effort on key points of disagreement. Much of the time, there is only really one feature that may or may not differ. Often one missing feature is all you need to avoid infringement and/or argue for an inventive step.

Now, you might say that matching is easy, just like spot the difference.

Going back to an image analogy, visual features may be segmented portions of a two-dimensional extent. In spot the difference we compare two images that are scaled to the same dimensions. We are then looking for some form of visual equivalence in the pixel patterns in different portions of the image.

Words are harder though. We are dealing with at least one level of abstraction from physical reality. We are looking for a socially agreed correspondence between two sets of words.

The facts of the case determine what features will be in contention and which may be more easily matched. Different features will be relevant for different comparisons. Inventive step considerations still involve a feature matching exercise, but they involve different feature matches in different portions of prior art.

What does it mean for a feature to match?

We have our claim feature, which is set out in a portion of the claim text (our segmented portion).

Our first challenge is to identify what we are comparing with the claim. These can sometimes be fuzzy-edged items that need to be human-defined. Sometimes they are harder-edged and more unanimously agreed upon as “things” to compare. For infringement, the comparison may be based on a written description of a defined product, or a documented procedure. For examination, it is often a prior-published patent application.

Our second challenge is to find something in the comparison item that is relevant to the particular feature. There may be multiple candidates for a match. At an early stage this might be a general component of a product or thing, or a particular component of a particular embodiment of a patent application as set out in one or more figures.

Once we have something to compare, and have identified a rough candidate correspondence, the detailed analysis of the match then depends on whether we are looking at infringement or examination for novelty.

For infringement, we have a “match” if the language of the claim feature can be said to describe a corresponding feature in the potentially infringing product or process. At this stage we can ignore the nuances of the infringement type (e.g., use vs sale), as this normally only follows if we have a clear infringing product or process. To be more precise, we have a “match” if a legal authority agrees that the language of the claim feature covers a corresponding feature in the potentially infringing product or process. So there is also a social aspect.

For the examination of novelty, we have a “match” if a portion of a written description can be said to describe all the aspects of the claim feature. As claim features are typically at a higher level of abstraction, this can also be thought of as: would an abstracted version of the written description produce a summary that is identical to the claim feature?

A match is not necessarily boolean; if there is a particular point of interpretation or ambiguity there may be numerous options to decide. A decision is made based on reason (or reasons), sometimes with an appeal to previous cases (case law) or analogy or even public policy. If you asked 100 people, you might get X deciding one way and 100-X deciding the other.

Look at the number of matched features

This is normally the easy part. If we have iterated through our “matching” for each identified claim feature, and the set of claim features exhaustively cover all of the claim text, then we simply total up the number of deemed “matches”.

If all the features match, we have a matching product or process for infringement, or our claim is anticipated by the prior art.

If any feature does not match, then we do not have infringement (ignoring for now legal “in-filling” possibilities) and our claim has novelty, with the non-matching features being the “novel” features of the claim.

Any non-matching features may then be subject to a further analysis on the grounds of inventive step. If the non-matching feature is clearly found in another document, and a skilled person would seek out that other document and combine teachings with no effort, then the non-matching feature is said to lack an inventive step.
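
In code terms, the tally itself is trivial – a toy sketch, where all the hard legal work is hidden in deciding each boolean:

```python
def verdict(feature_matches: dict[str, bool]) -> str:
    """Total up the deemed 'matches' across all claim features."""
    unmatched = [f for f, ok in feature_matches.items() if not ok]
    if not unmatched:
        return "All features match: infringement / lack of novelty."
    return f"No match overall; novel (non-matching) features: {unmatched}"

verdict({
    "plurality of printed pages": True,
    "binding holding the pages together": True,
    "cover detachable from the binding": False,
})
```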

Can we automate?

Given the above, we can ask the valid question: can we automate the process?

The answer used to be “no”. The best we could do was to compare strings, and we’ve seen above that any difference in the surface form of the string (including synonyms or differences in spelling or whitespace) would throw off an analysis of even single words.

Fuzzy matching

Before 2010, natural language processing (NLP) engineers tried tinkering with a number of approaches to match words. This normally fell within the area of “fuzzy matching”. An approach used since the 60s is calculating the Levenshtein distance, a measure of the minimum number of single-character edits that change one word into another. This could catch small typos but still needed a rough string match. It failed with synonyms and irregular verbs.
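
For reference, the Levenshtein distance is a classic dynamic-programming exercise – a minimal implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

levenshtein("widget", "wigdet")  # 2 - catches the small typo
levenshtein("widget", "entity")  # 6 - synonyms look maximally distant
```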

word2vec

In the early 2010s though, things began to change. Techniques such as word2vec were developed. These allowed researchers to replace a string version of a word with a list of floating point numbers. These numbers represented latent properties of how the string was used in a large corpus of documents. Or put another way, we could compare words using numbers.

Early word vector approaches offered the possibility of comparing words with the same meaning but different string patterns. In particular, we found that the vectors representing the words had some cool geometric properties – as we moved within the high-dimensional vector space we saw natural transitions of meaning. Words with similar meanings (i.e., that were used in similar ways in large corpora) had vectors that were nearby in vector space.

So words such as “server” and “computer” might be matched in a claim based on their word2vec vectors despite there being no string match. I remember playing with this using the gensim library.
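
A sketch of that gensim experiment – the exact similarity scores depend on the pretrained vectors you load:

```python
# pip install gensim (the pretrained Google News vectors are a large download)
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")
vectors.similarity("server", "computer")  # noticeably higher than...
vectors.similarity("server", "banana")    # ...unrelated word pairs
```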

Transformers

We didn’t know it at the time, but word vectors were the beginning not the end of NLP magic.

In the early days, word embeddings allowed us to create numerical representations of word tokens for input into more complex neural network architectures. At first you could generate embeddings for a dictionary of 100,000 words, giving you a matrix of vector-size x 100k; you could then select your inputs based on a classic whitespace tokenisation of the text.

Quickly, people realised that you didn’t actually need word2vec as a separate process – you could learn that matrix of embeddings as part of your neural architecture. Sequence-to-sequence models were built on top of recurrent neural network architectures. Then in 2017, Attention is All You Need came along, which turbo-charged the transformer revolution. Fairly quickly, in 2018, we arrived at BERT, which was the encoder side of AIAYN and was built into many NLP pipelines as a magical classifying workhorse, and GPT, the foundation model that became the infamous ChatGPT. In 2023, we saw the public release of GPT4, which took language models from an interesting toy for making you sound like a pirate to a possible production language computer. In 2024, we are still struggling to get anywhere near the abilities of GPT4.

With large language models like BERT and GPT, you get embeddings of any text “for free” – it’s a first stage of the model. We can thus now embed longer strings of text and convert them into vector representations. These vectors can then be compared using mathematics. STEM – 1; humanities – 0 (just don’t take a close look at society).
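
For example, a sketch using the sentence-transformers library that crops up later in this post (the model choice is illustrative):

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
a = model.encode("a binding configured to hold the printed pages together")
b = model.encode("the pages are glued along one edge to keep them in place")
util.cos_sim(a, b)  # high similarity despite almost no shared keywords
```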

Settlers in a Brave New World

The power of word embeddings and large language models now opens up whole new avenues of “legal word processing” that were previously unimaginable. We’ve touched on using retrieval augmented generation here and here. We can apply the same approaches to patent documents and claims.

We now have a form of computer that takes a text input and produces a text output. We don’t quite know how it works but it seems to pass the Turing Test, while reasoning in a slightly stunted and alien manner.

This then provides the opportunity to automate the process described above, to arrive at automated infringement and novelty opinions. At scale. While we sleep. For pennies.

I’m excited.

Talking Legislation – Asking the Patents Act

We all are told that Large Language Models (LLMs) such as ChatGPT are prone to “hallucinations”. But did you know we can build systems that actively help to reduce or avoid this behaviour?

In this post, we’ll be looking at building a proof-of-concept legal Retrieval-Augmented Generation (RAG) system. In simple terms, it’s an LLM generative system that cites sources for its answers. We’ll look at applying it to some UK patent legislation.

(Caveat: I have again used GPT-4 to help with speeding up this blog post. The rubbish bits are its input.)

Scroll down to the bottom if you want to skip the implementation details and just look at the results.

If you just want to have a look at the code, you can find that here: https://github.com/Simibrum/talking_legislation

Introduction

The complex and often convoluted nature of legislation and legal texts makes them a challenging read for laypeople and professionals alike. With the release of highly capable LLMs like GPT-4, more people have been using them to answer legal queries in a conversational manner. But there is a great risk attached – even capable LLMs are not immune to ‘hallucinations’ – spurious or inaccurate information.

What if we could build a system that not only converses with us but also cites its sources?

Enter Retrieval-Augmented Generation (RAG), a state-of-the-art technology that combines the best of both worlds: the text-generating capabilities of LLMs and the credibility of cited sources.

Challenges

Getting the Legislation

The first hurdle is obtaining the legislation in a format that’s both accurate and machine-readable.

Originally, the official version of a particular piece of legislation was the version that was physically printed by a particular authority (such as the Queen’s or King’s Printer). In the last 20 years, the law has mostly moved onto PDF versions of this printed legislation. While originally digital scans, most modern pieces of legislation are available as a digitally generated PDF.

PDF documents have problems though.

  • They are a nightmare to machine-read.
  • Older scanned legislation needs to be converted into text using Optical Character Recognition (OCR). This is slow and introduces errors.
  • Even if we have digital representations of the text within a PDF, these representations are structured for display rather than information extraction. This makes it exceedingly difficult to extract structured information that is properly ordered and labelled.

Building the RAG Architecture

Implementing a RAG system is no small feat; it involves complex machine learning models, a well-designed architecture, and considerable computational resources.

Building a Web Interface

The user experience is crucial. A web interface has to be intuitive while being capable of handling the often lengthy generative timespans that come with running complex models.

Solutions

Using XML from an Online Source

In the UK, we have the great resource: www.legislation.gov.uk.

Many lawyers use this to view up-to-date legislation. What many don’t know though is it has a hidden XML data layer that provides all the information that is rendered within the website. This is a perfect machine-readable source.
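
For example, appending /data.xml to a legislation.gov.uk URL should return the underlying XML – a sketch for the Patents Act 1977:

```python
import requests

# Patents Act 1977 is ukpga/1977/37; /data.xml exposes the XML layer
url = "https://www.legislation.gov.uk/ukpga/1977/37/data.xml"
response = requests.get(url, timeout=30)
response.raise_for_status()
xml_string = response.text
```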

Custom XML Parser

Even though we have a good source of machine-readable information, it doesn’t mean we have the information in a useful format for our RAG system.

Most current RAG systems expect “documents” to be provided as chunks of text (“strings” – very 1984). For legislation, the text of each section makes a good “document”. The problem is that the XML does not provide a clean portion of text as you see it on-screen.

Rather, the text is split up across different XML tags with useful accompanying metadata.

To convert the XML into a useful Python data structure, we need to build a custom XML parser. This turns the retrieved XML into text objects along with their metadata, making it easier to reference and cite the legislative sources. As with any markup processing, the excellent Beautiful Soup library is our friend. The final solution requires some recursive parsing of the structure. This always makes my head hurt and requires several attempts to get it working.
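
A heavily simplified sketch of the parser idea – treat the tag names here as illustrative rather than a faithful description of the real markup:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4 lxml

def parse_sections(xml_string: str) -> list[dict]:
    """Flatten each legislative section into a text + metadata dict."""
    soup = BeautifulSoup(xml_string, "xml")
    documents = []
    # Assume each numbered section sits in its own wrapper tag;
    # the real markup nests provisions recursively
    for section in soup.find_all("P1group"):
        title = section.find("Title")
        documents.append({
            "title": title.get_text(strip=True) if title else "",
            # get_text() recursively collects text from all nested tags
            "text": section.get_text(" ", strip=True),
        })
    return documents
```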

Langchain for Embeddings and RAG Architecture

This mini project provided a great excuse to check out the Langchain library in Python. I’d seen many use this on Twitter to quickly spin up proof-of-concept solutions around LLMs.

At first I was sceptical. The power of langchain is that it does a lot with a few lines of code, but this also means you are putting yourself in the hands of the coding gods (or community). Sometimes the abstractions are counter-productive and dangerous. However, in this case I wanted to get something up-and-running quickly for evaluation, so I was happy to take on the risks.

This is pretty bleeding edge in technology terms. I found a couple of excellent blog posts detailing how you can build a RAG system with langchain. Both are only from late August 2023!

The general outline of the system is as follows:

  • Configure a local data store as a cache for your generated embeddings.
  • Configure the model you want to use to generate the embeddings.
    • OpenAI embeddings are good if you have the API setup and are okay with the few pence it costs to generate them. The benefit of OpenAI embeddings is you don’t need a GPU to run the embedding model (and so you can deploy into the cloud).
    • HuggingFace embeddings that implement the sentence-transformer model are a free alternative that work just as well and are very quick on a GPU machine. They are a bit slow though for a CPU deployment.
  • Configure an LLM that you want to use to answer a supplied query. I used the OpenAI Chat model with GPT3.5 for this project.
  • Configure a vector store based on the embedding model and a set of “documents”. This also provides built-in similarity functions.
  • And finally, configure a Retrieval Question-and-Answer model with the initialised LLM and the vector store.

You then simply provide the Retrieval Question-and-Answer model with a query string, wait a few seconds, then receive an answer from the LLM with a set of “documents” as sources.
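
Wired together with the langchain APIs of the time, it looks something like the sketch below – treat it as indicative rather than gospel (the library moves fast), with `docs` being the list of section Documents built by the parser above:

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import CacheBackedEmbeddings, OpenAIEmbeddings
from langchain.storage import LocalFileStore
from langchain.vectorstores import FAISS

# Local cache so each embedding is generated (and paid for) only once
store = LocalFileStore("./embedding_cache/")
embedder = CacheBackedEmbeddings.from_bytes_store(
    OpenAIEmbeddings(), store, namespace="patents-act"
)

# `docs` would be langchain Documents built from the parsed sections
vectorstore = FAISS.from_documents(docs, embedder)

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo"),
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)
result = qa({"query": "Who can be named as an inventor?"})
# result["result"] is the answer; result["source_documents"] the citations
```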

Web Interface

Now you can run the RAG system as a purely command-line application. But that’s a bit boring.

Instead, I now like to build web-apps for my user interfaces. This means you can easily launch them later on the Internet and also take advantage of a whole range of open-source web technologies.

Many Python projects start with Flask to power a web interface. However, Flask is not great for asynchronous websites with lots of user interaction. LLM-based systems have the added problem of processing times in the seconds, thanks to remote API calls (e.g., to OpenAI) and/or computationally intensive neural-network forward passes.

If you need a responsive website that can cope with long asynchronous calls, the best framework for me these days is React on the frontend and FastAPI on the backend. I hadn’t used React for a while so the project was a good excuse to refresh my skills. Being more of a backend person, I found having GPT-4 on call was very helpful. (But even the best “AI” struggles with the complexity of Javascript frontends!)

I also like to use Bootstrap as a base for styling. It enables you to create great-looking user interface components with little effort.

Docker

If you have a frontend and a backend (and possibly a task queue), you need to enter the realm of Docker and Docker Compose. This helps with managing what is in effect a whole network of interacting computers. It also means you can deploy easily.

WebSockets

I asked ChatGPT for some suggestions on how to manage slow backend processes.

I’d built systems with both async functionality and task queues, so I thought I might experiment with WebSockets for this proof-of-concept.

Or a case of building a TCP-like system on top of HTTP to overcome the downsides of HTTP’s otherwise beneficial statelessness! (I’m still scared by CORBA – useful: never.)

Anyway, the WebSockets implementation was a pretty simple setup. The React frontend app sets up a WebSocket connection when the user enters a query.

And this is received by an asynchronous backend endpoint within the FastAPI implementation:
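
A minimal sketch of such an endpoint – with `run_rag_query` as a hypothetical stand-in for the RetrievalQA call described earlier:

```python
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/query")
async def query_endpoint(websocket: WebSocket):
    await websocket.accept()
    # The frontend sends the user's query as a plain text frame
    query = await websocket.receive_text()
    # run_rag_query is a hypothetical async wrapper around the RAG chain
    answer = await run_rag_query(query)
    await websocket.send_json({"answer": answer})
    await websocket.close()
```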

Results and Observations

Here are some examples of running queries against the proof-of-concept system. I think it works really well – especially as I’m only running the “less able” GPT3.5 model. However, there are a few failure cases and these are interesting to review.

Infringement

Here’s a question on infringement. The vector search selects the right section of the legislation and GPT3.5 does a fair job of summarising the long detail.

We can compare this with a vanilla query to GPT3.5-turbo:

And to a vanilla query using GPT4:

Inventors

Here’s an example question regarding the inventors:

Again, the vector search finds us the right section and GPT-3.5 summarises it well. You’ll see GPT3.5 also integrates pertinent details from several relevant sections. You can also click through on the cited section and be taken to the actual legislation.

Here’s vanilla GPT3.5:

Failure Case – Crown Use

Here’s an interesting failure case – we ask a question about Crown Use. Here, the vector search is biased towards returning a shorter section (122) relating to the sale of forfeited items. We find that section 55, which relates to Crown Use, does not even feature in the top 4 returned sections (but would possibly be number 5, given that section 56 is the fourth entry).

Interestingly, this is a case where vanilla GPT3.5 actually performs better:

WebSocket Example

If you are interested in the dynamics of the WebSockets (I know all you lawyers are), here’s the console log as we create a websocket connection and fire off a query:

And here’s the backend log:

Future Work

There are a few avenues for future improvement:

  • Experiment with the more expensive GPT4 model for question answering.
  • Extend the number of returned sources.
  • Add an intermediate review stage (possibly using the cheaper GPT3.5).
  • Add some “agent-like” behaviour – e.g. before returning an answer, use an LLM to consider whether the question is well-formed or requires further information/content from the user.
  • Add the Patent Rules in tandem.
  • Use a conventional LLM query in parallel to steer output review (e.g., an ensemble approach would maybe resolve the “Crown Use” issue above).
  • Add an HTML parser and implement on the European Patent Convention (EPC).

Summary

In summary, then:

Positives

  • It seems to work really well!
  • The proof-of-concept uses the “lesser” GPT3.5-turbo model but often has good results.
  • The cited sources add a layer of trust and verifiability.
  • Vector search is not perfect but is much, much better than conventional keyword search (I’m glad it’s *finally* becoming a thing).
  • It’s cool being able to build systems like this for yourself – you get a glimpse of the future before it arrives. I’ve worked with information retrieval systems for decades and LLMs have definitely unlocked a whole cornucopia of useful solutions.

Negatives

  • Despite citing sources, LLMs can still misinterpret them.
  • The number of returned sources is a parameter that can significantly influence the system’s output.
  • Current vector search algorithms tend to focus more on (fuzzy) keyword matching rather than the utility of the returned information, leaving room for further refinement.

Given I could create a capable system in a couple of days, I’m sure we’ll see this approach everywhere within a year or so. Just think what you could do with a team of engineers and developers!

(If anyone is interested in building out a system, please feel free to get in touch via LinkedIn, Twitter, or GitHub using the links above.)

Can Simulations be Patented in Europe? An Update

A Review of Oral Proceedings for G1/19

On 15 July 2020, the European Patent Office (the “EPO”) held Oral Proceedings for the pending G1/19 referral to the Enlarged Board of Appeal. Although no decision was reached by the Enlarged Board, they indicated their preliminary opinion on the referred questions and gave the Appellant and the EPO a chance to present arguments.

As a potential first, the Oral Proceedings were live-streamed due to the Covid-19 restrictions, which meant that those at home or in the office could attend. About 1600 had registered beforehand to view.

Question Under Consideration

The Enlarged Board is basically considering the question: can you patent (computer) simulations in Europe?

The questions are referred from T0489/14. This is a Board of Appeal case where the subject matter related to the simulated movement of pedestrians through a building. The initial referral was covered in this blogpost: G1/19 – Enlarged Board of Appeal to Consider the Matrix.

One of the issues thrown up by the appeal case is the status of T1227/05. The Board in T0489/14 were inclined to find that the pedestrian movement claim lacked inventive step, but felt this would conflict with T1227/05, which found that a claim to simulating noise when designing chips was patentable.

For reference for the discussion below, I will briefly repeat the referred questions, using the numbering convention of the Enlarged Board:

1. In the assessment of inventive step, can the computer-implemented simulation of a technical system or process solve a technical problem by producing a technical effect which goes beyond the simulation’s implementation on a computer, if the computer-implemented simulation is claimed as such?

2. a) If the answer to the first question is yes, what are the relevant criteria for assessing whether a computer-implemented simulation claimed as such solves a technical problem?

2. b) In particular, is it a sufficient condition that the simulation is based, at least in part, on technical principles underlying the simulated system or process?

3) What are the answers to the first and second questions if the computer-implemented simulation is claimed as part of a design process, in particular for verifying a design?

Quick Initial Answers

Below I set out some of the points that were raised during the Oral Proceedings. However, if you are looking for the quick outcomes I will indulge you.

Can you patent (computer) simulations in Europe? The answer so far appears to be “yes” – the discussion suggests that simulations are to be treated as per any other computer-implemented invention.

Referred Q1: Yes – basically as set out above.

Referred Q2a): Side-stepped by a likely finding of inadmissibility.

Referred Q2b): No – something more than technical principles is required – likely a technical effect and a solution to a technical problem as per conventional computer-implemented inventions.

Referred Q3): The form of the claim does not matter.

We will need to wait for the final decision of the Enlarged Board to confirm these initial indications, and it is not impossible for their position to change after having considered the presentations in Oral Proceedings.

Amicus Curiae

Since the initial referral, 23 amicus curiae briefs have been filed. These can be read here.

Initial Enlarged Board of Appeal Opinion

On 22 June 2020, the Enlarged Board issued an initial communication that set out some of their thinking (see the first 22.06.2020 entry here). In particular, the communication set out a number of questions that the Enlarged Board felt had been raised by the referral.

Firstly, the board considered numerical simulations that have no direct link to physical reality, such as a change in or a measurement of a physical entity. They summarised two approaches as presented in the amicus curiae briefs:

  1. A technical system or process is simulated in a manner which sufficiently corresponds to the behaviour of the real system or process – in this case “virtual” or “potential” technical effects may be considered like “real” technical effects when applying COMVIK.
  2. The numerical result of a simulation serves a technical purpose. This matches the decision in T1227/05, where the subject matter was patentable as it was directed to a tool for circuit design.

It was mentioned that technical considerations and/or the technical skills of the relevant skilled person could provide guidance as to what was patentable.

The Enlarged Board then set out a number of questions for discussion at oral proceedings:

  • Can COMVIK be used for computer simulation? [Possibly rhetorical.]
  • If we assume COMVIK can be applied:
    • Can “potential” or “virtual” technical effects be treated as “real” technical effects?
    • How do “mental acts” and “discoveries, scientific theories and mathematical methods”, as set out in Article 52(2)(a) and (c), interact with simulations?
    • Does the technical purpose need to be in the claim? [Possibly rhetorical.]
  • Is there a difference between simulations based on “human behaviour or activities” and simulations based on “natural phenomena”?

We will look again at the answers to these at the end of this post.

Admissibility

The good news is that the Enlarged Board indicated (unofficially) that the referral was admissible, mostly.

The Enlarged Board appeared happy with the admissibility of questions 1), 2b) and 3). They did, however, have reservations about the admissibility of question 2a). Questions 1), 2b) and 3) can mostly be answered with simple “yes/no” responses, whereas question 2a) asks for a “list of criteria”. It seems the Enlarged Board are (maybe wisely) side-stepping those criteria.

The EPO was of the opinion that all the questions were admissible, as both of the two alternatives set out in Article 112 EPC are met: there is a question of the uniform application of law via the purported deviation of T1227/05 and the issue is of fundamental importance as it may affect the ever-growing fields of machine learning and artificial intelligence.

Deviation from T1227/05

The Enlarged Board also gave a hint as to their thinking on any potential “non-uniform” application of the law based on T1227/05. They noted that there may not be as much deviation as initially suggested. They believe that T1227/05 does not provide blanket support for the patentability of simulation methods. Within their discussion of “functional technical features”, it can be inferred that there is support both for the finding that a claim to a simulation may have technical features and for the finding that a claim to a simulation may lack technical features. The EPO’s representative also indicated that there were some initial versions of the claims in T1227/05, subsequently withdrawn, that were found not to provide an inventive step.

Dynamic Patentability & COMVIK

Although now rather old-hat, it was nice to see confirmation that the EPO and practitioners are generally happy with the way patentability is assessed for computer-implemented inventions. The present approach, based on COMVIK, is stable and stands up well when applied to deep learning inventions. This field hardly existed as a field of technology a decade ago but now makes up a substantial part of many inventions in both traditional and non-traditional engineering areas. It is better to define “technical” dynamically via the case law as what is an “invention” changes with time.

Indeed, “simulation” appears to have touched a nerve with the EPO and the filers of the amicus curiae briefs – it’s a Trojan horse for wider questions raised by machine learning and modern numerical methods. Most want these forms of inventions and methods to be protectable.

With one eye on the future, I was also pleased that the EPO implied that a link to direct measurement or physical reality was possibly not needed. The EPO stressed that the trend is towards increasing digitalisation, and an increased blurring of the line between real and simulated. Problems may be avoided if the EPO approach sticks to the key issue of whether a technical problem is solved and whether there is a technical purpose.

COMVIK also confirms that the mere fact that a feature relates to a specifically excluded field does not mean it cannot provide a technical effect. It may provide a technical effect in the context of the whole invention if a technical problem is solved.

“Technical Simulations”

Many discussions on patentable subject matter descend into a circular definition of the term “technical”. The EPO made some great points as to why this was needed (see the Dynamic Patentability section above) but also presented some possible routes out of the maze.

One definition from the EPO that I liked was that a “technical” simulation is a “useful” simulation. This was then deemed to require an accurate and realistic model of a technical system that provides useful design information to an engineer. Such a simulation would include “technical considerations” if it reflected, at least in part, technical attributes of the simulated system.

The EPO also suggested that one condition for a “technical” simulation is that it needs to directly provide technical data as opposed to more generic “data” that needs to be further analysed by a user to derive technical data. It provides an option to find that general pedestrian modelling lacks suitable technical features as it does not provide concrete parameters that are useable for building design. But it does raise the suggestion that a claim may be patentable if specific building parameters were provided as an output. 

The Appellant raised a good counterpoint – that sometimes negative determinations from a simulation are technically useful. For example, if the simulation says not to build a particular chip or building, surely these negative decisions are as important as active positive decisions?

One suggestion from the EPO for sufficient conditions as per question 2 was that the object of the simulation was limited to a technical system and that the output of the simulation provides technical information relating to that technical system. Cases where these conditions are met may be easier to deem sufficiently “technical”. If it is found that it is not a sufficient condition that the simulation is based, at least in part, on technical principles underlying the simulated system or process, then another condition could be that the output relates to technical properties of the system that is simulated, e.g. parameters relating to the performance of the simulated system.

The problem then mainly arises for more generic claims to simulations of “technical systems” without any clear technical output relating to the system being modelled. It was suggested that this requires more consideration of “technical purpose” and that this “technical purpose” needed to be present in the claim.

“Technical Purpose”

The concept of “technical purpose” came up a lot in discussions, and formed a core part of the Appellant’s arguments.

The basic premise was that if a simulation method had a “technical purpose” it could be patentable. The Appellant offered the concept of “technical purpose” as a way out of the difficulty of defining “physical entities” within “physical realities”. They stressed that a simulation is a tool and that we should look at the purpose of that tool.

I am not sure the Enlarged Board will be happy going too far down this path.

For example, how do you reconcile a “technical purpose” argument with later infringement? It seems silly to need to rely on a separate nebulous concept when deciding whether a claim infringes. It would thus seem necessary to include any “technical purpose” within the claim, ideally as clear limitations on the claim. In that case, can you not just look at the claim language rather than requiring an external concept? For example, if a claim is directed to “modelling a building design” and comprises a “model of a building”, you can talk about a “technical purpose” of “building design”.

Later the Appellant raised an example of weather simulation. This is generally deemed to be non-patentable as you cannot control the weather. However, if you have a blind control system that uses a weather simulation, this appears patentable. The Appellant tried to link this to the “technical purpose” arguments. However, it seems more to indicate that the field needs to be defined in the claim – the general weather simulation is not patentable but use of the same in a field of technology is patentable. This kind of argument basically shifts the focus: is the field of building design suitably technical? This is the million-dollar question all parties appeared to be ducking.

“Potential” and “Virtual” Effects

The EPO provided the best (and only) definition of “potential” and “virtual” effects, which arose in the communication from the Enlarged Board.

A “potential” technical effect was deemed to be an effect that occurs via use of the invention.

A “virtual” technical effect was deemed to be an effect that is achieved in a computer, where the effect corresponds to an effect that would be seen if the modelled process was happening in the real world. 

This can get complicated because computer program product claims can be said to provide a potential technical effect – in themselves they do not provide the technical effect but when the computer program is executed the technical effect is provided. The EPO rightly says we need to park this interpretation and not get into repeated ground from earlier G decisions. The conclusion in general is that there needs to be a concrete technical effect, but this can occur as it occurs for computer program claims.

Another interpretation of “potential” technical effects is that these relate to the Appellant’s teleological effects, i.e. they are effects that arise when the simulation is used for its technical purpose. However, this also raises difficulties, so we will see if the Enlarged Board quietly drop this language, or at least stress that we are looking for a concrete technical effect within the claim as we would for a computer program claim.

With regard to “virtual” technical effects, the EPO was of the opinion that, generally, a virtual technical effect does not solve a technical problem of itself. However, the “virtual” technical effect may be further used to provide a concrete technical effect.

An example was provided that covered direct and indirect measurements. With indirect measurements we may take raw data and compute a quantity rather than explicitly measuring it. For example, the quantity may be an estimate of a physical variable that is not measured directly but that is computed from other measurements or data. In this case, the “virtual” technical effect may be deemed to relate to the “virtual” measurement. Another example of a “virtual” technical effect provided by the EPO was the use of “virtual sensors” or “digital twins”. In this case, you may have an indirect measurement in parallel with a direct measurement.

It seemed to be implied that a further step was needed that included the use of the indirect measurements to solve a concrete technical problem for patentability, but in other cases it seemed implied that the indirect measurement itself could be patentable. We will have to see what the Enlarged Board say about this.

Aircraft Wind Tunnel Simulations

Both the EPO and the Appellant provided an example of a virtual wind tunnel. The EPO discussed how simulations of wind tunnels may be used instead of real wind tunnels to estimate and test aeronautical properties of aircraft wings. It was indicated by the EPO that a new experiment design for the testing of aircraft wings would be patentable, subject to the other requirements of novelty and inventive step, regardless of whether it was a physical or virtual experiment. Both virtual and concrete experiment designs were deemed alternative solutions to the same technical problem – and both should be patentable. This was also echoed by the Appellant who indicated it was the technical effect or technical purpose, rather than the form of the method, that mattered.

“Natural Processes” vs “Human Activities”

Within the discussions, following the questions in the communication from the Enlarged Board, a distinction between “natural processes” and “human activities” was made. Unfortunately, we got no further than an indication that this distinction may be important.

The implied tone appeared to be that “human activities” may be more likely to fall within the excluded subject matter of Article 52(2) EPC and so not be able to contribute to an inventive step in themselves. The EPO provided an example of a simulation of human agents in an online auction (a nod to T0258/03 for the fans) – while it was implied that these simulations per se may not be patentable, it was also implied that a larger claim that includes these simulations may not be unpatentable, as long as the claim solved a technical problem or had a technical purpose.

For me, the discussions of “human activities” had echoes of the “cognitive effects” discussed in the case law – stuff that relies on conscious activity within the human brain is normally assumed not to provide an inventive step.

The EPO though raised the interesting counterexample of ray tracing and rendering. This is a detailed and necessarily accurate modelling of physical or natural phenomena but where the technical effect has a perceptual aspect – the aim is to provide a lifelike rendering of an object, even though that object does not exist. I was impressed that this was explicitly raised, as it shows the Enlarged Board need to be careful of relying on an explicit link to actual physical objects or systems, or effects that exclude human minds.

This divide between “natural processes” and “human activities” has the potential to throw up many traps for the unwary. For example, when considering “physical activity”, i.e. actions in the world, where do you draw the line between “natural processes” and “human activities”? This is a key question for the present facts, as pedestrians can be considered both as humans undertaking an activity and as objects within three-dimensional space.

Mental Acts, Scientific Discoveries & Mathematical Methods

We are generally taught that the exclusions of Article 52(2) EPC are equal in kind, so it was interesting to see the EPO submit that some exclusions are more exclusive than others and that the practice for considering certain “non-inventions” may differ according to the type of “non-invention”.

For example, “schemes, rules and methods for playing games or doing business” are relatively easy to evaluate – if certain claim features are deemed to relate to these exclusions then they cannot provide an inventive step, and you cannot obtain a patent for a business method by just performing it on a computer. However, “discoveries, scientific theories and mathematical methods” and “schemes, rules and methods for performing mental acts” may be harder to disentangle – most inventions begin as “mental acts” and use “discoveries, scientific theories and mathematical methods”. In these latter cases, it is more important to look at the technical considerations and the technical problem that is being solved.

Examples of “features” that may be found in simulation claims include: the algorithms used, the models used, the computer-implementation, what is being modelled, and the purpose of the simulation. In this list, the EPO were of the opinion that “models” were most similar to excluded “mental acts”; however, in a simulation method, the “model” often serves a purpose. If this purpose is technical then the claim may be patentable.

It was also noted that machine learning models avoid the need for models that incorporate (directly) technical principles – they are just trained – but these appear to be valid alternative solutions and so should not be excluded from patentability.

In certain cases, claims that appeared on the surface to be “mental acts”, i.e. that could be performed mentally, could involve technical considerations and may be replicating physical experiments & measurements. For example, they may form a new experiment design for building aircraft wings. The EPO thought that applying the “paper & pencil” test (i.e. it is not patentable if you can do it with pen and paper) was not conclusive and was trumped if a technical solution could be demonstrated.

This also nicely fits in with the reason why “scientific discoveries” are excluded – they lack a “technical purpose” or fail to provide a “technical solution”. 

Practitioners will be interested to hear that the EPO appeared to allow more leeway for claims based on (but not limited to) “mental acts”, “mathematical methods” or “scientific theories” that concretely solve a technical problem, e.g. as compared to “business methods”.

Civil Engineering Bias?

Civil engineering is an interesting field as there is often a bias against it. For example, patent people much prefer chemistry or hard-core electronics and often feel uncomfortable with the fringes of “technology” where category definitions are “fuzzier”.

A subtext that runs through the referral, but that no-one is brave enough to tackle head on, is a question as to the “technical” nature of civil engineering; put bluntly, is civil engineering “technical” enough? 

This is where you have the potential problem with T1227/05 – this case is in a comfortably “hard” engineering field – circuit design. Without some legal gymnastics, if your gut feeling is that the claim in T0489/14 is not patentable, you seem to be saying that “noise in circuits” is a suitable “physical reality” for “technical” character but “people in a building” is not.

Of course, the irony behind this queasiness for softer fields is that the maths behind “noise” or “people” modelling is often the same! For example, differential equations, finite methods and/or digital approximations may be used for both cases. So the only difference is the use case (circuit vs building). 

Question 3 – Do You Need Any Special Claim Form?

This question was looked at in two ways: first, in the narrow manner of the question – what if the claim is part of a (wider) design process, e.g. for verifying a design – and second, in a wider sense of whether you needed any steps before or after the “simulation” part of the claim to aid patentability.

In the second case, options that are familiar from the case law in the UK and US include a preceding step relating to an explicit “measurement” of a real-world environment and a following “control” or “material production” step that sets out an explicit use of the simulation. Although no firm decisions were made, the general tone appeared to be that neither of these steps needed to be provided, and that a claim to the core of the simulation may be considered on its own. Both the Enlarged Board and the EPO indicated they were happy with the current practice applied to computer-implemented methods. This appears good news for applicants – the “measurement” and “control” steps introduce complications for infringement, especially for server-based methods where different parties may perform different ones of these additional steps.

The EPO did raise an additional point regarding the third question – if the simulation claim as such does not meet the sufficient condition(s) required by the answer to question 2b), would embedding that claim in a wider claim (e.g. to visualisation to assist a user) fulfill the sufficient condition(s)?

Revisiting the Enlarged Board’s Questions

To finish we can look back at the questions asked by the Enlarged Board in their communication of 22 June 2020 and begin to see some answers forming.

  • Can COMVIK be used for computer simulation?
    • Appears to be “yes”.
  • If we assume COMVIK can be applied:
    • Can “potential” or “virtual” technical effects be treated as “real” technical effects?
      • “No” – something more appears to be needed.
    • How do “mental acts” and “discoveries, scientific theories and mathematical methods”, as set out in Article 52(2)(a) and (c), interact with simulations?
      • They may be present and care should be taken not to dismiss the claim just because they are present.
      • A different approach to “business methods” may be necessary.
      • Again, what matters is not their presence but that it can be demonstrated that a technical problem is being solved using the simulation (and maybe the simulation provides a technical effect beyond the technical effect of the simulation per se).
    • Does the technical purpose need to be in the claim?
      • Implied “Yes” – although not really discussed directly.
  • Is there a difference between simulations based on “human behaviour or activities” and simulations based on “natural phenomena”?
    • Implied – “Yes” – the former requiring use within a claim to provide a technical effect (over and above the simulation per se); the latter possibly providing a technical effect more as-is – but requiring some form of ultimate technical aim, effect or purpose.

Again, we will see for sure how this translates into the decision of the Enlarged Board in due course. See you next time for that!

EPO Computer-Implemented Inventions – Spring Case Law Review

As we all bed into virus lock-down, here is a quick review of a few European Patent Office Board of Appeal cases that relate to computer-implemented inventions (so-called “software patents”).

T 2272/13 – Inventory Tracking

This first case (see link here) considered a claim related to the management of a set of distributed devices. These devices included a “mother device” that needed to track a number of “satellite devices” over a Wireless Local Area Network (WLAN). The claim involved sending a “reachability request” and alerting a user if one of the “satellite devices” did not respond. The invention was described in the context of portable inventory tracking, e.g. tracking personal belongings and locating lost devices.

“Where is my phone?”

At first instance, the Examining Division refused the case for relating to a “non-technical” inventory scheme. The Board of Appeal were highly critical of the Examining Division’s decision, in the end remitting the case back to first instance for search and examination. This was despite the specification of WLAN being a choice from a set of described protocols that included RFID, infrared and Bluetooth.

This case may be of use to applicants and appellants that are facing difficulties at first instance, especially where objections assert that any “technical” features of the claim are deemed to be “notorious” or common general knowledge. The Board cited previous cases T 690/06 and T 1411/08 and asserted that the term “notorious” had to be interpreted narrowly. To prove that a technical feature is “notorious”, it is necessary for the examining division to indicate that it is “so well known that its existence at the date of priority cannot be reasonably disputed” and that the “technical detail is not significant”. The Board were also critical of the Examining Division’s assertion that the claims related to “the automation of [excluded] processes”.

The case continues a trend within the Boards of Appeal to clamp down on lazy refusals from the examining divisions; the preferred approach is for at least some prior art to be cited, and for evidence-based reasons to be provided.

T 1442/16 – ECG Monitoring

This second case (see link here) was directed to medical monitoring, in particular to the collection and display of electrocardiogram (ECG) data. The main request was broader, referring to “sensors”, whereas the auxiliary request was limited to “ECG electrodes” and “ECG data”. The more concrete “ECG” auxiliary requests were analysed in detail.

My beating heart.

It was determined that the claims of the auxiliary requests differed from the closest prior art in that each axis of a set of multiaxis diagrams displayed ECG data from a respective ECG lead, and that the position and the angle of each of the axes corresponded to the location of the respective ECG lead on the patient. This allowed a three-dimensional heart model to be displayed together with the multiaxis diagrams, where each of the axes in the multiaxis diagrams extended from the centre of the heart model.

The case thus considered whether the differing features above could contribute to inventive step, or whether they related to the “non-technical” presentation of information (e.g. as per Article 52(2)(d) EPC).

The appellants began by arguing that the arrangement of the axes reflected an underlying state of a technical system – the technical system being formed by the set of ECG sensors. The Board of Appeal disagreed with this – the arrangement of the data was found not to prompt the physician to interact with the ECG device or to contribute to the functioning of said device. The Board felt that the alleged technical effect relied on the user’s cognitive abilities, for example the physician’s knowledge of anatomy and the principles underlying ECG. As such the differing features set out above were deemed to be “non-technical” on the grounds that they related to the presentation of information. They were thus disregarded, and a lack of inventive step was found. A “polygonal shape” of certain auxiliary requests was also found to be “non-technical”. The appellant did manage to have some success with the last auxiliary request, where the addition of an alarm trigger moved the claim away from the presentation of information.

This case makes it clear that where sensor information is being displayed, to avoid features being dismissed as relating to the presentation of information, the underlying system being sensed needs to be a technical system; the human body is not considered to be a technical system. It appears not to have helped the appellant’s case that the arrangement of the data related to an approach that was well-known to physicians (the Cabrera system). Arguments for an inventive step will also be strengthened if the display of information facilitates interaction with the sensing device.

As an aside, the Board also indicated that just because additional features were introduced gradually, with language such as “in another embodiment”, this did not mean that the applicant had “carte blanche to mix and combine features from different embodiments as they please”. It seems that the EPO will only accept a combination of features if an example is provided that explicitly has those features. This is something to beware of when drafting applications for Europe.

T 0247/15 – Targeted Online Advertising

The last case (see link here) concerned the selective delivery of advertisements to a plurality of online users.

Old school advertising.

In the Board’s view, “advertising” is, in general, considered a “non-technical” activity by the European Patent Office. Consequently, the content of advertisements or any effect the advertisements might have on a user’s behaviour or on the sales of a product (or a service) are not regarded as technical features or technical effects that, when present in a patent claim, can contribute to an inventive step.

In particular, the Board considered that the described effect of delivering advertisements more “efficiently” was not a technical effect, as it did not relate to a technical aim to be achieved or a technical problem to be solved by a skilled person. For example, “efficiency”, as described in the application, related to a more “efficient” scheduling of delivery of content or to “optimizing” the delivery of content. On closer inspection these were deemed to relate to the fulfilment of conditions for an advertisement campaign (i.e. assuring that content is delivered to users according to the agreement with the advertisers) and maximizing the quantity of delivered advertisements. The “efficiency” and “optimization” thus did not relate to a “technical” system, e.g. they did not consider the utilisation of technical resources of the computer network. Instead, these related to contractual constraints of the advertising campaign. Selecting what type of advertisement will be delivered to a user based on their profile (e.g., a default or targeted advertisement), monitoring the delivery of advertisements and updating the delivery list accordingly, were all considered to relate to an underlying business model. The solution was thus deemed an administrative method that involved the scheduling of delivery of online advertising content, which in turn was based on an abstract mathematical model. It could not support an inventive step.

EPO Computer-Implemented Invention Round Up

A quick post that checks on the state of play for computer-implemented inventions (“software patents”) at the European Patent Office. It has a quick look at some minor updates to the Guidelines for Examination and then reviews a few recent Board of Appeal cases.


Guidelines for Examination

After the overhaul of 2018, there are relatively few updates to the Guidelines for Examination for the 1 November 2019 edition. I go through those that relate to computer-implemented inventions below. I recommend viewing the links to the sections with the “Show modifications” check box ticked.

Section G-II, 3.3.1 on “Artificial intelligence and machine learning” has been tweaked to indicate that terms such as “support vector machine”, “reasoning engine” or “neural network” do not by themselves imply a technical means. They must be considered in context (i.e. make sure you describe a “hard” engineering problem and context).

Section G-II, 3.3 on “Mathematical methods” has a few minor changes.

It is stressed that special attention needs to be paid to the clarity of terms in claims that relate to mathematical methods. If terms are deemed to have “no well-recognised meaning” this may make it difficult to demonstrate a technical character (and so care should be taken to provide detailed functional definitions within the detailed description).

It is also added that mathematical methods may produce a technical effect when applied to a field of technology and/or adapted to a specific technical implementation. In this case, the “computational efficiency” of the steps of the methods may be taken into account when assessing inventive step. This is echoed in a minor update to section G-II, 3.6 on “Programs for computers”.

As also discussed below, the EPO is hinting that it might be a good idea to provide some actual experimental evidence to back up claims of increased efficiency when dealing with more abstract software-style inventions.


T 1248/12 – Data Privacy

T 1248/12 considers the field of data privacy.

In this case, the Board distinguished the field of data privacy from the field of data security. It was implied that the field of data security could give rise to technical solutions that provide an inventive step under Article 56 EPC. However, the field of data privacy was felt to relate to administrative, rather than technical, endeavours. In particular, the Board held that de-identifying data, by removing individually identifiable information, and by aggregating data from a plurality of sources, was not technical. It was felt that the claims related to data processing with a legal or administrative aim, rather than a technical one.

It is noted that the specification of the patent application was relatively light on concrete technical details – this may have led the Board to a negative opinion. The generalizations to the field of data privacy are perhaps too heavy-handed; there appears to be room to argue that some data privacy systems do contain technical features. In the light of this case, those drafting applications directed towards a data privacy aim may wish to determine if the technical effects may be recast in neighbouring data security fields.


T 0817/09 – Scoring a Document

T 0817/09 related to a computer implemented method for scoring a document.

The scoring was related to history data and was generated by monitoring signatures of the document, where the signatures were provided in the form of “term vectors”. As per similar linguistic processing cases discussed before, the “term vector” was found not to be “an inherently technical object” and “semantic similarity” was deemed to be a non-technical linguistic concept. The Board considered that solutions developed by the “notional mathematician” or the “notional computer programmer” would generally not be technical, whereas solutions developed by a digital signal processing engineer could be technical.

On the facts, the claimed methods were not found to provide any resource savings that could be presented as a technical, rather than linguistic effect. This does, however, suggest that providing evidence of technical improvements, e.g. reduced server processing times and/or reduced memory usage, may help applications with algorithmic subject matter.


T 0697/17 – Database Management Systems

T 0697/17 considered the patentability of SQL extensions within a relational database.

At first instance, the Examining Division held that the claim in question “entirely described a purely abstract method”. The Board disagreed: they held that the claim related to a method performed in a relational database system, which was a known form of software system within the field of computer science and as such would involve a computer system. The claim was thus not an abstract method. The Board noted that describing a technical feature at a high level of abstraction does not necessarily take away the feature’s technical character.

In consideration of inventive step, the Board cited T 1924/17, and stated that features make a technical contribution if they result from technical considerations on how to (for instance) improve processing speed, reduce the amount of memory required, improve availability or scalability, or reduce network traffic, when compared with the prior art or once added to the other features of the invention, and contribute in combination with technical features to achieve such an effect. However, effects and the respective features are non-technical if the effects are achieved by non-technical modifications to the underlying non-technical method or scheme (for example, a change of the business model, or a “pure algorithmic scheme”, i.e. an algorithmic scheme not based on technical considerations). The Board made an interesting distinction between a “programmer as such” and a “technical programmer”, and stated it was difficult to distinguish abstract algorithmic aspects that were non-technical and arose from the “programmer as such” from “technical programming” aspects that arose from the “technical programmer”.

Returning to T 1924/17, the Board concluded that a database management system is not a computer program as such but rather a technical system. The data structures used for providing access to data and for optimising and processing queries were deemed functional data structures and were held to purposively control the operation of the database management system and of the computer system to perform those technical tasks. While a database system is used to store non-technical information and database design usually involves information-modelling aspects, which do not contribute to solving a technical problem, the implementation of a database management system involves technical considerations. In the end, the case, which had been pending for 13 years, was remitted back to the Examining Division with an informal rap over the knuckles. It provides a useful citation for those drafting and prosecuting applications relating to database management systems.

Revised Guidelines for Computer-Implemented Inventions

Examination practice at the European Patent Office follows a set of Guidelines. These are published online and provide guidance for European Examiners and applicants. They are updated annually.

An updated set of Guidelines came into force on 1st November 2018. The recent updates introduce major amendments to sections that cover subject matter that is excluded from patentability in Europe. These sections include those directed to “mathematical methods”, “schemes, rules and methods for performing mental acts, playing games or doing business” (often shortened to “business methods”), and “programs for computers”. The updates are relevant to those filing applications related to “computer-implemented inventions” (often colloquially referred to as “software patents”).

Although the amendments do not significantly change current practice at the European Patent Office, they do expand the guidance on what may and may not be protected with a European patent. They represent a significant upgrade and demonstrate the maturity of the case law with regard to computer-implemented inventions.

This post will review and highlight the updates. The post may be useful for those seeking to patent machine learning and artificial intelligence inventions. The updates cover the following areas:

  • claims to distributed computing systems;
  • inventions that use mathematical methods;
  • AI and machine learning inventions;
  • inventions that cover simulations and models; and
  • inventions that relate to business methods, gaming, mental acts or computer programs.

Distributed Computing

Section F-IV, 3.9.3 has been added to the section relating to claims for computer-implemented inventions. It provides expanded guidance and an example relevant to processes operating in a distributed computing environment. These processes form a basis for many real-world implementations of computer-implemented inventions. For example, a smartphone accessing a cloud computing service would implement a process operating in a distributed computing environment.

The section sets out the current practice of the European Patent Office. Claims in a claim set may be directed to each entity of the distributed system and/or the system as a whole. Such a claim set may be argued to meet the requirement for multiple independent claims set by Rule 43(2)(a) EPC, i.e. the claims may be allowed despite having multiple independent claims in the same category because the subject-matter of the claims relates to a plurality of interrelated products. However, each individual claim will need to meet the requirements of novelty, inventive step and clarity.

For example, if a cloud computing service provides a new image classification function via an application programming interface that is accessed by a smartphone, a claim set may feature apparatus claims to both a server computing device (the cloud server) and a mobile computing device (the “accessing device” or smartphone). If the smartphone is simply a generic smartphone making a network request (e.g. an “HTTPS request to a REST API endpoint”), it will likely not be new when compared to known smartphones. An objection will be raised against the smartphone claim. However, if the smartphone implements some new low-level processing, e.g. some new feature extraction process that is specific to the new image classification function (like pre-extracting cat-like facial features), it may also be new and inventive in itself and be allowed.

The updated section draws our attention to the need for clarity in claims to distributed entities. It recommends that distributed method claims specify the entity that is performing each method step.

Claiming distributed processes is a challenge. In practice, one entity often implements most of the new and inventive process (e.g. the cloud server), while other devices are relatively “thin” and generic (e.g. the smartphone). However, claims to the accessing device are often of greater commercial value (e.g. they might allow a royalty for each smartphone that is sold). This often leads to the inclusion of claims to the mobile computing device in a claim set, but a high likelihood of an objection being raised by the European Examiner.

To attempt to overcome novelty and inventive step objections to “accessing device” claims, it is common to include an indirect or implicit reference to the functions of the server computing device. This can then lead to one or more clarity, novelty or inventive step objections. For example, the indirect features may trigger a clarity objection for not clearly specifying features of the “accessing device”. Alternatively, the indirect features may be ignored for novelty and/or inventive step, as they are deemed to present no inherent structural limitations for the “accessing device”.

When drafting claims to distributed systems, it is worth questioning the inventors to determine what functions may be implemented with low-level adaptations to the accessing device. If the invention can be embodied in an “app”, it is worth looking at the architecture of the app, and the sequence of low-level system calls it may be implementing. This may not be obvious to the inventors, as commercial and engineering demands often require as much functionality as possible to be embodied on the back-end in the cloud.


Mathematical Methods

The section on the examination of mathematical methods has been re-written and two sub-sections have been added. A first sub-section – 3.3.1 – now provides specific guidance for artificial intelligence and machine learning. A second sub-section – 3.3.2 – expands upon claims to simulation, design or modelling.

The updated guidance is now clearer on the importance of “technical means”, i.e. a concrete implementation in a field of technology, when an invention makes use of mathematical methods. This complements the recent changes to practice for “abstract inventions” in the United States.

I really like the updates to this section and the inclusion of helpful concrete examples. The section emphasises that a mathematical method or algorithm per se will not be enough to make a claim feature patentable, although many patentable inventions do have a mathematical or algorithmic basis.

Examples of the Fast Fourier Transform, geometric objects and graphs are provided: these features may contribute to the technical character of an invention if they contribute to producing a technical effect that serves a technical purpose. Put another way, these features need to be provided in a context that relates to an engineering problem encountered in the real world, and the use of these features needs to result in a change in the real world that helps solve that problem. This is further emphasised later in the guidance – the technical purpose of the mathematical features needs to be specific rather than generic, and the claim needs to be functionally limited to the technical purpose, either explicitly or implicitly.

What kind of applications are seen by the European Patent Office to be “technical”? My personal definition is: does the application relate to something in the real world that requires knowledge taught in an undergraduate engineering degree? If the answer is “yes”, then the application is “technical”. If the answer is “no”, then the application may not be “technical”.

Section 3.3 now provides a useful list of purposes that are deemed “technical”. These include:

  • controlling a specific machine or technical process, e.g. an X-ray apparatus or steel cooling;
  • using measurements to control a machine or technical process, e.g. using a compaction machine to achieve a desired material density;
  • digital audio/visual processing, this can be relatively high-level – detecting people is a provided example (but a clear relation to captured data is recommended);
  • processing speech data, e.g. to generate text (but processing text per se may not be technical);
  • encoding data, e.g. for transmission or storage;
  • cryptography;
  • load balancing;
  • generating higher-level measurements by processing data from physiological sensors or other medical diagnosis;
  • analysing DNA samples to provide a genotype estimate; and
  • simulating “technical” things (this is described in more detail in new sub-section 3.3.2).

The section stresses that there must be a sufficient link between the technical purpose of the invention and the mathematical method steps, for example, by specifying how the input and the output of the sequence of mathematical steps relate to the technical purpose so that the mathematical method is causally linked to a technical effect. When drafting an application for Europe for an invention that features mathematical operations (e.g. equations and/or algorithmic designs), it is recommended to place such an explanation in the description – this can then be pointed to in examination if any objection is raised.

Similar to practice in the United Kingdom, the section ends by indicating that a feature may contribute to the technical character of an invention independently of any technical application, when the claim is directed to a specific technical implementation of a mathematical method, and the mathematical method is particularly adapted for that implementation in that its design is motivated by technical considerations of the internal functioning of the computer. Using the Fast Fourier Transform example, it may be possible to obtain protection for a new digital implementation of the Fast Fourier Transform, if you were performing specific mathematical operations that were adapted to the available computing resources of the implementation, e.g. available memory registers, processing cores, etc.

When considering inventions involving mathematical methods, one useful approach is to make an initial determination:

  • Does the invention relate to a specific engineering application (e.g. what branch of “applied” maths is being considered)?
  • Or does the invention relate to a new technical implementation of a mathematical operation (e.g. in effect a new and beneficial way of performing mathematical operations or “computing” using a device – sometimes called “core” inventions)?

A positive answer in the first case suggests a quick check against the provided examples and the case law to determine if the specific engineering application has in the past been deemed to be “technical” under European practice.

A positive answer in the second case suggests looking carefully at the constraints imposed by the electronic hardware of the implementation. You will need to describe how the mathematical method is adapted, e.g. as compared to a “text-book” application, to provide concrete implementational improvements.


Artificial Intelligence and Machine Learning

Sub-section 3.3.1 is relatively short and seeks to summarise existing case law that applies in this area. This anticipates a large rise in patent applications over the coming years.

Machine learning inventions have been patented at the European Patent Office almost since its inception in the late 1970s. The present sub-section reminds us that despite the recent resurgence in neural networks, algorithms for approaches such as “classification, clustering, regression and dimensionality reduction” including “genetic algorithms, support vector machines, k-means, kernel regression and discriminant analysis” have been around for a number of years.

The sub-section stresses that the algorithms and approaches themselves are per se of an abstract mathematical nature. The guidance from section 3.3 therefore applies: the invention either needs to relate to a specific engineering application that uses the approaches (e.g. using k-means clustering to classify packets in a network for selective filtering) or to a new technical implementation of the approach that is constrained by technical factors, such as the underlying computational hardware.

The sub-section hints that “technical character” often requires a clear causal link to measured data that represents physical phenomena. For example, classification of digital data such as physiological measurements, images, videos, or audio data is seen to be a common “technical application”. However, classifying text data is regarded as a “linguistic” and “non-technical” application. Likewise, general classification of “data” or “records” without a link to a specific technical problem would likely be seen as “non-technical”. Reference is made to case T 1358/09.

The sub-section ends by indicating that if a classification method is seen to serve a technical purpose then the steps of generating the training set and training the classifier may also contribute to the technical character of the invention if they support the technical purpose. This provides useful advice for drafting claims for inventions in this area: it is recommended to consider independent claims to the generation of training data and architecture training, as well as claims to an inference step. These claims may also provide a distributed processing system as discussed in section 3.3, for example inference may be performed on a smartphone, whereas data cleaning and training may be performed on a remote server. Care should be taken to cover different infringing acts.


Simulation, design or modelling

Sub-section 3.3.2 draws out material that was present in section 3.3. Discussing this material in a separate sub-section clarifies the high-level overview now present in section 3.3.

A computer-implemented simulation of a specific technical system or process may be seen to provide a technical effect and lead to a granted European patent. However, objections will be raised to computer-implemented simulations of non-technical systems or processes, such as those with an aim in the fields of finance, marketing, administration, scheduling or logistics. Care should be taken; cases such as T 531/09 indicate that the presence of technical devices (X-ray scanners in that case) is not enough to provide technical character, the technical devices need to be specific devices and the simulation needs to perform a technical purpose.

In the field of computer-aided design, the determination of a technical parameter which is intrinsically linked to the functioning of a technical object, where the determination is based on technical considerations, is a technical purpose. For example, a method of determining a particular value for a parameter of a specific technical device, in a manner that improves production or use of the device may be seen as suitably “technical”. Care should be taken if the design involves decisions to be made by a human being – e.g. the selection of an approved value – this intervention may be seen to break a causal chain that connects the design method to a technical purpose. Such decisions also risk importing factors that are outside of a narrow determination based on “technical considerations”.

Finally, this new sub-section suggests that claims that produce “models” will often lead to objections on the grounds that the models are not technical features per se; instead, they are seen as “abstract” mathematical or mental features. This again complements current practice in the United States. It is emphasised that generation of a model may be considered to lack a technical effect, even if the modelled product, system or process is technical. In this case it is important that the claim clearly indicates how the model is used, or to be used, in a technical system or process to solve a technical problem.


Business Methods

The previous high-level summary in section 3.5 has now been deleted, with this material being moved into separate sub-sections related to each of “performing mental acts”, “playing games” and “doing business”. Each sub-section then contains new material relating to each sub-category.

Each sub-section begins with a useful definition of each exclusion. Although this is described in the context of the exclusion being applied to the whole claim (e.g. the exclusion applying “as such” or “per se”), this often will not occur in practice, e.g. in most cases the exclusions set out in Article 52(2)(c) EPC will be avoided by having the method performed by a computing device. However, the definitions are useful as they indicate which claim features may be ignored for inventive step on the grounds that they provide no technical effect.


Mental Acts

These are described as instructions to the human mind on how to conduct cognitive, conceptual or intellectual processes. The learning of a language is given as an example, which hints at how the European Patent Office legally supports an objection to “linguistic” features (e.g. text processing) as being non-technical.

When drafting claims to computer-implemented inventions, especially method claims, care should be taken to avoid accidentally falling within the exclusion. For example, claims should be checked to ensure that the method steps therein cannot be performed entirely in the human mind; at least one step needs to be performed outside of the human mind. In practice, considering whether a method step can be performed in the human mind is useful when predicting whether inventive step objections may be issued during European examination; if the determination is positive, the method step can often be drafted or amended in a manner that avoids this interpretation, e.g. by referring to a specific technical apparatus. The sub-section indicates that a method would not be seen as performing mental acts if it requires the use of technical means or if it provides a physical entity as a resulting product.

The sub-section does not indicate that mental steps are necessarily ignored for an analysis of inventive step; however, it does emphasise that any mental steps must contribute to producing a technical effect that serves a technical purpose. A good example provided in the sub-section is that of affixing a driver to a Coriolis mass flowmeter: steps specifying the position of the driver may be performed mentally, but by defining the position so as to maximise the performance of the flowmeter, a technical contribution is provided.


Games

Games are defined in sub-section 3.5.2 as a conceptual framework of conventions and conditions that govern player conduct and how a game evolves in response to decisions and actions by the players. Games are governed by game rules, that are by their nature abstract, mental entities that are only meaningful within a game context. Games may be simple – matching random numbers – or complex – video games with extensive virtual game worlds.

If a claim sets out technical means for implementing the rules of a game, it is not excluded as such and analysis moves on to inventive step. To provide an inventive step, a claim feature must make a technical contribution, i.e. provide some engineering benefit beyond a mere computer-implementation of the game rules. The benefit of a claim feature is to be assessed from the point of view of an engineer or game programmer, who may be given the game rules by a game designer as a “requirements specification”.

The sub-section indicates that in many situations the burden is on the applicant to show that a gaming invention provides a real engineering benefit. It notes that abstracting non-technical game elements, relying on a complexity of a solution or indicating cognitive content will not help the applicant.

It is interesting to compare the general negativity of this sub-section with cases such as T 928/03 and T 12/08, which presented a more liberal view of the technical nature of gaming inventions. It remains to be seen whether the updated Guidelines represent a narrower approach than that seen in the past.


Doing Business

Doing business is defined in sub-section 3.5.3 as including activities which are of financial, commercial, administrative or organisational nature. The latter two areas should be noted; they are often overlooked as they do not directly relate to making a profit but are still seen to be “non-technical”.

Some useful examples of “business method” features are provided. They include: 

  • banking,
  • billing,
  • accounting,
  • marketing,
  • advertising,
  • licensing,
  • management of rights and contractual agreements,
  • legal activities,
  • personnel management,
  • workflows,
  • scheduling of tasks,
  • logistics,
  • organisational rules,
  • operational research,
  • planning, forecasting,
  • business optimisation, and
  • data science for the purpose of managerial decision making.

If an invention relates to any of these features, it should be assumed to relate to excluded subject matter unless there is strong evidence that a technical problem is being solved by a technical solution that involves technical considerations.

For practitioners, a disclosure document or inventor from industry will often present an invention in terms of a commercial benefit. For example, inventors often become familiar with internally promoting an invention on commercial grounds. Care should be taken to dig behind these grounds and return to the underlying engineering aspects of the idea. If no engineering aspects can be presented, the idea may not be suitable for a European patent application. Examiners and Boards of Appeal will also use an indication of a commercial benefit, or the presence of the above business features, as evidence of a lack of a technical contribution. For this reason, it is recommended to avoid discussing these when drafting the patent specification.


Programs for Computers

Section 3.6 has now been redrafted and sub-sections 3.6.1, 3.6.2, and 3.6.3 have been added to respectively cover “further technical effects”, “information modelling” and programming, and “data retrieval, formats and structures”.

Section 3.6 now begins by indicating that computer programs must produce a “further technical effect” to avoid exclusion on the grounds of being a computer program “as such”. A “further technical effect” is an effect that goes beyond the normal operation of a computer, e.g. the physical effects of executing a computer program. Controlling a technical process or the internal functioning of a computer or its interfaces are deemed to be valid “further technical effects”.

Although not explicitly indicated in section 3.6, it is relatively straightforward to demonstrate a “further technical effect” and avoid an objection to the whole claim under Articles 52(2)(c) and (3) EPC. For example, claims to a computer program may be said to provide a “further technical effect” if they include instructions to implement a technical method, e.g. if they indicate a dependency to an independent method claim that is deemed technical. In this manner, European patent applications often feature claims to a “computer program for implementing the method of claim X”.

Further technical effects that may be demonstrated by a computer program are set out in sub-section 3.6.1. These include:

  • controlling a technical system or process (e.g. a braking system of a car or an X-ray device);
  • data processing in any of the areas highlighted in section 3.3, e.g. audio/visual processing, encryption or compression;
  • improving the internal functioning of a computer running the program, e.g. programs that are adapted for a specific architecture or that provide benefits at the kernel or operating system level; and
  • providing low-level tools such as compilers, memory allocators, and builders.

This updated section and its sub-sections are more useful in indicating what kind of features may be deemed to provide a technical effect. For example, if a feature of a computer program is deemed to provide a “further technical effect” as set out in this section, the feature would be seen as “technical” and be counted in any evaluation of inventive step (e.g. for other independent system or method claims).


Information Modelling and Programming

Sub-section 3.6.2 now provides useful guidance when the invention relates to aspects of computer engineering or software in itself, e.g. as opposed to a computerised solution in another field of engineering. While software developers may assume that their solution is technical according to the normal use of that term, features may not actually be “technical” for the requirements of patentability.

Information modelling is defined here as relating to providing a formal description of a real-world system or process. It may be seen to relate to models built in graphical or textual modelling languages, such as the Unified Modelling Language (UML) or the Business Process Modelling Notation (BPMN). Information Modelling may result in data models or templates that represent an underlying process.

Programming is defined as relating to the way in which computer code is written. It can involve choosing certain options or conventions for performing a common functional operation, or defining and providing a programming language, including text-based or graphical systems.

This sub-section stresses that information modelling or programming features that improve the intellectual effort of a programmer or software developer will often be seen to lack technical character and so cannot contribute to an inventive step. Benefits such as re-usability, platform-independence, conciseness, easier code-management or convenience for documentation, are not regarded as technical effects. For a feature to provide a technical effect, it must provide an improvement from the viewpoint of the computer, as opposed to the programmer. For example, manipulating machine code to provide for greater memory efficiency is seen as providing a technical contribution.


Data retrieval, formats and structures

Computer-implemented data structures or data formats embodied on a medium or as an electromagnetic carrier wave may be claimed, as they do not fall within the exclusions of Article 52(2) EPC. This sub-section has been relocated from previous section 3.7.

This section emphasises that cognitive data, i.e. data that is only relevant to a human user, cannot normally contribute to an inventive step. However, functional data, i.e. data that controls a device processing the data and that reflects technical features of the device, can.

Some examples of functional data are provided. These include a picture encoding, an index structure for a relational database, or a header structure of an electronic message. It is emphasised that the actual data content of the picture, database record or electronic message is often seen to be cognitive content and so cannot contribute to an inventive step.

Building a Claim-Figure-Description Dataset

When working with neural network architectures we need good datasets for training. The problem is good datasets are rare. In this post I sketch out some ideas for building a dataset of smaller, linked portions of a patent specification. This dataset can be useful for training natural language processing models.

What are we doing?

We want to build some neural network models that draft patent specification text automatically.

In the field of natural language processing, neural network architectures have shown success in limited domains, such as creating captions for images (kicked off by this paper) and generating text for dialogue (see here). The question is: can we get similar architectures to work on real-world data sources, such as the huge database of patent publications?

How do you draft a patent specification?

As a patent attorney, I often draft patent specifications as follows:

  1. Review invention disclosure.
  2. Draft independent patent claims.
  3. Draft dependent patent claims.
  4. Draft patent figures.
  5. Draft patent technical field and background.
  6. Draft patent detailed description.
  7. Draft abstract.

The invention disclosure may be supplied as a short text document, an academic paper, or a proposed standards specification. The main job of a patent attorney is to convert this into a set of patent claims that have broad coverage and are difficult to work around. The coverage may be limited by pre-existing published documents. These may be previous patent applications (e.g. filed by a company or its competitors), cited academic papers or published technical specifications.

Where is the data?

As many have commented, when working with neural networks we often need to frame our problem as “map X to Y”, where the neural network learns the mapping when presented with many examples. In the patent world, what can we use as our Xs and Ys?

  • If you work in a large company you may have access to internal reports and invention disclosures. However, these are rarely made public.
  • To obtain a patent, you need to publish the patent specification. This means we have multiple databases of millions of documents. This is a good source of training data.
  • Standards submissions and academic papers are also published. The problem is there is no structured dataset that explicitly links documents to patent specifications. The best we can do is a fuzzy match using inventor details and subject matter. However, this would likely be noisy and require cleaning by hand.
  • US provisional applications are occasionally made up of a “rough and ready” pre-filing document. These may be available as priority documents on later-filed patent applications. The problem here is that a human being would need to inspect each candidate case individually.

Claim > Figure > Description

At present, the research models and datasets have small amounts of text data. The COCO image database has one-sentence annotations for a range of pictures. Dialogue systems often use tweet or text-message length text segments (i.e. 140-280 characters). A patent specification in comparison is monstrous (around 20-100 pages). Similarly there may be 3 to 30 patent figures. Claims are better – these tend to be around 150 words (but can be pages).

To experiment with a self-drafting system, it would be nice to have a dataset with examples as follows:

  • Independent claim: one independent claim of one predefined category (e.g. system or method) with a word limit.
  • Figure: one figure that shows mainly the features of the independent claim.
  • Description: a handful of paragraphs (e.g. 1-5) that describe the Figure.

We could then play around with architectures to perform the following mappings:

  • Independent claim > Figure (i.e. task 4 above).
  • Independent claim + Figure > Description (i.e. task 6 above).

One problem is this dataset does not naturally exist.

Another problem is that ideally we would like at least 10,000 examples. If you spent an hour collating each example, and did this for three hours a day, it would take you nearly a decade. (You may or may not also be world class in example collation.)
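For the sceptical, here is the back-of-the-envelope arithmetic behind that estimate:

```python
examples = 10_000
hours_per_example = 1   # one hour of collation per example
hours_per_day = 3       # three hours of collation a day
days = examples * hours_per_example / hours_per_day
print(f"{days / 365:.1f} years")  # ~9.1 years - nearly a decade
```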

The long way

Because of the problems above it looks like we will need to automate the building of this dataset ourselves. How can we do this?

If I was to do this manually, I would:

  • Get a list of patent applications in a field I know (e.g. G06).
  • Choose a category – maybe start with apparatus/system.
  • Get the PDF of the patent application.
  • Look at the claims – extract an independent claim of the chosen category. Paste this into a spreadsheet.
  • Look at the Figures. Find the Figure that illustrates most of the claim features. Save this in a directory with a sensible name (e.g. linked to the claim).
  • Look at the detailed description. Copy and paste the passages that mention the Figure (e.g. all those paragraphs that describe the features in Figure X). This is often a continuous range.

The shorter way

There may be a way we can cheat a little. However, this might only work for granted European patents.

One bug-bear (sorry, enjoyable part) of being a European patent attorney is adding reference numerals to the claims to comply with Rule 43(7) EPC. Now where else can you find reference numerals? Why, in the Figures and in the description. Huzzah! A correlation.

So a rough plan for an algorithm would be as follows:

  1. Get a list of granted EP patents (this could comprise a search output).
  2. Define a claim category (e.g. based on a string pattern – [“apparatus”, “system”]).
  3. For each patent in the list:
    1. Fetch the claims using the EPO OPS “Fulltext Retrieval” API.
    2. Process the claims to locate the lowest number independent claim of the defined claim category (my PatentData Python library has some tools to do this).
    3. If a match is found:
      1. Save the claim.
      2. Extract reference numerals from the claim (this could be achieved by looking for text in parentheses or using a “NUM” part-of-speech tag from spaCy).
      3. Fetch the description text using the EPO OPS “Fulltext Retrieval” API.
      4. Extract paragraphs from the description that contain the extracted reference numerals (likely with some threshold – e.g. consecutive paragraphs with greater than 2 or 3 inclusions).
      5. Save the paragraphs and the claim, together with an identifier (e.g. the published patent number).
      6. Determine a candidate Figure number from the extracted paragraphs (e.g. by looking for a regex pattern such as “FIG\w* (\d+)”).
      7. Fetch that Figure using the EPO OPS “Drawings” or images retrieval API.
        • Now we can’t retrieve specific Figures, only specific sheets of drawings, and only in ~50% of cases will these match.
        • We can either:
          • Retrieve all the Figures and then OCR these looking for a match with the Figure number and/or the reference numbers.
          • Start with a sheet equal to the Figure number, OCR, then if there is no match, iterate up and down the Figures until a match is found.
          • See if we can retrieve a mosaic featuring all the Figures, OCR that and look for the sheet number preceding a Figure or reference numeral match.
      8. Save the Figure as something loadable (TIFF format is standard) with a name equal to the previous identifier.

The output from running this would be a triple similar to this: (claim_text, paragraph_list, figure_file_path). A rough Python sketch of steps 3.2 and 3.4 is given below.
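This is a minimal sketch using only the standard library; `claim_text` and `paragraphs` are hypothetical inputs (the claim string and a list of description paragraphs), and the parenthesis-based extraction is deliberately naive:

```python
import re

def extract_reference_numerals(claim_text):
    """Pull candidate reference numerals from parenthesised text
    in a claim, e.g. 'a processor (110)' -> {'110'}."""
    numerals = set()
    for candidate in re.findall(r"\(([^)]+)\)", claim_text):
        # Numerals may be listed together, e.g. "(110, 120)"
        for token in re.split(r"[,;\s]+", candidate):
            if re.fullmatch(r"\d+[a-z]?", token.strip()):
                numerals.add(token.strip())
    return numerals

def matching_paragraphs(paragraphs, numerals, threshold=2):
    """Return description paragraphs that mention at least
    `threshold` of the extracted reference numerals."""
    selected = []
    for paragraph in paragraphs:
        hits = sum(
            1 for n in numerals
            if re.search(rf"\b{re.escape(n)}\b", paragraph)
        )
        if hits >= threshold:
            selected.append(paragraph)
    return selected
```

The `\b` word boundaries stop a numeral such as “11” matching inside “110”, and the `\d+[a-z]?` pattern allows suffixed numerals such as “110a”.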

We might want some way to clean any results – or at least view them easily so that a “gold standard” dataset can be built. This would lend itself to a Mechanical Turk exercise.

We could break down the text data further – the claim text into clauses or “features” (e.g. based on semi-colon placement) and the paragraphs into clauses or sentences.

The image data is black and white, so we could resize and resave each TIFF file as a binary matrix of a common size. We could also use any OCR data from the file.
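For example, a minimal sketch of that resize-and-binarise step, assuming Pillow and NumPy are available (the 256×256 size and the 128 threshold are arbitrary choices):

```python
import numpy as np
from PIL import Image

def figure_to_binary_matrix(tiff_path, size=(256, 256)):
    """Load a patent drawing, resize it and return a binary
    NumPy matrix (1 = ink, 0 = background)."""
    image = Image.open(tiff_path).convert("L")  # 8-bit greyscale
    image = image.resize(size)
    array = np.asarray(image)
    # Patent drawings are dark lines on a light background
    return (array < 128).astype(np.uint8)
```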

What do we need to do?

We need to code up a script to run the algorithm above. If we are downloading large chunks of text and images we need to be careful of exceeding the EPO’s terms of use limits. We may need to code up some throttling and download monitoring. We might also want to carefully cache our requests, so that we don’t download the same data twice.
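Here is one way the throttling and caching might look, using only the standard library. The one-second interval is a placeholder – the real value should follow the EPO’s fair-use terms – and `fetch_fn` is a hypothetical function that performs the actual download and returns JSON-serialisable data:

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path("ops_cache")
CACHE_DIR.mkdir(exist_ok=True)
MIN_INTERVAL = 1.0  # seconds between requests - placeholder value
_last_request = 0.0

def cached_fetch(url, fetch_fn):
    """Fetch `url` via `fetch_fn`, caching responses on disk so we
    never download the same data twice, and spacing live requests
    out by MIN_INTERVAL seconds."""
    global _last_request
    key = hashlib.sha256(url.encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
    response = fetch_fn(url)
    cache_file.write_text(json.dumps(response))
    return response
```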

Initially we could start with a smaller dataset of say 10 or 100 examples. Get that working. Then scale out to many more.

If the EPO OPS is too slow or our downloads are too large, we could use (i.e. buy access to) a bulk data collection. We might want to design our algorithm so that the processing may be performed independently of how the data is obtained.

Another Option

Another option is that front page images of patent publications are often available. The Figure published with the abstract is often that which the patent examiner or patent drafter thinks best illustrates the invention. We could try to match this with an independent claim. The figure image supplied, though, is smaller. This may be a backup option if our main plan fails.

Wrapping Up

So. We now have a plan for building a dataset of claim text, description text and patent drawings. If the text data is broken down into clauses or sentences, this would not be a million miles away from the COCO dataset, but for patents. This would be a great resource for experimenting with self-drafting systems.


Patent Search as a Deep Learning Problem

This article will look into how the process of obtaining a patent could be automated using deep learning approaches. A possible pipeline for processing a patent application will be discussed. It will be shown how current state of the art natural language processing techniques could be applied.

Brief Overview of Patent Prosecution

First, let’s briefly look at how a patent is obtained. A patent application is filed. The patent application includes a detailed description of the invention, a set of figures, and a set of patent claims. The patent claims define the proposed legal scope of protection. A patent application is searched and examined by a patent office. Relevant documents are located and cited against the patent application. If an applicant can show that their claimed invention is different from each citation, and that any differences are also not obvious over the group of citations, then they can obtain a granted patent. Often, patent claims will be amended by adding extra features to clearly show a difference over the citations.

Patent Data

For a deep learning practitioner the first question is always: what data do I have? If you are lucky enough to have labelled datasets then you can look at applying supervised learning approaches.

It turns out that the large public database of patent publications is such a dataset. All patent applications need to be published in order to proceed to grant. This will be seen as a serendipitous gift for future generations.

Search Process

In particular, a patent search report can be thought of as the following processes:


A patent search locates a set of citations based on the language of a particular claim.


Each located citation is labelled as being in one of three categories:

– X: relevant to the novelty of the patent claim.
– Y: relevant to the inventive step of the patent claim. (This typically means the citation is relevant in combination with another Y citation.)
– A: relevant to the background of the patent claim. (These documents are typically not cited in an examination report.)

In reality, these two processes often occur together. For our ends, we may wish to add a further category: N – not cited.

Problem Definition

Thinking as a data scientist, we have the following data records:

(Claim text, citation detailed description text, search classification)

This data may be retrieved (for free) from public patent databases. This may need some intelligent data wrangling. The first process may be subsumed into the second process by adding the “not cited” category. If we move to a slightly more mathematical notation, we have as data:

(c, d, s)

Where c and d are based on a (long) string of text and s is a label with 4 possible values. We then want to construct a model for:

P(s | c, d)

I.e. a probability model for the search classifications given the claim text and citation detailed description. If we have this we can do many cool things. For example, for a set c, we can iterate over a set of d and select the documents with the highest X and Y probabilities.

Representations for c and d

Machine learning algorithms operate on real-valued tensors (n×m-dimensional arrays). More than that, the framework for many discriminative models maps data in the form of a large tensor X to a set of labels in the form of a tensor Y. For example, each row in X and Y may relate to a different data sample. The question then becomes: how do we map (c, d, s) to (X, Y)?

Mapping s to Y is relatively easy. Each row of Y may be an integer value corresponding to one of the four labels (e.g. 0 to 3). In some cases, each row may need to represent the integer label as a “one hot” encoding, e.g. a value of [2] > [0, 0, 1, 0].
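As a quick illustration (assuming the four labels X, Y, A and N are encoded as the integers 0 to 3), the one-hot mapping is a one-liner in NumPy:

```python
import numpy as np

labels = np.array([2, 0, 3, 1])  # e.g. four samples: A, X, N, Y
one_hot = np.eye(4)[labels]      # pick rows of the 4x4 identity matrix
print(one_hot[0])                # label 2 -> [0. 0. 1. 0.]
```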

Mapping c and d to X is harder. There are two sub-problems: 1) how do we combine c and d? and 2) how do we represent each of c and d as sets of real numbers?

There is an emerging consensus on sub-problem 2). A great explanation may be found in Matthew Honnibal’s post Embed, Encode, Attend, Predict. Briefly summarised, we embed words from the text using a word embedding (e.g. based on Word2Vec or GloVe). This outputs a sequence of real-valued float vectors (e.g. of length ~300), one per word. We then encode this sequence of vectors into a document matrix, e.g. where each row of the matrix represents a sentence encoding. One common way to do this is to apply a bidirectional recurrent neural network (RNN – such as an LSTM or GRU), where the outputs of a forward and backward network are concatenated. An attention mechanism is then applied to reduce the matrix to a vector. The vector then represents the document.

A simple way to address sub-problem 1) is to concatenate the vectors for c and d (in a similar manner to the forward and backward passes of the RNN). A more advanced approach might use c as an input to the attention mechanism when generating the document vector for d. A minimal sketch of the first approach is set out below.
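
To make the embed–encode–attend pipeline concrete, here is a minimal PyTorch sketch of the concatenation approach. The dimensions, the GRU encoder and the simple learned attention are my own illustrative choices, not a prescribed architecture:

```python
import torch
import torch.nn as nn

class DocEncoder(nn.Module):
    """Embed -> encode (bidirectional GRU) -> attend: one vector per document."""
    def __init__(self, vocab_size=50_000, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # could load GloVe weights
        self.encode = nn.GRU(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.attend = nn.Linear(2 * hidden_dim, 1)  # learned attention scores

    def forward(self, token_ids):                  # (batch, seq_len)
        h, _ = self.encode(self.embed(token_ids))  # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attend(h), dim=1)
        return (weights * h).sum(dim=1)            # (batch, 2*hidden)

class SearchClassifier(nn.Module):
    """Concatenate claim and description vectors; predict one of four categories."""
    def __init__(self):
        super().__init__()
        self.claim_enc = DocEncoder()
        self.desc_enc = DocEncoder()
        self.out = nn.Linear(4 * 128, 4)  # two 256-dim vectors in, 4 labels out

    def forward(self, claim_ids, desc_ids):
        v = torch.cat([self.claim_enc(claim_ids), self.desc_enc(desc_ids)], dim=-1)
        return self.out(v)  # raw logits; pair with CrossEntropyLoss
```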

Obtain the Data

To get our initial data records – (Claim text, citation detailed description text, search classification) – we have several options. For a list of patent publications, we can obtain details of citation numbers and search classifications using the European Patent Office’s Open Patent Services RESTful API. We can also obtain a claim 1 for each publication. We can then use the citation numbers to look up the detailed descriptions, either using another call to the OPS API or using the USPTO bulk downloads.
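
As a rough sketch of the OPS calls involved (the endpoint paths below are from memory, so check them against the current OPS documentation; you will also need to register at developers.epo.org for a consumer key and secret):

```python
import requests

KEY, SECRET = "your-consumer-key", "your-consumer-secret"

# 1) Exchange the key/secret for a short-lived access token
token = requests.post(
    "https://ops.epo.org/3.2/auth/accesstoken",
    auth=(KEY, SECRET),
    data={"grant_type": "client_credentials"},
).json()["access_token"]

headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
BASE = "https://ops.epo.org/3.2/rest-services/published-data/publication/epodoc"

# 2) Bibliographic data (including citations and classifications)
biblio = requests.get(f"{BASE}/EP1000000/biblio", headers=headers).json()

# 3) Claims for the same publication
claims = requests.get(f"{BASE}/EP1000000/claims", headers=headers).json()
```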

I haven’t looked in detail at the USPTO examination datasets but the information may be available there as well. I know that the citations are listed in the XML for a US grant (but without the search classifications). Most International (PCT / WO) publications include the search report, so at a push you could OCR and regex the search report text to extract a (claim number, citation number, search category) tuple, as sketched below.
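
As an illustration of the regex approach (the pattern below is a hypothetical starting point – real search report layouts and OCR noise will vary):

```python
import re

# A hypothetical pattern for a line of OCRed search report text such as:
#   "X   WO 2005/123456 A1 (SOMEONE) paragraph [0031]   1-10"
LINE = re.compile(
    r"(?P<category>[XYA])\s+"                      # search category
    r"(?P<citation>[A-Z]{2}\s?[\d/]+\s?[A-Z]\d?)"  # e.g. WO 2005/123456 A1
    r".*?(?P<claims>[\d,\s-]+)\s*$"                # claims cited against
)

def parse_search_report(ocr_text):
    """Yield (claim number(s), citation number, search category) tuples."""
    for line in ocr_text.splitlines():
        match = LINE.search(line)
        if match:
            yield (match["claims"].strip(), match["citation"], match["category"])
```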

Training

Once you have a dataset consisting of X and Y, built from c, d and s, the process then just becomes designing, training and evaluating different deep learning architectures. You can start with a simple feed-forward network and work up in complexity.
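
For example, a minimal training loop for such a feed-forward starting point might look as follows (with dummy stand-ins for X and Y so that the snippet runs; substitute your real dataset):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins: 512-dimensional document vectors and integer labels 0-3
X = torch.randn(1000, 512)
Y = torch.randint(0, 4, (1000,))
train_loader = DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=True)

# A simple feed-forward network mapping document vectors to the 4 labels
model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 4))
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for X_batch, y_batch in train_loader:
        optimiser.zero_grad()
        loss = loss_fn(model(X_batch), y_batch)
        loss.backward()
        optimiser.step()
```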

I cannot guarantee your results will be great or useful, but hey, if you don’t try, you’ll never know!

What are you waiting for?

Can you protect Artificial Intelligence inventions at the European Patent Office?

In recent years there has been a resurgence of interest in machine learning and so-called “artificial intelligence” systems. Much of this resurgence is based on advances in so-called “deep learning”, neural networks with multiple layers of connections. For example, convolutional neural networks now provide state-of-the-art performance in many image recognition tasks and recurrent neural networks have been used to increase the accuracy of many commercial machine translation systems. Machine learning may be considered a subdiscipline of “artificial intelligence” that deals with algorithms that are trained to perform tasks such as classification based on collections of data. This recent resurgence has meant that more companies wish to protect innovations in this field. This quickly brings them into the realm of computer-implemented inventions, and the nuances of protection at the European Patent Office.

Obligatory “Terminator” Patent Attorney Stock Image


Computer-Implemented Inventions

“Computer-implemented invention” is the European Patent Office term for a software invention. Claims that specify machine learning and artificial intelligence systems are almost certain to be considered “computer-implemented inventions”. The innovation in such systems occurs in the design of the algorithms and/or software architectures. Claims for new hardware to implement machine learning and artificial intelligence systems, such as new graphical processing unit configurations, would not be classed as computer-implemented inventions and would be considered in the same manner as conventional computer devices.


What Do We Have To Go On?

As key advances in the field have only been seen since 2010, there are few Board of Appeal cases that explicitly consider these inventions. It is likely we will see many Board of Appeal decisions in this field, but it is unlikely these will filter through the system much before 2020. However, applications in the field are being filed and examined. The following review is based on knowledge of these applications, evaluated in the context of existing Board of Appeal cases.


Prior Art

A first issue regarding machine learning and artificial intelligence systems is that many of the underlying techniques are public knowledge, given the rapid turnover of publications and repositories of electronic pre-prints such as arXiv. Hence, many applicants may face novelty and inventive step objections if the invention involves the application of known techniques to new domains or problems. For patent attorneys who are drafting new applications, it is recommended to perform a pre-filing search of such publication sources and to ensure that the inventors provide a full appraisal of what is public knowledge.


Domain of Invention

A second issue is the domain of the invention. This may be seen as the context of the invention as presented in the claims and patent description.

Inventions that apply machine learning approaches to fields in engineering are generally considered more positively by the European Patent Office. These fields will typically either operate on low-level data that represents physical properties or have some form of actuation or change in the physical world. For example, the following domains are less likely to have features excluded from an inventive step evaluation for being “non-technical”: navigating a robot within a three-dimensional space; dynamic adaptive change of a Field Programmable Gate Array; audio signal analysis in speech processing; and controlling a power supply to a data centre.

On the other hand, inventions that apply machine learning approaches within a business or “enterprise” domain are likely to be analysed more closely. These inventions have a greater chance of claim features being excluded for being “non-technical”. These domains typically have an aim of increasing profit. The more this aim is explicit in the patent application, the more likely a “non-technical” objection will be raised. For example, the following inventions are more likely to have features excluded from an inventive step evaluation for being “non-technical”: intelligent organisation of playlists in a music streaming service; adaptive electronic trading of securities; automated provision of electronic information in a company hierarchy; and automated negotiation of online advertising auctions.


Exclusions from Patentability

A third issue that arises is that individual features of the claims may fall within the exclusions of Article 52(2) EPC. In the field of machine learning and artificial intelligence systems, there is an increased risk of claim features being considered to fall into one of the following categories: mathematical methods; schemes, rules and methods for performing mental acts or doing business; and presentations of information. These will briefly be considered in turn below.

Mathematical Methods

The field of machine learning is closely linked to the field of statistics. Indeed many machine learning algorithms are an application of statistical methods. Academic researchers in the field are trained to describe their contributions mathematically, and this is required for publication in an academic journal. However, the practice of the European Patent Office, as directed by the Boards of Appeal, typically regards statistical methods as mathematical methods. In their pure, unapplied form they are considered “non-technical”.

Schemes, Rules and Methods for Performing Mental Acts

A claim feature is likely to be considered part of schemes, rules and methods for performing mental acts when the scope of the feature is too broad or abstract. For example, if a claimed method step also covers a human being performing the step manually, it is likely that the scope is too broad.

Schemes, Rules and Methods for Doing Business

Claim features are likely to be considered schemes, rules and methods for doing business when the information processing relates to a business aim or goal. This is especially the case where the information processing is dependent on the content of the data being processed, and that content does not relate to a low-level recording or capture of a physical phenomenon.

For example, processing of a digital sound recording to clean the recording of noise would be considered “technical”; processing row entries in a database of information technology assets to remove duplicates for licensing purposes would likely be considered “non-technical”.

Presentation of Information

Objections that features relate to the presentation of information may occur when the innovation relates to user experience (UX) or user interface (UI) features.

For example, a machine learning algorithm that adaptively arranges icons on a smartphone according to use may receive objections on the grounds that features relate to mathematical methods (the algorithm) and presentation of information (the arrangement of icons on the graphical user interface). As per Guideline G-II, 3.7.1, grant is unlikely if information is simply displayed to a user and any improvement occurs in the mind of the user. However, it is possible to argue for a technical effect if the output provides information on an internal state of operation of a device (at the operating system level or below, e.g. battery level, processing unit utilisation etc.) or if the output improves a sequence of interactions with a user (e.g. provides a new way of operating a device). Again, a technical problem needs to be demonstrated and the machine learning algorithm needs to be a tool to solve this problem.


Subfields of ML and AI

In certain subfields of machine learning and artificial intelligence, there is a tendency for Boards of Appeal and Examining Divisions to consider inventions more or less “technical”. This is often for a combination of factors, including field of operation of appellants, the history of research and traditional applications, and the background and public policy preferences of staff of the European Patent Office.

For example, machine learning and artificial intelligence systems in the field of image, video and audio processing are more likely to be found to have “technical” features that can contribute to an inventive step under Article 56 EPC. A convolutional neural network architecture applied to image processing is more likely to be considered a “technical” contribution than the same architecture applied to text processing. Similarly, it may be argued that machine learning and artificial intelligence systems in the field of medicine and biochemistry have “technical” characteristics, e.g. if they operate on data originating from mass spectrometry or medical imaging.

However, advances in search, classification and natural language processing are more likely to be found to have “non-technical” features that cannot contribute to an inventive step under Article 56 EPC. These areas of machine learning and artificial intelligence systems are often felt to be “technical” by the engineers and developers building such systems. However, it is a nuance of European case law that these areas are often deemed to have claim features that fall into an excluded “business”, “mathematical” or “administrative” category.

A recent example may be found in case T 1358/09. The claim in this case comprised “text documents, which are digitally represented in a computer, by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document”. The Board agreed with the appellant that the steps in the claim were different to those applied by a human being performing classification. However, the Board concluded that the algorithm underlying the method of the claim did not “go beyond a particular mathematical formulation of the task of classifying documents”. They were of the opinion that the skilled person would have been given the (“non-technical”) text classification algorithm and simply be tasked with implementing it on a computer.


What Should We Not Do?

Managers and executives of commercial enterprises are often habituated into selling innovations to a non-technical audience. This means that invention disclosures often describe the invention at an abstract “marketing” level. When an invention is described in a patent application at this level, inventive step objections are likely.

The fact that mathematical formulae may comprise excluded “non-technical” features is difficult for inventors and practitioners to grasp. Often equations at an academic-publication level are included in patent specifications in an attempt to add technical character. This often backfires. While such equations may be deemed “technical” according to a standard definition of the term, they are often not deemed “technical” according to the definition applied by European case law.

In general, objections are more likely in this area when the scope of the claim is broad and attempts to cover applications of a particular algorithm in all industries. Applicants should be advised that trying to cover everything will likely lead to refusal.


What Should We Do?

Chances of grant may be increased by ensuring an examiner or Board of Appeal member can clearly see the practical application of the algorithm to a specific field or low-level technical area.

Patent attorneys drafting patent applications for machine learning and artificial intelligence systems should carefully consider the framing and description of the invention in the patent specification. In-depth discussions with the engineers and developers that are implementing the systems often enable innovations to be described more precisely. Given this precision, innovations may be framed as a “technical” or engineering innovation, i.e. a technical solution to a technical problem. This increases the chance of a positive opinion from the European Patent Office.

Often features of an invention will have both a business advantage and a “technical” advantage. For example, a machine learning system that learns how to dynamically route data over a network may help an online merchant more successfully route traffic to their website; however, this improved method may involve manipulation of data packets within a router that also improves network security. A patent specification describing the latter advantage will have a greater chance of grant than the former, regardless of the actual provenance of the invention. A practitioner may work with an inventor to ensure that initial business advantages are distilled to their proximate “technical” advantages and effects. For cases where data does not relate to a low-level recording or capture of a physical phenomenon, it is recommended to ensure that any described technical effect applies regardless of the content of the data.

When considering exclusion for “mental acts”, a risk of a “non-technical” objection may be reduced by ensuring that your method steps exclude a manual implementation. Note that this exclusion does not necessarily prevent other objections being raised (see T 1358/09 above).

When drafting patent applications, it is also important to describe the implementation of any mathematical method. In this manner, pseudo-code is often more useful than equations. It is also important to clearly define how attributes of the physical world are represented within the computer. Good questions to ask include: “What data structures and function routines are used to implement the elements of any equation?”, “How is data initially recorded, e.g. are documents a scanned image such as a bitmap or a markup file using a Unicode encoding?”, “What programming languages and libraries are being used?”, or “What application programming interfaces are important?”. A toy illustration is set out below.
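
As a toy illustration of the point, the “frequency of occurrence” document vector at issue in T 1358/09 (discussed above) is arguably more informative as a few lines of code than as an equation (the function below is purely illustrative):

```python
from collections import Counter

# A "frequency of occurrence" document vector of the kind at issue in
# T 1358/09, expressed as implementation rather than as an equation
def term_frequency_vector(document: str, vocabulary: list[str]) -> list[int]:
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]
```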

Practitioners do need to be wary of including overly limiting definitions within the claims; however, a positive opinion is more likely when specific implementation examples are described in the patent specification, followed by possible generalisations, than when specific implementation examples are omitted and the description only presents a generalised form of the invention along with more detailed mathematical equations.

To be successful in search, classification and natural language processing, one approach is to determine whether features relating to a non-obvious technical implementation may be claimed. This approach often goes hand in hand with a knowledge of academic publications in the field. While such publications may disclose a version of an algorithm being used, they often gloss over the technical implementation (unless the underlying source code is released on GitHub). For example, is there any feature of the data, ignoring its content, which makes implementation of a given equation problematic? If inventors have managed to reduce the dimensionality of a neural network using clever string pre-processing or quantisation, then there may be an argument that the resultant solution is implementable on mobile and embedded devices. Reducing the size of a model from 3 GB to 300 KB by intelligent selection of pipeline stages may enable you to argue for a technical effect.


Do Not Believe The Hype?

Despite the hype, machine learning and artificial intelligence systems are just another form of software solution. As such, all the general guidance and case law on computer-implemented inventions continues to apply. A benefit of the longer timescales of patent prosecution is that you ride out the waves of Gartner’s hype cycle. In fact, I still sometimes prosecute cases from the end of the dotcom boom…