I’ve finally found out how to access UK legislation in XML format – http://www.legislation.gov.uk/developer/uris – you just add /data.xml to the end of the statute URI!
This article looks at how the process of obtaining a patent could be automated using deep learning. We will discuss a possible pipeline for processing a patent application and show how current state-of-the-art natural language processing techniques could be applied.
Brief Overview of Patent Prosecution
First, let’s briefly look at how a patent is obtained. A patent application is filed. The patent application includes a detailed description of the invention, a set of figures, and a set of patent claims. The patent claims define the proposed legal scope of protection. A patent application is searched and examined by a patent office. Relevant documents are located and cited against the patent application. If an applicant can show that their claimed invention is different from each citation, and that any differences are also not obvious over the group of citations, then they can obtain a granted patent. Often, patent claims will be amended by adding extra features to clearly show a difference over the citations.
For a deep learning practitioner the first question is always: what data do I have? If you are lucky enough to have labelled datasets then you can look at applying supervised learning approaches.
It turns out that the large public database of patent publications is such a dataset. All patent applications need to be published in order to proceed to grant. This is a serendipitous gift for future generations.
In particular, a patent search report can be thought of as the following processes:
A patent searcher locates a set of citations based on the language of a particular claim.
Each located citation is labelled as being in one of three categories:
– X: relevant to the novelty of the patent claim.
– Y: relevant to the inventive step of the patent claim. (This typically means the citation is relevant in combination with another Y citation.)
– A: relevant to the background of the patent claim. (These documents are typically not cited in an examination report.)
In reality, these two processes often occur together. For our purposes, we may wish to add a further category: N – not cited.
Thinking as a data scientist, we have the following data records:
(Claim text, citation detailed description text, search classification)
This data may be retrieved (for free) from public patent databases. This may need some intelligent data wrangling. The first process may be subsumed into the second process by adding the “not cited” category. If we move to a slightly more mathematical notation, we have as data:
(c, d, s)
where c and d are each based on a (long) string of text and s is a label with four possible values. We then want to construct a model for:
P(s | c, d)
I.e. a probability model for the search classifications given the claim text and citation detailed description. If we have this we can do many cool things. For example, for a given claim c, we can iterate over a set of documents d and select the documents with the highest X and Y probabilities.
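Given such a model, the ranking step is straightforward. Here is a minimal sketch, where `toy_model` is a word-overlap stub standing in for a trained network, purely so the ranking logic is runnable:

```python
def toy_model(claim, description):
    """Toy stand-in for P(s | c, d): scores by word overlap.

    A real model would be a trained neural network; this stub just makes
    the ranking function below runnable."""
    overlap = len(set(claim.lower().split()) & set(description.lower().split()))
    p_x = min(overlap / 10.0, 1.0)  # crude "novelty-destroying" probability
    rest = (1.0 - p_x) / 3.0
    return {"X": p_x, "Y": rest, "A": rest, "N": rest}

def rank_citations(claim, documents, model=toy_model):
    """Rank candidate documents by P(X) + P(Y) for a given claim."""
    scored = [(doc_id, model(claim, text)) for doc_id, text in documents]
    return sorted(scored, key=lambda item: item[1]["X"] + item[1]["Y"], reverse=True)

docs = [
    ("D1", "a method of encrypting data packets on a network"),
    ("D2", "a payment card reader decoding magnetic stripe signals"),
]
ranking = rank_citations("decoding signals from a payment card read", docs)
```

Swap in a real trained model for `toy_model` and the same loop becomes a prior-art search.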
Representations for c and d
Machine learning algorithms operate on real-valued tensors (n×m-dimensional arrays). More than that, the framework for many discriminative models maps data in the form of a large tensor X to a set of labels in the form of a tensor Y. For example, each row in X and Y may relate to a different data sample. The question then becomes: how do we map (c, d, s) to (X, Y)?
Mapping s to Y is relatively easy. Each row of Y may be an integer value corresponding to one of the four labels (e.g. 0 to 3). In some cases, each row may need to represent the integer label as a “one hot” encoding, e.g. a label of 2 becomes [0, 0, 1, 0].
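For example, a minimal one-hot encoder for the four labels:

```python
LABELS = ["X", "Y", "A", "N"]

def to_one_hot(label):
    """Map a search-category label to a one-hot vector, e.g. 'A' -> [0, 0, 1, 0]."""
    vec = [0] * len(LABELS)
    vec[LABELS.index(label)] = 1
    return vec
```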
Mapping c and d to X is harder. There are two sub-problems: 1) how do we combine c and d? and 2) how do we represent each of c and d as sets of real numbers?
There is an emerging consensus on sub-problem 2). A great explanation may be found in Matthew Honnibal’s post Embed, Encode, Attend, Predict. Briefly summarised, we embed words from the text using a word embedding (e.g. based on Word2Vec or GloVe). This outputs a real-valued vector for each word (e.g. a vector of length ~300). We then encode this sequence of vectors into a document matrix, e.g. where each row of the matrix represents a sentence encoding. One common way to do this is to apply a bidirectional recurrent neural network (RNN – such as an LSTM or GRU), where the outputs of a forward and a backward network are concatenated. An attention mechanism is then applied to reduce the matrix to a vector. The vector then represents the document.
A simple way to address sub-problem 1) is to concatenate c and d (in a similar manner to the forward and backward passes of the RNN). A more advanced approach might use c as an input to the attention mechanism for the generation of the document vector for d.
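The embed → encode → attend steps can be sketched schematically. This is a toy NumPy stand-in (random vectors instead of GloVe, cumulative means instead of a real BiLSTM) purely to show the shapes flowing through the pipeline; it also shows both answers to sub-problem 1) — simple concatenation, and using the claim vector as the attention query for the citation:

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 8  # ~300 in practice
VOCAB = {}

def embed(tokens):
    """Look up (here: lazily invent) a word vector per token."""
    rows = []
    for tok in tokens:
        if tok not in VOCAB:
            VOCAB[tok] = rng.standard_normal(EMBED_DIM)
        rows.append(VOCAB[tok])
    return np.stack(rows)                       # (n_words, EMBED_DIM)

def encode(matrix):
    """Stand-in for a BiRNN: concatenate forward and backward running means."""
    fwd = np.cumsum(matrix, axis=0) / np.arange(1, len(matrix) + 1)[:, None]
    bwd = fwd[::-1]
    return np.concatenate([fwd, bwd], axis=1)   # (n_words, 2 * EMBED_DIM)

def attend(encoded, query=None):
    """Dot-product attention reducing a matrix to a single document vector."""
    if query is None:
        query = encoded.mean(axis=0)
    scores = encoded @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ encoded                    # (2 * EMBED_DIM,)

claim_vec = attend(encode(embed("a decoding system".split())))
# Advanced option: use the claim vector as the attention query for d...
citation_vec = attend(encode(embed("decoding signals from a card read".split())),
                      query=claim_vec)
# ...and the simple option: concatenate the two vectors to form a row of X.
x_row = np.concatenate([claim_vec, citation_vec])
```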
Obtain the Data
To get our initial data records – (Claim text, citation detailed description text, search classification) – we have several options. For a list of patent publications, we can obtain details of citation numbers and search classifications using the European Patent Office’s Open Patent Services RESTful API. We can also obtain a claim 1 for each publication. We can then use the citation numbers to look up the detailed descriptions, either using another call to the OPS API or using the USPTO bulk downloads.
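As a sketch of the retrieval step: the OPS URL pattern below is from memory and should be checked against the EPO's OPS documentation, and authentication (OPS uses OAuth) is omitted:

```python
OPS_BASE = "http://ops.epo.org/3.2/rest-services"  # assumption: current OPS root

def biblio_url(pub_number, fmt="epodoc"):
    """Build an OPS published-data URL for bibliographic data (citations,
    classifications). The path pattern is an assumption -- check the OPS docs."""
    return f"{OPS_BASE}/published-data/publication/{fmt}/{pub_number}/biblio"

def claims_url(pub_number, fmt="epodoc"):
    """Build an OPS URL for the claims of a publication (same caveat)."""
    return f"{OPS_BASE}/published-data/publication/{fmt}/{pub_number}/claims"

# In practice you would register for OPS, obtain a token, then call e.g.:
# requests.get(biblio_url("EP1000000"),
#              headers={"Authorization": f"Bearer {token}"})
```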
I haven’t looked in detail at the USPTO examination datasets but the information may be available there as well. I know that the citations are listed in the XML for a US grant (but without the search classifications). Most International (PCT / WO) publications include the search report, so at a push you could OCR and regex the search report text to extract a (claim number, citation number, search category) tuple.
Once you have a dataset consisting of X and Y from c, d, s, the process then just becomes designing, training and evaluating different deep learning architectures. You can start with a simple feed forward network and work up in complexity.
I cannot guarantee your results will be great or useful, but hey if you don’t try you will never know!
What are you waiting for?
Natural Language Processing and Deep Learning have the potential to overhaul patent operations for large patent departments. Jobs that used to cost hundreds of dollars / pounds per hour may cost cents / pence. This post looks at where I would be investing research funds.
The Path to Automation
In law, the path to automation is typically as follows:
Qualified Legal Professional > Associate > Paralegal > Outsourcing > Automation
Work is standardised and commoditised as we move down the chain. Today we will be looking at the last stage in the chain: automation.
At a high level, here are some potential applications of deep learning models that have been trained on a large body of patent publications:
- Invention Disclosure > Patent Specification +/ Claims (Drafting)
- Patent Claims + Citation > Amended Claims (Amendment)
- Patent Claims > Corpus > Citations (Patent Search)
- Invention Disclosure > Citations (Patent Search)
- Patent Specification + Claims > Cleaned Patent Specification + Claims (Proof Reading)
- Figures > Patent Description (Drafting)
- Claims > Figures +/ Patent Description (Drafting)
- Product Description (e.g. Manual / Website) > Citation (Infringement)
- Group of Patent Documents > Summary Clusters (Text or Image) (Landscaping)
- Official Communication > Response Letter Text (Prosecution)
I know there is a lot of hype out there and I don’t particularly want to be responsible for pouring oil on the flames of ignorance. I have tried to base these thoughts on widely reviewed research papers. The aim is to provide a piece of informed science fiction and to act as a guide as to what may be. (I did originally call it “Your Patent Department 2020”!)
Many of the things discussed below are still a long way off and will require a lot of hard work. However, the same was said 10 years ago of many amazing technologies we now have in production (such as facial tagging, machine translation and virtual assistants).
Let’s dive into some examples.
At the moment, patent drafting typically starts as follows: receive invention disclosure, commission search (in-house or external), receive search results, review by attorney, commission patent draft. This can take weeks.
Instead, imagine a world where your inventors submit an invention disclosure and within minutes or hours you receive a report that tells you the most relevant existing patent publication, highlights potentially novel and inventive features and tells you whether you should proceed with drafting or not.
The techniques already exist to do this. You can download all US patent publications onto a hard disk that costs $75. You can convert high-dimensionality documents into lower-dimensionality real vectors (see https://radimrehurek.com/gensim/wiki.html or https://explosion.ai/blog/deep-learning-formula-nlp). You can then compute distance metrics between your decomposed invention disclosure and the corpus of US patent publications. Results can be ranked. You can use a Long Short Term Memory (LSTM) decoder (see https://www.tensorflow.org/tutorials/seq2seq) on any difference vector to indicate novel and possibly inventive features. A neural network classifier trained on previous drafting decisions can provide a probability of proceeding based on the difference results.
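As a toy illustration of the ranking step, here is a pure-Python TF-IDF and cosine-similarity sketch; a production system would use the gensim-style decompositions linked above, but the distance-ranking idea is the same:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors (dicts) for a list of token lists."""
    df = Counter(tok for doc in docs for tok in set(doc))
    n = len(docs)
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse (dict) vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "card reader decoding magnetic stripe signals".split(),
    "method of brewing tea with a kettle".split(),
]
disclosure = "decoding signals from a card reader".split()
vecs = tfidf_vectors(corpus + [disclosure])
scores = [cosine(vecs[-1], v) for v in vecs[:-1]]  # similarity to each patent
```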
A draft patent application in a complicated field such as computing or electronics may take a qualified patent attorney 20 hours to complete (including iterations with inventors). This process can take 4-6 weeks.
Now imagine a world where you can generate draft independent claims from your invention disclosure and cited prior art at the click of a button. This is not pie-in-the-sky science fiction. State of the art systems that combine natural language processing, reinforcement learning and deep learning can already generate fairly fluid document summaries (see https://metamind.io/research/your-tldr-by-an-ai-a-deep-reinforced-model-for-abstractive-summarization). Seeding a summary based on located prior art, and the difference vector discussed above, would generate a short set of text with similar language to that art. Even if the process wasn’t able to generate a perfect claim off the bat, it could provide a rough first draft to an attorney who could quickly iterate a much improved version. The system could learn from this iteration (https://deepmind.com/blog/learning-through-human-feedback/) allowing it to improve over time.
Or another option: how about your patent figures are generated automatically based on your patent claims and then your detailed description is generated automatically based on your figures and the invention disclosure? Prototype systems already exist that perform both tasks (see https://arxiv.org/pdf/1605.05396.pdf and http://cs.stanford.edu/people/karpathy/deepimagesent/).
In the old days, patent prosecution involved receiving a letter from the patent office and a bundle of printed citations. These would be processed, stamped, filed, carried around on an internal mail wagon and placed on a desk. More letters would be written culminating in, say, a written response and a set of amendments.
From this, imagine that your patent office post is received electronically, then automatically filed and docketed. Citations are also automatically retrieved and filed. Objection categories are extracted automatically from the text of the office action, and the office action is categorised with a percentage indicating the chance of obtaining a granted patent. Additionally, the text of the citations is read and a score is generated indicating whether the citations remove novelty from your current claims (this is similar to the search process described above, only this time you know what documents you are comparing). If the score is lower than a given threshold, a set of amendment options is presented, along with percentage chances of success. You select an option, maybe iterate the amendment, and then the system generates your response letter. This includes inserting details of the office action you are replying to (specifically addressing each objection that is raised), automatically generating passages indicating basis in the text of your application, explaining the novel features, generating a problem-solution argument grounded in the text of your application, and providing pointers as to why the novel features are not obvious. Again you iterate, then file online.
Parts of this are already in place at major law firms (e.g. electronically filing and docketing). I have played with systems that can extract the text from an office action PDF and automatically retrieve and file documents via our document management application programming interface. With a set of labelled training data, it is easy to build an objection classification system that takes as input a simple bag of words. Companies such as Lex Machina (see https://lexmachina.com/) already crunch legal data to provide chances of litigation success; parsing legal data from say the USPTO and EPO would enable you to build a classification system that maps the full text of your application, and bibliographic data, to a chance of prosecution success based on historic trends (e.g. in your field since the 1970s). Vector-space representations of documents allow distance measures in n-dimensional space to be calculated, and decoder systems can translate these into the language of your specification. The lecture here explains how to create a question answering system using natural language processing and deep learning (http://media.podcasts.ox.ac.uk/comlab/deep_learning_NLP/2017-01_deep_NLP_11_question_answering.mp4). You could adapt this to generate technical problems based on document text, where the answer is bound to the vector-space distance metric. Indeed, patent claim space is relatively restricted (it is, at heart, a long sentence, where amendments are often additional sub-phrases of the sentence that are consistent with the language of the claimset); the nature of patent prosecution and added subject matter, naturally produces a closed-form style problem.
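A minimal sketch of such a bag-of-words objection classifier — toy data, with a word-overlap "centroid" per label standing in for a properly trained model on real office-action text:

```python
from collections import Counter

def bag_of_words(text):
    return Counter(text.lower().split())

def train_centroids(examples):
    """examples: list of (text, label). Sum the bag-of-words per label."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids

def classify(text, centroids):
    """Assign the label whose word-count centroid best overlaps the text."""
    bow = bag_of_words(text)
    def score(label):
        c = centroids[label]
        return sum(min(bow[w], c[w]) for w in bow)
    return max(centroids, key=score)

training = [
    ("claim 1 lacks novelty over document D1", "novelty"),
    ("claim 1 lacks an inventive step over D1 combined with D2", "inventive step"),
    ("the claims lack clarity because the term is ambiguous", "clarity"),
]
centroids = train_centroids(training)
```

With real labelled data you would swap this for, say, a naive Bayes or neural classifier, but the input representation — a simple bag of words — is exactly as described above.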
Imagining Reality is the First Stage to Getting There
There is no doubt that some of these scenarios will be devilishly hard to implement. It took nearly two decades to go from paper to properly online filing systems. However, prototypes of some of these solutions could be hacked up in a few months using existing technology. The low hanging fruit alone offers the potential to shave hundreds of thousands of dollars from patent prosecution budgets.
I also hope that others are aiming to get there too. If you are, please get in touch!
Playing around with natural language processing has given me the confidence to attempt some claim language modelling. This may be used as a claim drafting tool or to process patent publication data. Here is a short post describing the work in progress. Some useful background resources:
- RDF (Resource Description Framework)
- Entity-attribute-value models
- Semantic Networks
- Part of Speech tagging using the Natural Language Processing Toolkit – http://www.nltk.org/book/ch05.html and http://textminingonline.com/dive-into-nltk-part-iii-part-of-speech-tagging-and-pos-tagger
- WIPO Patent Drafting Manual
Here, a caveat: this modelling will be imperfect. There will be claims that cannot be modelled. However, our aim is not a “perfect” model but a model whose utility outweighs its failings. For example, a model may be used to present suggestions to a human being. If useful output is provided 70% of the time, then this may prove beneficial to the user.
To start we will keep it simple. We will look at system or apparatus claims. As an example we can take Square’s payment dongle:
1. A decoding system, comprising:
Let’s say a claim consists of “entities”. These are roughly the subjects of claim clauses, i.e. the things in our claim. They may appear as noun phrases, where the head word of the phrase is modelled as the core “entity”. They may be thought of as “objects” from an object-oriented perspective, or “nodes” in a graph-based approach.
- “a decoding system”
- “a decoding engine”
- “a transaction engine”
An entity may have “properties” (i.e. “is” something) or may have other entities (i.e. “have” something).
In our example, the “decoding system” has the “decoding engine” and the “transaction engine” as child entities. Or put another way, the “decoding engine” and the “transaction engine” have the “decoding system” as a parent entity.
For example, the “decoding engine” has properties such as:
- “running on a mobile device”
- “in operation decoding signals produced from a read of a buyer’s financial transaction card”
- “in operation accepting and initializing incoming signals from the read of the buyer’s financial transaction card until the signals reach a steady state”
- “detecting the read of the buyer’s financial transaction card once the incoming signals are in a steady state”
- “identifying peaks in the incoming signals and digitizing the identified peaks in the incoming signals into bits”
These properties in turn reference further entities:
- “mobile device”
- “buyer’s financial transaction card”
[When modelling, the part-of-speech tagger gets most of the way there but will probably require human tweaking and confirmation.]
Mapping to Natural Language Processing
To extract noun phrases, we need the following processing pipeline:
claim_text > [1. Word Tokenisation] > list_of_words > [2. Part of Speech Tagging] > labelled_words > [3. Chunking] > tree_of_noun_phrases
Now, the NLTK toolkit provides default functions for 1) and 2). For 3) we have the option of a RegexpParser, for which we need to supply noun phrase patterns, or a classifier-based chunker. Both need a little extra work, but there are tutorials on the net.
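To make the shape of the pipeline concrete without needing NLTK's trained models, here is a toy pure-Python version: a hand-rolled lexicon stands in for `nltk.pos_tag`, and grouping consecutive DT/JJ/NN tags stands in for a RegexpParser grammar like `NP: {<DT>?<JJ>*<NN>+}`:

```python
import re

# Toy tag lexicon standing in for nltk.pos_tag (which needs a trained tagger).
LEXICON = {"a": "DT", "the": "DT", "decoding": "JJ", "mobile": "JJ",
           "system": "NN", "engine": "NN", "device": "NN",
           "comprising": "VBG", "running": "VBG", "on": "IN",
           ",": ",", ":": ":"}

def tokenise(text):                     # stage 1: word tokenisation
    return re.findall(r"\w+|[,:]", text.lower())

def tag(tokens):                        # stage 2: part-of-speech tagging
    return [(tok, LEXICON.get(tok, "NN")) for tok in tokens]

def chunk_noun_phrases(tagged):         # stage 3: chunking (roughly DT? JJ* NN+)
    phrases, current = [], []
    for word, pos in tagged:
        if pos in ("DT", "JJ", "NN"):
            current.append(word)
        else:
            if current and any(LEXICON.get(w, "NN") == "NN" for w in current):
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

claim_text = "A decoding system, comprising: a decoding engine running on a mobile device"
nps = chunk_noun_phrases(tag(tokenise(claim_text)))
```

Run on the Square claim fragment, this recovers exactly the entities identified above.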
Noun phrases should be used consistently throughout claim sentences – this can be used to resolve ambiguity.
This post sets out a number of resources to get you started with deep learning, with a focus on natural language processing for legal applications.
A Bit of Background
Deep learning is a bit of a buzz word. Basically, it relates to recent advances in neural networks. In particular, it relates to the number of layers that can be used in these networks. Each layer can be thought of as a mathematical operation. In many cases, it involves a multidimensional extension of drawing a line, y = ax + b, to separate a space into multiple parts.
I find it strange that when I studied machine learning in 2003/4, neural networks had gone out of fashion. The craze then was for support vector machines. Neural networks were seen as a bit of a dead end. While there was nothing wrong theoretically, in practice it wasn’t possible to train a network with more than a couple of layers. This limited their application.
Computers and software improved. Memory increased. Researchers realised they could co-opt the graphical processing units of the beefy graphics cards of hardcore gamers to perform matrix and vector multiplication. The Internet improved access to large-scale data sets and enabled the fast propagation of results. Software tool kits and standard libraries arrived. You could now program in Python for free rather than pay large licence fees for Matlab. Python made it easy to combine functionality from many different areas. Software became good at automatic differentiation and incorporated advanced mathematical optimisation techniques. Google and Facebook poured money into the field. Etc.
This all led to researchers being able to build neural networks with more and more layers that could be trained efficiently. Hence, “deep” means more than two layers and “learning” refers to neural network approaches.
Deep Natural Language Processing
Deep learning has a number of different application areas. One big split is between image processing and natural language processing. The former has seen big success with the use of convolutional neural networks (CNNs), while natural language processing has tended to focus on recurrent neural networks (RNNs), which operate on sequences over time.
Image processing has also typically considered supervised learning problems. These are problems where you have a corpus of labelled data (e.g. ‘ImageX’ – ‘cat’) and you want a neural network to learn the classifications.
Natural language processing on the other hand tends to work with unsupervised learning problems. In this case, we have a large body of unlabelled data (see the data sources below) and we want to build models that provide some understanding of the data, e.g. that model in some way syntactic or semantic properties of text.
That said, there are crossovers – there are several highly cited papers that apply CNNs to sentence structures, and document classification can be performed on the basis of a corpus of labelled documents.
Introductory Blog Posts
After you’ve read those blog articles, the next step is to dive into the free Udacity Deep Learning course. This is taught in collaboration with Google Brain and is a great introduction to Logistic Regression, Neural Networks, Data Wrangling, CNNs and a form of RNN called the Long Short-Term Memory (LSTM) network. It includes a number of interactive Jupyter/IPython Notebooks, which follow a similar path to the Tensorflow tutorials.
Udacity Deep Learning Course – https://www.udacity.com/course/deep-learning--ud730
Their Data Science, Github, Programming and Web Development courses are also very good if you need to get quickly up to speed.
Once you’ve completed that, the next step is to work through the lecture notes and exercises for these Stanford and Oxford courses.
Stanford Deep Learning for Natural Language Processing – http://cs224d.stanford.edu/syllabus.html
Oxford Deep NLP (with special guests from Deepmind & Nvidia) – https://github.com/oxford-cs-deepnlp-2017/lectures
Once you’ve got your head around the theory, and have played around with some simple examples, the next step is to get building on some legal data. Here’s a selection of useful text sources with a patent slant:
USPTO bulk data – https://bulkdata.uspto.gov/ – download all the patents!
Some of this data will require cleaning / sorting / wrangling to access the text. There is an (experimental) USPTO project in Java to help with this. This can be found here: https://github.com/USPTO/PatentPublicData . I have also been working on some Python wrappers to access the XML in (zipped) situ – https://github.com/benhoyle/patentdata and https://github.com/benhoyle/patentmodels.
Wikipedia bulk data – https://dumps.wikimedia.org/enwiki/latest/ – download all the knowledge!
The file you probably want here is enwiki-latest-pages-articles.xml.bz2. This clocks in at 13 GB compressed and ~58 GB uncompressed. It is supplied as a single XML file. Again I need to work on some Python helper functions to access the XML and return text.
(Note: this is the same format as recent USPTO grant data – a good XML parser that doesn’t read the whole file into memory would be useful.)
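A sketch of such a memory-frugal parse using the standard library's `iterparse`, which clears each element once processed rather than building the whole tree; the `doc`/`title` element names are placeholders for the real schema's tags:

```python
import io
import xml.etree.ElementTree as ET

def iter_elements(source, tag):
    """Yield each <tag> element in turn, clearing it afterwards so the
    whole file is never held in memory at once."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            elem.clear()

# Tiny in-memory stand-in for a multi-gigabyte dump ('doc' and 'title' are
# hypothetical element names -- substitute the real schema's tags).
sample = b"<dump><doc><title>First</title></doc><doc><title>Second</title></doc></dump>"
titles = [d.findtext("title") for d in iter_elements(io.BytesIO(sample), "doc")]
```

For a real dump you would pass an open file (or a `bz2.open` handle) instead of the `BytesIO` object.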
The easiest way to access WordNet data is probably via the NLTK toolkit indicated below. However, you can download the data for WordNet 3 here – https://wordnet.princeton.edu/wordnet/download/current-version/.
Bailii – http://www.bailii.org/ – a free online database of British and Irish case law & legislation, European Union case law, Law Commission reports, and other law-related British and Irish material.
There is no bulk download option for this data – it is accessed as a series of HTML pages. It would not be too difficult to build a Python tool to bulk download various datasets.
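The parsing half of such a tool can be sketched with the standard library alone (fetching the pages, politely and with rate limiting, is left to e.g. requests; the example URLs are hypothetical):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags -- the parsing half of a
    bulk-download tool (fetching each page is left to e.g. requests)."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A stand-in for a fetched index page (hypothetical case-law listing).
page = ('<ul><li><a href="/ew/cases/EWHC/2017/1.html">Case 1</a></li>'
        '<li><a href="/ew/cases/EWHC/2017/2.html">Case 2</a></li></ul>')
parser = LinkExtractor()
parser.feed(page)
```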
UK Legislation – Legislation.gov.uk.
This data is available via a web interface. Unfortunately, there does not appear to be a bulk download option or an API for supplying machine readable data.
On the to-do list is a Python wrapper for supplying structured or unstructured versions of UK legislation from this site (e.g. possibly downloading with requests then parsing the returned HTML).
European Patent Office Board of Appeal Case Law database – https://www.epo.org/law-practice/case-law-appeals/advanced-search.html.
Although there is no API or bulk download option as of yet, it is possible to set up an RSS feed link based on search parameters. This RSS feed link can be processed to access links to each decision page. These pages can then be accessed and converted into text using a few Python functions (I have some scripts to do this I will share soon).
UK Intellectual Property Office Hearing Database – https://www.ipo.gov.uk/p-challenge-decision-results.htm.
Again a human accessible resource. However, the decisions are accessible by year in fairly easy to parse tables of data (I again have some scripts to do this that I will share with you soon).
Your Document / Case Management System.
Many law firms use some kind of document and/or case management system. If available online, there may be an API to access documents and data stored in these systems. Tools like Textract (see below) can be used to extract text from these documents. If available as some form of SQL database, you can often access the data using ODBC drivers.
Once you have some data the hard work begins. Ideally what you want is a nice text string per document or article. However, none of the data sources listed above enable you to access this easily. Hence, you need to start building some wrappers in Python to access and parse the data and return an output that can be easily processed by machine learning libraries. Here are some tools for doing this, and then to build your deep learning networks. For more details just Google the name.
NLTK – brilliant for many natural language processing functions such as stemming, tokenisation, part-of-speech tagging and many more.
spaCy – an advanced set of NLP functions.
Gensim – another brilliant library for processing big document libraries – particularly good for lazy functions that do not store all the data in memory.
TensorFlow – for building your neural networks.
Keras – a wrapper for Tensorflow or Theano that allows rapid prototyping.
scikit-learn – provides implementations for most of the major machine learning techniques, such as Bayesian inference, clustering, regression and more.
Beautiful Soup – great for easy parsing of semi-structured data such as websites (HTML) or patent documents (XML).
Textract – a very simple wrapper over a number of different Linux libraries to extract text from a large variety of files.
csvkit – think of this as a command line Excel, great for manipulating large lists of data.
NumPy – numerical analysis in Python, used, amongst other things, for multidimensional arrays.
Jupyter Notebooks – great for prototyping and research, the engineer’s squared-paper notebook of the 21st century, plus they can be easily shared on GitHub.
Docker – many modern toolkits require a bundle of libraries, so it can be easier to set up a Docker image (a form of virtualised container).
Flask – for building web servers and APIs.
Finding a good patent attorney (or patent client) is a lot like dating.
Once upon a time, dates were centred around [the golf course / an elite educational establishment alumni group / the locker room / a City gentleman’s club]* (delete as appropriate).
Dates were also primarily a male affair, typically among greying men in suits and ties.
However, we now live in the 21st century. We have at our disposal the data to make much better matches.
There are several free public lists you can use to find companies. These include:
- Applicant lists from the World Intellectual Property Organization (WIPO):
- Applicant lists from the European Patent Office (EPO):
- Statistics on the top 50 applicants are provided in the annual reports back to 2004 – https://www.epo.org/about-us/annual-reports-statistics/annual-report.html.
- For at least the recent reports there is a downloadable Excel (XLS) spreadsheet – look for the XLS icon somewhere on the page (normally at the top or bottom).
- From IPO.org for US applicants:
- This site provides a list of the Top 300 Organizations Granted U.S. Patents in 2015 (in PDF format)
- From the London Stock Exchange:
- A list of all companies listed on the London Stock Exchange is provided in an Excel (XLS) file – http://www.londonstockexchange.com/statistics/companies-and-issuers/companies-and-issuers.htm
- From Fast Track:
From these lists you can collate a large list of companies that may or may not require intellectual property services. I prefer a long CSV list with no fancy formatting.
Matching by Technology
Most companies specialise in particular areas of technology. Likewise, most patent attorneys have specific experience in certain technologies. A good technology match saves time and money.
One way to match by technology is to use the International Patent Classification.
If you have lots of time (or a work experience student or a Mechanical Turk) you can take each company from your list, one by one, and perform a search on Espacenet. You can then look through the results and make a note of the classifications of the patent applications returned from the search.
If you have no time, but a geeky interest in Python, you can automate this using the excellent EPO Open Patent Services API.
Through a few hacky functions (which can be found on GitHub), you can:
- Iterate through a large list of companies / applicants;
- Clean the company / applicant name to ensure relevant search results;
- Process the search results to extract the classifications;
- Process the search results to determine the patent agent of record;
- Process the classifications to build up a technology profile for each company / applicant; and
- Process the classifications to rank companies / applicants within a particular technology area.
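Once the search results are parsed into (applicant, classification) records, the last two steps above reduce to simple counting. A sketch with hypothetical records (the company names and IPC subclasses are made up):

```python
from collections import Counter

# Hypothetical records parsed from OPS search results: (applicant, IPC subclass)
records = [
    ("Company X", "G06F"), ("Company X", "G06F"), ("Company X", "H04L"),
    ("Company Y", "B04B"), ("Company Y", "B04B"), ("Company Z", "G06F"),
]

def technology_profile(applicant, records):
    """Count IPC subclasses for one applicant -- their technology profile."""
    return Counter(ipc for name, ipc in records if name == applicant)

def rank_in_area(ipc, records):
    """Rank applicants by number of filings in a given IPC subclass."""
    counts = Counter(name for name, i in records if i == ipc)
    return counts.most_common()
```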
For example, say you are a patent attorney with 20 years worth of experience in organic macromolecular compounds or centrifugal apparatus. Who would you look at helping? How about:
Or say you wanted to know what technology areas Company X worked in? How about:
(* Quiz: any idea who this may be? Guesses in the comments…)
Or say you work for Company X and you wonder which patent attorneys work for your competitors or in a particular technology area. How about:
By improving matching, e.g. between companies and patent attorneys, we can open up legal services. As the potential of technology grows, legal service provision need not be limited to a small pool of ad-hoc connections. Companies can get a better price by looking outside of expensive traditional patent centres. Work product can be improved as those with the experience and passion for a particular area of technology can be matched with companies that feel the same.
In a previous post, we looked at some measures of patent attorney (or firm) success:
- Low cost;
- Minimal mistakes;
- Timely actions; and
- High legal success rate.
In this post, we will look at how we can measure these.
Let’s start with legal success. For legal success rate we identified the following:
- Case grants (with the caveat that the claims need to be of a good breadth);
- Cases upheld on opposition (if defending);
- Cases revoked on opposition (if opposing);
- Oral hearings won; and
- Court cases won.
When looking to measure these we come across the following problems:
- It may be easy to obtain the grant of a severely limited patent claim (e.g. a long claim with many limiting features) but difficult to obtain the grant of a more valuable broader claim (e.g. a short claim with few limiting features).
- Different technical fields may have different grant rates, e.g. a well-defined niche mechanical field may have higher grant rates than digital data processing fields (some “business method” areas have grant rates < 5 %).
- Cases are often transferred between firms or in-house counsel. More difficult cases are normally assigned to outside counsel. A drafting attorney may not necessarily be a prosecuting attorney.
- During opposition or an oral hearing, a claim set may be amended before the patent is maintained (e.g. based on newly cited art). Is this a “win”? Or a “loss”? If an opponent avoids infringement by forcing a limitation to a dependent claim, that may be a win. What if there are multiple opponents?
- In court, certain claims may be held invalid, certain claims held infringed. How do you reconcile this with “wins” and “losses”?
One way to address some of the above problems is to use a heuristic that assigns a score based on a set of outcomes or outcome ranges. For example, we can categorise an outcome and assign each category of outcome a “success” score. To start this we can brainstorm possible outcomes of each legal event.
To deal with the problem of determining claim scope, we can start with crude proxies such as claim length. If claim length is measured as string length, (1 / claim_length) may be used as a scoring factor. As automated claim analysis develops this may be replaced or supplemented by claim feature or limiting phrase count.
Both these approaches could also be used together, e.g. outcomes may be categorised, assigned a score, then weighted by a measure of claim scope.
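As a minimal sketch of this combined approach in Python (the category scores and the weighting constant are illustrative assumptions, not recommended values):

```python
# Sketch: categorise a prosecution outcome, assign an illustrative score,
# then weight "granted" outcomes by a crude claim-breadth proxy.
# The score values and the constant 1000 are assumptions for illustration.

OUTCOME_SCORES = {
    "refused": -5,    # lowest / negative score
    "abandoned": 0,   # neutral - may be strategic
    "granted": 5,     # positive, weighted by breadth below
}

def prosecution_score(outcome, shortest_claim_length=None):
    """Return a success score for one case outcome."""
    score = OUTCOME_SCORES[outcome]
    if outcome == "granted" and shortest_claim_length:
        # Shorter independent claims are (crudely) broader:
        # weight by constant / claim_length.
        score *= 1000 / shortest_claim_length
    return score

print(prosecution_score("granted", shortest_claim_length=200))  # broader claim
print(prosecution_score("granted", shortest_claim_length=800))  # narrower claim
print(prosecution_score("refused"))
```

The exact numbers matter less than the structure: categorise, score, then weight by a scope proxy.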
For example, in prosecution, we could have the following outcomes:
- Application granted;
- Application abandoned; and
- Application refused.
Application refused is assigned the lowest or a negative score (e.g. -5). Abandoning an application is often a way to limit costs on cases that would be refused. However, applications may also be abandoned for strategic reasons. This category may be assigned the next lowest or a neutral score (e.g. 0). Getting an application granted is a “success” and so needs a positive score. It may be weighted by claim breadth (e.g. constant / claim_length for the shortest independent claim).
In opposition or contentious proceedings we need to know whether the attorney is working for, or against, the patent owner. One option may be to set the sign of the score based on this information (e.g. a positive score for the patentee is a negative score for the opponent / challenger). Possible outcomes for opposition are:
- Patent maintained (generally positive for patentee, and negative for opponent);
- Patent refused (negative for patentee, positive for opponent).
A patent can be maintained with the claims as granted (a “good” result) or with amended claims (possibly good, possibly bad). As with prosecution we can capture this by weighting a score by the scope of the broadest maintained independent claim (e.g. claim_length_as_granted / claim_length_as_maintained).
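The sign-flipping and maintained-scope weighting could be sketched as follows (the base scores are placeholder assumptions):

```python
# Sketch: score an opposition outcome from the patentee's perspective,
# then flip the sign for an opponent. Base scores are illustrative.

def opposition_score(outcome, acting_for_patentee,
                     len_granted=None, len_maintained=None):
    base = {"maintained": 5, "refused": -5}[outcome]
    if outcome == "maintained" and len_granted and len_maintained:
        # Weight by how much claim scope survived:
        # 1.0 if maintained as granted, < 1.0 if narrowed by amendment.
        base *= len_granted / len_maintained
    # A positive result for the patentee is a negative one for the opponent.
    return base if acting_for_patentee else -base

# Patent maintained but claims doubled in length by amendment:
print(opposition_score("maintained", True, len_granted=300, len_maintained=600))
# Patent revoked, scored for the opponent:
print(opposition_score("refused", False))
```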
Oral hearings (e.g. at the UK Intellectual Property Office or the European Patent Office) may be considered a “bonus” to a score or a separate metric, as any outcome would be taken into account by the above legal result.
For UK court cases, we again need to consider whether the attorney is working for or against the patentee. We could have the following outcomes:
- Patent is valid (all claims or some claims);
- Patent is invalid (all claims or some claims);
- Patent is infringed (all claims or some claims);
- Patent is not infringed (all claims or some claims);
- Case is settled out of court.
Having a case that is settled out of court provides little information; it typically reflects a position where both sides have some ground. It is likely better for the patentee than having the patent found invalid, but not as good as having the patent found to be valid and infringed. Similarly, it may be better for a claimant than the patent being found valid but not infringed, but worse than the patent being found invalid and not infringed.
One option for scoring partial validity or infringement (e.g. some claims valid/invalid, some claims infringed/not infringed) is to determine a score for each claim individually. For example, dependent claims may be treated using the shallowest dependency – effectively considering a new independent claim comprising the features of the independent claim and the dependents. A final score may be computed by summing the individual scores.
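A rough sketch of this per-claim summation (the per-claim score values are illustrative assumptions):

```python
# Sketch: score a court outcome claim-by-claim and sum.
# Each claim carries flags for validity and infringement; the per-claim
# score values are illustrative assumptions.

def claim_score(valid, infringed):
    score = 0
    score += 2 if valid else -2
    score += 3 if infringed else -1
    return score

def court_score(claims, acting_for_patentee=True):
    """claims: list of (valid, infringed) tuples, with dependents folded
    into their shallowest independent claim."""
    total = sum(claim_score(v, i) for v, i in claims)
    return total if acting_for_patentee else -total

# Mixed result: claim 1 valid and infringed, claim 2 invalid and not infringed.
print(court_score([(True, True), (False, False)]))
```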
So this could work as a framework to score legal success based on legal outcomes. These legal outcomes may be parsed from patent register data, claim data and/or court reports. There is thus scope for automation.
We still haven’t dealt with the issues of case transfers or different technical fields. One way to do this is to normalise or further weight the scores developed based on the above framework.
For technical fields, scores could be normalised based on average legal outcomes or scores for given classification groupings. There is a question of whether this data exists (I think it does for US art units, it may be buried in an EP report somewhere, I don’t think it exists for the UK). A proxy normalisation could be used where data is not available (e.g. based on internal average firm or company grant rates) or based on other public data, such as public hearing results.
Transferred cases could be taken into account by weighting by: time_case_held / time_since_case_filing.
These may be measured by looking at the dates of event actions. These are often stored in patent firm record systems, or are available in patent register data.
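The transfer weighting above is then a simple ratio of two date differences, e.g.:

```python
# Sketch: weight a case score by the fraction of the case's life that the
# attorney (or firm) was responsible for it. Dates are illustrative.

from datetime import date

def transfer_weight(date_received, date_filed, today):
    """Fraction of the period since filing that the case has been held."""
    days_held = (today - date_received).days
    days_since_filing = (today - date_filed).days
    return days_held / days_since_filing

# Case filed four years ago, transferred in one year ago:
print(transfer_weight(date(2016, 1, 1), date(2013, 1, 1), date(2017, 1, 1)))
```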
It is worth noting that there are many factors outside the control of an individual attorney. For example, instructions may always be received near a deadline for a particular client, or a company may prefer to keep a patent pending by using all available extensions. The hope is that, as a first crude measure, these should average out over a range of applicants or cases.
For official responses, a score could be assigned based on the difference between the official due date and the date the action was completed. This could be summed over all cases and normalised. This can be calculated from at least EP patent register data (and could possibly be scraped from UKIPO website data).
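For example, a sketch of this official-deadline score (the dates are illustrative):

```python
# Sketch: timeliness score as (due_date - completion_date) in days,
# summed over responses and normalised by count. Positive = early on
# average, negative = late.

from datetime import date

def timeliness_score(responses):
    """responses: list of (due_date, completed_date) pairs."""
    deltas = [(due - done).days for due, done in responses]
    return sum(deltas) / len(deltas)

print(timeliness_score([
    (date(2016, 3, 1), date(2016, 2, 20)),  # 10 days early
    (date(2016, 6, 1), date(2016, 6, 3)),   # 2 days late
]))  # → 4.0
```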
For internal timeliness, benchmarks could be set, and a negative score assigned based on deviations from these. Example benchmarks could be:
- Acknowledgements / initial short responses sent within 1 working day of receipt;
- Office actions reported within 5 working days of receipt;
- Small tasks or non-substantive work (e.g. updating a document based on comments, replying to questions etc.) performed within 5 working days of receipt / instruction; and
- Substantive office-action and drafting work (e.g. reviews / draft responses) performed within 4 weeks of instruction.
This could be measured, across a set of cases, as a function of:
- a number of official communications issued to correct deviations;
- a number of requests to correct deficiencies (for cases where no official communication was issued); and/or
- a number of newly-raised objections (e.g. following the filing of amended claims or other documents).
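A sketch of such a function (the weights given to each deviation type are assumptions; a real implementation would tune them):

```python
# Sketch: internal timeliness/error measure as a negative score built
# from counts of deviations across a set of cases. The weights (3, 2, 1)
# are illustrative assumptions reflecting relative seriousness.

def deviation_score(official_corrections, deficiency_requests, new_objections):
    return -(3 * official_corrections
             + 2 * deficiency_requests
             + 1 * new_objections)

print(deviation_score(1, 2, 4))  # → -11
```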
This information could be obtained by parsing document management system names (to determine communication type / requests), from patent record systems, online registers and/or by parsing examination communications.
One issue with cost is that it is often relative: a complex technology may take more time to analyse, and a case with 50 claims will cost more to process than a case with 5. Different companies may also have different charging structures. Costs of individual acts also need to be taken in context – a patent office response may seem expensive in isolation but, if it allows grant of a broad claim, may be better value than a series of responses charged at a lower amount.
One proxy for cost is time, especially in a billable hours system. An attorney that obtains the same result in a shorter time would be deemed a better attorney. They would either cost less (if charged by the hour) or be able to do more (if working on a fixed fee basis).
In my post on pricing patent work, we discussed methods for estimating the time needed to perform a task. This involved considering a function of claim number and length, as well as citation number and length. One option for evaluating cost is to calculate the ratio: actual_time_spent / predicted_time_spent and then sum this over all cases.
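A sketch of that ratio, here averaged rather than simply summed so the result is comparable across portfolios of different sizes (the hours are illustrative):

```python
# Sketch: cost metric as the ratio of actual to predicted time, averaged
# over cases. A value near 1.0 means estimates are being met; above 1.0
# means work is routinely overrunning. The numbers are illustrative.

def cost_ratio(cases):
    """cases: list of (actual_hours, predicted_hours) pairs."""
    ratios = [actual / predicted for actual, predicted in cases]
    return sum(ratios) / len(ratios)

print(cost_ratio([(6.0, 5.0), (4.0, 5.0), (5.0, 5.0)]))  # → 1.0
```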
Another approach is to look at the average number of office actions issued in prosecution – a higher number would indicate a higher lifetime cost. This number could be normalised per classification grouping (e.g. to counter the fact that certain technologies tend to get more objections).
The time taken would need to be normalised by the legal success measures discussed above. Spending no time on any cases would typically lead to very high refusal rates, and so even though a time metric would be low, this would not be indicative of a good attorney. Similarly, doing twice the amount of work may lead to a (small?) increase in legal success but may not be practically affordable. It may be that metrics for legal success are divided by a time spent factor.
Patent billing or record systems often keep track of attorney time. This would be the first place to look for data extraction.
An interesting result of this delve into detail is that legal success and cost need to be evaluated together, but that these can be measured independently of timeliness and error, which in turn may be measured independently of each other. Indeed, timeliness and error avoidance may be seen as baseline competences, where deviations are to be minimised.
It would also seem possible, in theory at least, to determine these measures of success automatically, some from public data sources and others from existing internal data. Those that can be determined from public data sources raise the tantalising (and scary for some?) possibility of comparing patent firm performance, with measures grouped by firm or attorney. It is hard to see how a legal ranking based on actual legal performance (as opposed to an ability to wine and dine legal publishers) would be bad for those paying for legal services.
It is also worth raising the old caveat that measurements are not the underlying thing (in a Kantian mode). There are many reasonable arguments about the dangers of metrics, e.g. from the UK health, railways or school systems. These include:
- the burden of measurement (e.g. added bureaucracy);
- modifying behaviour to enhance the metrics (e.g. at the cost of that which is not measured or difficult to measure);
- complex behaviour is difficult to measure, any measurement is a necessarily simplified snapshot of one aspect; and
- misuse by those in power (e.g. to discriminate or as an excuse or to provide backing for a particular point of view).
These, and more, need to be borne in mind when designing the measures. However, I believe the value of relatively objective measurement in an industry that is far too subjective is worth the risk.
This is a question that has been on my mind for a while. The answer I normally get is: “well, you just kind of know don’t you?” This isn’t very useful for anyone. The alternative is: “it depends”. Again, not very useful. Can we think of any way to at least try to answer the question? (Even if the answer is not perfect.)
The question begets another: “how do we measure success?”
For a company this may be:
- the broadest, strongest patent (or patent portfolio) obtained at the lowest cost;
- a patent or patent portfolio that covers their current and future products, and that reduces their UK tax bill; and/or
- a patent or patent portfolio that gets the company what it asks for in negotiations with third parties.
For an in-house attorney or patent department this may be:
- meeting annual metrics, including coming in on budget;
- a good reputation with the board of directors or the C-suite; and/or
- no surprises.
For an inventor this may be:
- minimum disruption to daily work;
- respect from peers in the technology field; and/or
- recognition (monetary or otherwise) for their hard work.
For a patent firm this may be:
- a large profit;
- high rankings in established legal publications; and/or
- a good reputation with other patent firms and prospective or current clients.
For a partner of a patent firm this may be:
- a large share of the profit divided by time spent in the office; and/or
- a low blood pressure reading.
As we can see, metrics of success may vary between stakeholders. However, there do appear to be semi-universal themes:
1. Low cost (good for a company, possibly bad for patent attorneys);
2. Minimal mistakes (good for everyone);
3. Timely actions (good for everyone but sometimes hard for everyone); and
4. High legal success rate (good for everyone).
High legal success rate (4) may include high numbers of:
- Case grants (with the caveat that the claims need to be of a good breadth);
- Cases upheld on opposition (if defending);
- Cases revoked on opposition (if opposing);
- Oral hearings won; and
- Court cases won.
I will investigate further how these can be measured in practice in a future post. I add the caveat that this is not an exhaustive list; however, rather than do nothing for fear of missing something, I feel it is better to do something, in the full knowledge that things have been missed and can be added on iteration.
Cost is interesting, because we see patent firms directly opposed to their clients. Their clients (i.e. companies) typically wish to minimise costs and patent firms wish to maximise profits, but patent firm profits are derived from client costs. For patent firms (as with normal companies), a client with a high profit margin is both an asset and a risk; the risk being that a patent firm of a similar calibre (e.g. with approximately equal metrics for 2-4 above) could pitch for work with a reduced (but still reasonable) profit margin. In real life there are barriers to switching firms, including the collective knowledge of the company, its products and portfolio, and social relationships and knowledge. However, everything has a price; if costs are too high and competing firms price this sunk knowledge into their charging, it is hard to argue against switching.
There is a flip side for patent firms. If they can maximise 2-4, they can rationalise higher charges; companies have a choice if they want to pay more for a firm that performs better.
On cost there is also a third option. If patent firms have comparable values for 2-4, and they wish to maintain a given profit margin, they can reduce costs through efficiencies. For most patent firms, costs are proportional to patent attorney time: reduce the time it takes to do a job and costs reduce. The question is then: how to reduce time spent on a matter while maintaining high quality, timeliness and success? This is where intelligence, automation and strategy can reap rewards.
In-house, the low cost aim still applies, where, for a department, cost may be measured by the number of patent attorneys needed or by outside-counsel spend, as compared to a defined budget.
In private practice, and especially in the US, we often see an inverse of this measurement: a “good” patent attorney (from a patent firm perspective) is one who maximises hourly billings and minimises write-downs, while anecdotally maintaining an adequate level for 2-4. One problem is that maximising hourly billings often leads to compromise on at least 2 and 3; large volumes of work, long hours and high stress are rarely conducive to quality work. This is why I have an issue with hourly billing. A baseline of profit is required, otherwise the business would not be successful. Further, that baseline can be set, e.g. allowing for a partner salary of X times the most junior rate, an investment level of Y%, a bonus pool for extra work performed, etc. Beyond that, however, the level of profit is a factor to maximise subject to constraints, i.e. 1-4 above, where the constraints take priority. The best solution is to align profit with the constraints, such that maximising 1-4 maximises profit. That way everyone benefits. How we can do this will be the subject of a future post.
So, let’s return to our original question: what makes a good patent attorney?
From the above, we see it is a patent attorney that at least makes minimal mistakes, operates in a timely manner, has a high legal success rate and provides this at a low cost. In private practice, it is also a patent attorney that aligns profit with these measures.
One source of frustration with a time-based charging structure (“billable hours”) is that it is difficult to accurately estimate how long a piece of work will take. This post looks at ways we can address this. (Or at least puts down my thoughts on virtual paper.)
Many professional services are priced based on an hourly (or day) rate. This is true of private practice patent attorneys. Although there are critics, the persistence of the billable hour suggests it may be one of the least worst systems available.
Most day-to-day patent work in private practice consists of relatively small items of work. Here “small” means around £1k to £10k, as compared to the £1m cases or transactions of large law firms. These small items of work typically stretch over a few weeks or months.
When performing patent work an unforeseen issue or an overly long publication can easily derail a cost estimate. For example, it is relatively easy to find that a few more hours are needed after looking into an examiner objection or piece of prior art in more detail. This often presents a lose-lose situation for both attorney and client – the work needs to be done, so either the attorney has to cap their charges in line with an estimate or the client needs to pay above an estimate to complete the job. This is not just an issue for patent attorneys – try comparing any quote from a builder or plumber with the actual cost of the work.
This got me thinking about taxis. They have been around for a while, and recent services like Uber offer you a price on your phone that you then accept. This is a nice system for both customer and driver – the customer gets a set price and likewise the driver gets a fare proportional to her time. Could something like that work for patent work?
For taxi services, the underlying variable is miles (or kilometres depending on your Brexit stance). A cost is calculated by adding a mile-based rate to a basic charge, with minimum and cancellation charges.
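The taxi fare model is essentially a linear function with a floor; a tiny sketch (the rates are made up):

```python
# Sketch of the taxi-style fare formula above: base charge plus a
# per-mile rate, subject to a minimum charge. All rates are illustrative.

def fare(miles, base=2.50, per_mile=1.20, minimum=5.00):
    return max(base + per_mile * miles, minimum)

print(fare(10))  # → 14.5
print(fare(1))   # minimum charge applies → 5.0
```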
For patent work, one underlying variable is words. Take an examination report (or “office action”). The amount of time it takes to respond to novelty and inventive step objections is typically proportional to the length of the patent specification in question, the number of claims and the number of prior art citations.
Now, we can use EPO OPS to retrieve the full text of a patent application, including description and claims. We can also retrieve details of citations and their relevance to the patent application (e.g. category ‘X’, ‘Y’ or ‘A’). I am working on parsing PDF documents such as examination reports to extract the text therein. In any case, this information can be quickly entered from a 5 minute parse of an examination report.
Wikipedia also tells me that an average reading rate for learning or comprehension is around 150-200 words per minute.
This suggests that we can automate a time estimate based on:
- Words in the description of a published patent application – WPA (based on a need to read the patent application);
- Number of claims – NCPA (applications with 100s of claims take a lot longer to work on);
- Words in the claims – WCPA (claims with more words are likely to take more time to understand);
- For each relevant citation (category ‘X’ or ‘Y’ – this could be a long sparse vector):
- Words in the description of the citation – WCITx (as you need to read these to deal with novelty or inventive step objections); and
- A base time estimate multiplied by the number of objections raised – BTo (possibly weighted by type):
- E.g. x amount of time per clarity objection, y amount of time per novelty objection.
Even better, we need not work out the relationship ourselves. We can create a numeric feature vector with the above information and let a machine learning system figure it out. This would work based on a database of stored invoicing data (where an actual time spent or time billed amount may be extracted to associate with the feature vector).
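As a sketch of what such a system might look like before any learning is applied, here is the feature vector with hand-set placeholder weights (all field names and weight values are assumptions; in practice the weights would be fitted by regressing vectors against billed time from invoicing records):

```python
# Sketch: build the numeric feature vector described above and apply a
# linear model. The weights are placeholder assumptions (e.g. 1/175
# reflects the ~150-200 wpm reading rate); a machine learning system
# would learn real weights from invoicing data.

def feature_vector(case):
    return [
        case["description_words"],    # WPA: words in the description
        case["num_claims"],           # NCPA: number of claims
        case["claim_words"],          # WCPA: words in the claims
        sum(case["citation_words"]),  # WCITx: total words across X/Y citations
        case["num_objections"],       # multiplied by a base time below
    ]

# Placeholder weights: minutes of attorney time per unit of each feature.
WEIGHTS = [1 / 175, 2.0, 1 / 100, 1 / 175, 30.0]

def estimate_minutes(case):
    return sum(w * x for w, x in zip(WEIGHTS, feature_vector(case)))

case = {
    "description_words": 7000,
    "num_claims": 15,
    "claim_words": 600,
    "citation_words": [5000, 3000],
    "num_objections": 4,
}
print(round(estimate_minutes(case)))  # → 242
```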
The result would be an automated system for pricing an examination report response based on publicly available data. We could host this on Heroku. By doing this we have just created a marketplace for patent responses – a single weighting could be used by patent firms to set their pricing.
Similar pricing models could also be applied to patent drafting. The cost of a draft may be estimated based on a set length, number of drawings, number of claims, number of independent claims, length of invention disclosure and length of known prior art. The variables for a response are similar for a European opposition or an appeal, just with different weights and expanded feature vectors to cover multiple parties.
This would yield a compromise between billable hours and fixed fees. For example, a variable yet fair fixed fee estimate may be generated automatically before the work is performed. The client gets predictability and consistency in pricing. The attorney gets paid in proportion to her efforts.
One thing I have been trying to do recently is to connect together a variety of information sources. This has inevitably involved Python.
Due to the Windows-centric nature of business software, I have also needed to set up Python on a Windows machine. Although setting up Python is easy on a Linux machine, it is a little more involved for Windows (understatement). Here is how I did it.
- First, download and install one of the Python Windows installers from here. As I am using several older modules I like to work with version 2.7 (the latest release is 2.7.8).
- Second, if connecting to a Microsoft SQL database, install the Python ODBC module. I downloaded the 32-bit version for Python 2.7 from here.
- Third, I want to install IPython as I find a notebook is the best way to experiment. This is a little long-winded. Download the ez_setup.py script as described and found here. I downloaded it into my Python directory. Next run the script from that directory (e.g. python ez_setup.py). Then add the Python scripts directory to your Environment Variables as per here. Then install IPython using the command: easy_install ipython[all].
- Fourth, download a Windows installer for Numpy and Pandas from here. I downloaded the 32-bit versions for Python 2.7. Run the installers.
Doing this I can now run an IPython notebook (via the command: ipython notebook – this will open a browser window in your default browser). I found Pandas gave me an error on the initial import as dateutil was missing – this was fixed by running the command: easy_install python-dateutil.