Flexible Working & Career Breaks: A Personal Perspective

In advance of IP Inclusive’s Women in IP Flexible Working & Career Breaks Event, this post sets out some of the things I learnt taking a career break (shared parental leave) and working part-time. I hope it may be useful for others considering “off piste” career options.

For context, I have stuck my route through this whole thing at the bottom. Although I come at it from the point of view of a patent attorney, many points apply to other roles within the intellectual property (IP) profession, including paralegals, searchers, examiners, “back-office” roles, and IP solicitors.

Another disclaimer: these are my own views and not necessarily those of my employers present or past. You may also disagree; I am open to (polite!) discussion in the comments below.


As everyone likes “listicles” (or so say the £300/hour marketing consultants), I have decomposed my experience into ten key points:

  1. There is no “official” path
  2. You are lucky
  3. Switch your view
  4. You have a choice, but you can’t have it all
  5. Time = Money = Career
  6. The Patriarchy is accidental
  7. Two is better than one
  8. There is life outside of London
  9. Remote working works
  10. Imperfect Balance

Intrigued? Let’s have a look.


There is no “official” path

When I first entered the world of intellectual property there seemed to be a well-defined structure: start in a junior role, put in the hours, work your way up to a senior role and a larger salary, work for another 20 years, get a gold clock and hope you have a defined benefit pension scheme.

This world is changing. Maybe it didn’t really exist at all. Assumptions are being called into question. The opening up of the workplace to a wider variety of backgrounds, together with general societal trends, means that people are taking a different, more realistic, and in many ways healthier, approach to work and careers.

Work for long enough and you too may see the cracks in the traditional narrative. Staff leave and join. Things you assumed were part of a “life path”, like housing or good health, may no longer be available. Over the water-cooler or a networking pint, you hear whispers of others “doing something different”. You see people get lucky and unlucky rolls of the dice. You begin to learn that there are other possibilities, and that the hardest part is often imagining them.

The Internet and social media are a great help with this. You can see people move around on LinkedIn, post about their experiences on blogs such as this, or comment on Twitter.

Start with honesty. If things don’t seem to be working, try to work out why and don’t assume that there is a set structure or path you need to follow. Reach out to others in your employment, or via social media, to see what other routes there may be. Don’t be afraid to chat things through informally with your employer, line manager or a member of human resources; in my experience any fear I had was misplaced and people genuinely want to find something that works for everyone.

You are lucky

If you are working in the world of IP, you are lucky. This applies regardless of your role.

Don’t believe me? If you work in private practice, you can look up the accounts of patent and law firms for free at Companies House. You will see that these businesses do not operate with food-production or milk-farmer level margins. Technology companies also tend to be better placed in the economy than other sectors, such as retail. Yes there are worse years and better years, but look at annual profits or average partner remuneration (spoiler: private practice averages are somewhere between £200k and £500k). The patent profession is “effectively a legal cartel” (quoting Sir Colin Birss).

This means that there is more monetary slack in the system than there may be in other industries. The skills of all roles are in high demand. This puts employees in a good negotiating position. Firms and companies need to retain trained staff, and most firms and companies are always looking for more staff. Use this to your advantage. You have the margin to be brave in your choices.

Switch your view

That being said, businesses need to be run, and they need to make a profit. I’ve been on both sides of the managerial chair. If you want to branch out beyond the normal 9-5 employment contracts, your options also need to make managerial and financial sense for those you are working for. In an ideal world, it would be nice to live an idle life on law firm partner pay. But we can’t all be Jacob Rees-Mogg.

A good place to start is to work out your cost to your employer or to those using your services. Think about how your employer affords your salary or pay. In private practice, income comes from invoicing clients. In industry, a department may have an annual budget, revenue targets and key performance indicators. Remember that you often cost more than the money you receive in your paycheck. There is tax, national insurance, employee benefits, office costs, training costs, pension costs, etc. etc.

Then think about how your activities fit into the value chain. Docketing or paralegal costs may be charged as a “service fee” or have a budgeted amount per case. How much of your effort is required per work type? How much do you need to be paid to cover your personal expenses? How many hours do you have available per day? What seems like an endless series of choices can be whittled down to a practical proposal.

For employers, “flexible working” and “career breaks” have often been viewed with distrust. It was thought that employees could not be trusted, that flexible workers were too “difficult” to manage, or that the bottom line would fall. None of this has really come to pass. Indeed, the more astute employers are realising that in a field where there is a large demand for skills, offering flexibility can be a great selling point. Work/life balance was recently voted the most important concern for attorneys, above salary. Turns out wealth isn’t everything, who would have guessed? Beyond recruitment, a flexible workforce can also benefit the bottom line, helping get the day jobs done and freeing up in-house attorney time. The legal market is stretching vertically and approaches traditionally applied by counsel in industry to manage outside counsel are now being applied by law firms to manage virtual teams.

You have a choice, but you can’t have it all

In my experience, running one or more households and caring for one or more human beings is a full-time job. Trying to wish otherwise is unrealistic. The choice comes in the form of the person (or persons) doing this job. It could be:

  • one person full-time without pay (e.g. the traditional “houseperson” or grandparent);
  • two people part-time without pay (e.g. two freelancers or a “second job” + grandparent or sibling);
  • one person full-time with pay (e.g. a childminder or full-time carer); or
  • one person doing this part-time without pay and one person doing this part-time with pay (e.g. the also semi-traditional “second job” + “help”).

The person could be a parent, a child, a family member, or a friend.

For some the choice may not be much of a choice. Each person has a different set of circumstances. Some options may be easier than others. Many options may only be options if something is given up or bartered. Having your cake and eating it is like a free lunch: you are paying somewhere, it just may not be obvious at first sight. The costs may not be as intuitive as you think they are.

What you can’t try to wish out of existence is the caring role itself. I like to view carer roles like physicists view dark matter – just because you don’t see them doesn’t mean they are not there. It is a cliché (but true) that for every person with external responsibilities working 12-hour days there are others that we do not see working hard, and we may never directly see this effort. It seems tragic that it is only recently we are beginning to talk about how these roles are divided, and realising that a majority of the population may have an opinion on the matter that differs from our expectations.

Time = Money = Career

Once you have set out the demands of your external responsibilities, your job responsibilities and your expenses, you can start to map the options available to all parties in a household.

Three levers you can vary are: time, money and career. These are interconnected. You can have more time, but this may come at a literal cost to your wallet and a figurative cost to your career (or at least a certain image of your career). You can have higher pay, but you may have less time.

You may have more control over these than you think. Most people think of money and career as a one-way street (for the patent attorneys: “a monotonically increasing function”). Looking at the hours many senior staff put in, as you rise in your role, your “free” time also diminishes. Many of us know senior partners in law firms or department managers that are checking their emails while on holiday or in the evenings. This email time is time that is not available for a role caring for other human beings, or doing household chores, or taking up a hobby, or volunteering in a local group. Time is a fixed quantity. Remember the cost is not always visible.

In negotiations for flexible working do not be afraid to trade salary for time. If you have fewer hours available per week, e.g. because you have to care for a child or relative, this will mean you can work fewer hours per week. However, make sure that you are not being paid less for the same work. Look at the value your work provides your employer – for a paralegal this may be charged out directly, for an IT support role this may be indirectly charged out as patent attorney or paralegal time (if the computers stopped working no one could charge anything). This may be harder for roles in industry than for private practice, but you may find a costing for IP in an annual review or a licensing income you can use to justify your value. You may feel a bit besmirched putting everything in monetary terms but remember this is not your value, it is just a representation of your value for dealing with an entity whose purpose is monetary (to make a profit). You can increase your chances of success by playing the game. You can also play around with the figures, and mock up different scenarios.

The Patriarchy is accidental

In private practice, most law firms have a heavily male partnership. Roles in industry are better, and it is good to see Women in IP highlighting senior female figures in IP, but there is still a male skew (evidence: attendees at CIPA Congress 2018). There is also a female skew to paralegal roles. Some believe this is driven by unintelligent design; I’m more of the opinion this is a historical accident that is open to change.

Now I’m not going to take on what is a huge issue, and walk briskly into a minefield. But I can just about see how a heavily male skew in senior patent attorney roles can arise. First, there is a bias towards men working in science and engineering (one I’m trying to reverse with my parental nudging to my daughters). Second, to enter partnership in private practice you need to work hard and have excellent billing figures. The time many enter partnership also overlaps with the time that many start a family (30-40s), or if you have older parents, manage their ageing. Even at the most progressive patent firms, partnership and a 3-day week is generally not an option. To be fair, it is difficult (impossible?) to provide full 24-hour service to demanding clients and provide full 24-hour service to demanding children. Cultural bias, breastfeeding, self-affirming networks, the need to physically recover after birth, and the need to pay the bills all conspire to nudge the pinball of life towards traditional male-female caregiver roles. Now just because something is, doesn’t mean it should be. A first step is awareness; a second step is change. I receive many a confused look from men when I ask them whether they are willing to give up their own career development for a time to allow their partner to rise up through the ranks. Normally this is excused on monetary terms: we couldn’t afford it. Forgive my scepticism, but in the patent world this can come from people earning six-figure salaries. The average UK full-time salary is below £30,000 (gross).

Once you see the random trail of the pinball you can take more control over its direction. As a business owner or employer you can make the playground equal to allow both men and women to take time off. You can change how work is performed to make it easier to distribute work in a piecemeal fashion, facilitating flexible working. Men can realise that they may need to adjust their perceived route through life, that there are other paths. We can all switch off our phones at 5:30pm.

Two is better than one

If you live in a two-parent household, consider both working part-time or flexibly: we found this better than the traditional breadwinner / houseperson split. In purely financial terms, in the UK, you can be 15-20% better off with this arrangement, based on tax rates and penalties regarding child benefit and childcare.

Having two people work part-time can also allow a career to be pursued to a lesser extent by both parties, rather than requiring the traditional sacrifice from one half of the household. This has benefits outside of pure employment. For example, the experiences of both parties are closer, which reduces resentment and facilitates understanding.

For the partner used to working long office hours, there are benefits to stepping off the pedal and taking on more of the unpaid household tasks. Dropping the kids off at school or childcare allows you to become involved in your local community, and move out of a workplace or professional bubble. It is very easy to get caught up in a world of work, but such a world is fairly brittle. If the unexpected or unwelcome hit, they often hit hard. Working flexibly or part-time allows you to build up social capital that can make you more resilient.

Bronnie Ware, an Australian palliative nurse, noted that one of the top regrets of dying male patients was “I wish I hadn’t worked as hard”.

There is life outside of London

*A collective gasp is heard along Chancery Lane*

Live in or around London and London often seems like life. It is not. There are IP roles outside of London. Not all large businesses have their headquarters in London. There are private practice offices in most UK university towns.

Living outside of London could halve your housing and living costs, with a smaller proportional drop in salary. You could move somewhere where you can walk or cycle to work. Work outside of London and you can avoid commutes that clock up as an hour each way with what seems like the entire population of Denmark. Yes, there is less culture, fewer venues, galleries and restaurants, but have three kids and you rather quickly end up going to bed at 9pm and a walk to the shops resembles a military exercise. You are also not confronted with raging inequality that takes mental effort to filter out and rationalise every day.

Remote working works

When I started in the profession we were only just beginning to receive instructions by email. This was seen as “something not quite right”. Post (or perhaps facsimile transmission) was “how things were done”. People shuffled between patent firms and the London post branch of the UK Patent Office with brown-wrapped parcels. Work literally piled up in front of you.

In over a decade much of this has changed. Working can often feel like the film Inception as I dive into virtual machines within virtual networks within virtual machines. You can file a patent application from anywhere in the world. You can receive all communications electronically. Online file and docketing systems manage files and due dates in the cloud. Everyone loves Microsoft Word (okay that one isn’t great).

One of the panel at the Women in IP event, and an ex-colleague, spent a year working as a patent attorney while circumnavigating the globe. I know of patent attorneys who work for UK firms that do not live in the UK.

As a slight caveat, I am relatively IT-savvy. But I was surprised at the ease with which I could work outside of the office on a laptop or computer. All you need is good WiFi. Do not be afraid.

Imperfect Balance

Life often seems to me like a Necker Cube. Look at it one way and you are a total failure, miserably falling short of your true potential. Flip your frame of reference and you are a wild success, much better off than most of humanity, alive or dead. Both may be true. No one can know for sure.

You have to be realistic. I spend much more time with my kids than many dads. But this doesn’t mean we live in Von Trappian bliss. If anything I probably get cross at my kids more than the average bedtime or weekend dad. Much of my day can pass in domestic drudgery (washing, dishes, shopping, cooking, ferrying; then repeat). But I feel richer than a Partner in a London law firm.

Human beings have known for over two and a half millennia that truth and success lie in a middle way. But this is hard because our desires, inclinations and habits pull us to the poles. You need to exert effort to stay balanced. Like riding a bike. Knowing how to do this takes years and is a skill you acquire. It doesn’t arrive by magic.

Working flexibly, taking a career break or changing your career may need you to give up certain things. But you can gain other things that can provide a path of fulfilment that may not be visible in life’s melee.


My Route

[Photo] Note: Mess and the CIPA Journal

I started off my career on the well-trodden patent attorney path. In my last year of university I sent off my letters to most of the patent firms in the Inside Careers guide, and ended up being lucky enough to be accepted for a training place at Gill Jennings & Every (GJE). Things went pretty smoothly for the first few years: I had success on the QMW Diploma course and the European Qualifying Examinations, and was progressing nicely within the firm. I got married and we were looking at buying somewhere to live in Wood Green. Then life, as it does, threw a few curve balls.

First, the “Great Recession” hit. Suddenly work started to dry up, and recruitment froze. It didn’t feel like a huge event at the time, but in hindsight it was clear that a general feeling of uncertainty and chaos hung around. Second, a close friend tragically died. She was a high-flying trainee lawyer. Although officially unrelated, I couldn’t help feeling that the all-night working sessions and London lawyer lifestyle were somehow implicated. Third, my first daughter was born. There were complications with the birth, which shall we say “didn’t go well”, and she ended up in Intensive Care for a couple of weeks with possible brain damage. Luckily she appears fine now, 8 or 9 years later, but at the time we didn’t know whether there were going to be learning difficulties or other problems growing up.

This led us to look to leave London and to move closer to my (relatively large) family that lived in central Somerset. Bristol and Bath both had patent attorney firms and I sent the feelers out amongst the recruitment consultants. I ended up at EIP in Bath. We lived between family in Somerset and a family friend’s flat in Bath until we were lucky enough to find a small terraced house in Bath (any spelling mistakes are due to the lead paint and asbestos tiles I removed – we’re still there now).

Relatively quickly I was back on the career track, although now one slightly askew from the London mainline. My second daughter came along and I was eyeing up partnership in the distance along the classic lines. We tried to juggle parenting as best we could but ended up falling back onto rather typical binary roles (me at work until 6-7pm many nights, the other half being the default for the kids). Just after the birth of my second daughter, my father-in-law died suddenly, which shook us up a bit. My partner had a relatively small family, which was now even smaller, whereas I was used to large families down in Somerset. We thus decided to have one more.

At this time we began to look again at our parenting roles. Neither I nor my partner were particularly enjoying the long stressful hours (which were actually fairly tame by London standards), and the standard career route for me was for this all to become worse (for at least 10 years working up through the lower rungs of partnership). My partner enjoys her work and wanted to do more of it, but this meant days away once or twice a month that I often had to take as annual leave. It just so happened that it was at this time that the shared parental leave scheme came into force. It looked perfect.

In the end I applied to take 7 months shared parental leave while my partner took 2 months. I also put in a flexible working request for my return, to go down to 3 days a week. Both were granted (there wasn’t much choice on the first). Work was really good about things but as the scheme was in its infancy I ended up helping with human resources on the form of the policy. The downside was that the pay was just statutory. This meant going down to around £550/month for 7 months. All non-essential bills were cut, all food shops were Lidl (and still are), and any bills that could be frozen were. (In hindsight I should have asked for some pay during this period, at least matching the maternity schemes and possibly going beyond, trading this for an agreement to work for a particular length of time on my return.)

In any case the shared parental leave worked fairly well. There was a lot less pressure on my partner than the previous two. I could do the night shifts and the sleep training (think waking up every hour throughout the night for several months) without collapsing at work the next day. There was no need to drag a baby to the school and nursery drop offs. The drop in income was mitigated slightly by the fact you can’t really go out or do anything with three kids.

Flexible working also worked, to a certain extent. It felt a little split-brain – three days a week life was back to pre-parental leave days, being in the office and dealing with the day-to-day; then two days a week, I was plunged into the world of chores, kid juggling, and dirty nappies. The cognitive dissonance is sometimes hard. I wrote a bit about it here. Flexible working allowed my partner to work longer or non-standard hours on the other two days.

As the sleeping began to improve I looked at moving up to working four days a week to get back onto a partnership track. However, after a few months we found this wasn’t really working. The kids each needed to be in different locations at varying times between 8:30am and 9:30am, and to be picked up at times that spanned from 12:30pm to 4:30pm. Working four days a week meant that the workload crept back up to an 8:30am to 6:30pm working day – with commuting time this meant I was out for much of the time the kids were awake. Parenting is a zero-sum game and so someone else needed to be there when I was not there. A nanny or childminder was not an option when we looked at our after-tax salaries. My parents have nearly a decade before they retire and my partner’s mother lives three hours away. I started chatting to and reading about others in similar situations – both in real-life and online. Becoming a consultant for a consultancy firm seemed an option. I had two very useful conversations with ex-colleagues, who were a great help in explaining how things could work. Also what was the partnership everyone was aiming at, if not being your own boss? I decided to bite the bullet and start my own consultancy business in December 2017. I now work preparing patent drafts and office actions for various clients on a job-by-job basis.

My current arrangement is the best I have had. I have traded off income for time. I can now do school and nursery pick-ups and drop offs. My partner can spend days away without a PhD in logistics. The income-time trade-off isn’t as linear as it seems: e.g. being around more means I can spend an hour at 4pm cooking tea from simple ingredients so a meal for 5 costs £2.50 rather than £25, in effect saving £22.50 after-tax. This is found in many other areas. Another is that I now have no commute, so I don’t need to pay £7000 after tax per year just to work. I can also choose to work hard on billable jobs one week, then have less paid work the next week to “work” on enjoyable not directly billable tasks such as this blog post, or processing 292,000 G06 US patent specifications.

As a caveat: it is still early days. It could all go disastrously wrong. I hope it doesn’t.

Building a Claim-Figure-Description Dataset

When working with neural network architectures we need good datasets for training. The problem is good datasets are rare. In this post I sketch out some ideas for building a dataset of smaller, linked portions of a patent specification. This dataset can be useful for training natural language processing models.

What are we doing?

We want to build some neural network models that draft patent specification text automatically.

In the field of natural language processing, neural network architectures have shown limited success in creating captions for images (kicked off by this paper) and text generation for dialogue (see here). The question is: can we get similar architectures to work on real-world data sources, such as the huge database of patent publications?

How do you draft a patent specification?

As a patent attorney, I often draft patent specifications as follows:

  1. Review invention disclosure.
  2. Draft independent patent claims.
  3. Draft dependent patent claims.
  4. Draft patent figures.
  5. Draft patent technical field and background.
  6. Draft patent detailed description.
  7. Draft abstract.

The invention disclosure may be supplied as a short text document, an academic paper, or a proposed standards specification. The main job of a patent attorney is to convert this into a set of patent claims that have broad coverage and are difficult to work around. The coverage may be limited by pre-existing published documents. These may be previous patent applications (e.g. filed by a company or its competitors), cited academic papers or published technical specifications.

Where is the data?

As many have commented, when working with neural networks we often need to frame our problem as map X to Y, where the neural network learns the mapping when presented with many examples. In the patent world, what can we use as our Xs and Ys?

  • If you work in a large company you may have access to internal reports and invention disclosures. However, these are rarely made public.
  • To obtain a patent, you need to publish the patent specification. This means we have multiple databases of millions of documents. This is a good source of training data.
  • Standards submissions and academic papers are also published. The problem is there is no structured dataset that explicitly links documents to patent specifications. The best we can do is a fuzzy match using inventor details and subject matter. However, this would likely be noisy and require cleaning by hand.
  • US provisional applications are occasionally made up of a “rough and ready” pre-filing document. These may be available as priority documents on later-filed patent applications. The problem here is that a human being would need to inspect each candidate case individually.

Claim > Figure > Description

At present, the research models and datasets have small amounts of text data. The COCO image database has one-sentence annotations for a range of pictures. Dialogue systems often use tweet or text-message length text segments (i.e. 140-280 characters). A patent specification in comparison is monstrous (around 20-100 pages). Similarly there may be 3 to 30 patent figures. Claims are better – these tend to be around 150 words (but can be pages).

To experiment with a self-drafting system, it would be nice to have a dataset with examples as follows:

  • Independent claim: one independent claim of one predefined category (e.g. system or method) with a word limit.
  • Figure: one figure that shows mainly the features of the independent claim.
  • Description: a handful of paragraphs (e.g. 1-5) that describe the Figure.

We could then play around with architectures to perform the following mappings:

  • Independent claim > Figure (i.e. task 4 above).
  • Independent claim + Figure > Description (i.e. task 6 above).

One problem is this dataset does not naturally exist.

Another problem is that ideally we would like at least 10,000 examples. If you spent an hour collating each example, and did this for three hours a day, it would take you nearly a decade. (You may or may not also be world class in example collation.)
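
The back-of-the-envelope sum behind that estimate can be checked in a few lines. This is only a sketch of the arithmetic using the figures in the paragraph above; the variable names are mine.

```python
# Figures taken from the paragraph above.
examples_needed = 10_000
hours_per_example = 1
hours_per_day = 3

# Total working days, then years (assuming no days off at all).
days_needed = examples_needed * hours_per_example / hours_per_day
years_needed = days_needed / 365

print(f"{days_needed:.0f} days, or about {years_needed:.1f} years")
```

Even with no weekends or holidays this comes out at a little over nine years, which is why manual collation is a non-starter.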

The long way

Because of the problems above it looks like we will need to automate the building of this dataset ourselves. How can we do this?

If I was to do this manually, I would:

  • Get a list of patent applications in a field I know (e.g. G06).
  • Choose a category – maybe start with apparatus/system.
  • Get the PDF of the patent application.
  • Look at the claims – extract an independent claim of the chosen category. Paste this into a spreadsheet.
  • Look at the Figures. Find the Figure that illustrates most of the claim features. Save this in a directory with a sensible name (e.g. linked to the claim).
  • Look at the detailed description. Copy and paste the passages that mention the Figure (e.g. all those paragraphs that describe the features in Figure X). This is often a continuous range.

The shorter way

There may be a way we can cheat a little. However, this might only work for granted European patents.

One bug-bear (sorry, enjoyable part) of being a European patent attorney is adding reference numerals to the claims to comply with Rule 43(7) EPC. Now where else can you find reference numerals? Why, in the Figures and in the description. Huzzah! A correlation.

So a rough plan for an algorithm would be as follows:

  1. Get a list of granted EP patents (this could comprise a search output).
  2. Define a claim category (e.g. based on a string pattern – [“apparatus”, “system”]).
  3. For each patent in the list:
    1. Fetch the claims using the EPO OPS “Fulltext Retrieval” API.
    2. Process the claims to locate the lowest number independent claim of the defined claim category (my PatentData Python library has some tools to do this).
    3. If a match is found:
      1. Save the claim.
      2. Extract reference numerals from the claim (this could be achieved by looking for text in parentheses or using a “NUM” part of speech from spaCy).
      3. Fetch the description text using the EPO OPS “Fulltext Retrieval” API.
      4. Extract paragraphs from the description that contain the extracted reference numerals (likely with some threshold – e.g. consecutive paragraphs with greater than 2 or 3 inclusions).
      5. Save the paragraphs and the claim, together with an identifier (e.g. the published patent number).
      6. Determine a candidate Figure number from the extracted paragraphs (e.g. by looking for a pattern like “FIG* [\d]”).
      7. Fetch that Figure using the EPO OPS “Drawings” or images retrieval API.
        • Now we can’t retrieve specific Figures, only specific sheets of drawings, and only in ~50% of cases will these match.
        • We can either:
          • Retrieve all the Figures and then OCR these looking for a match with the Figure number and/or the reference numbers.
          • Start with a sheet equal to the Figure number, OCR, then if there is no match, iterate up and down the Figures until a match is found.
          • See if we can retrieve a mosaic featuring all the Figures, OCR that and look for the sheet number preceding a Figure or reference numeral match.
      8. Save the Figure as something loadable (TIFF format is standard) with a name equal to the previous identifier.

The output from running this would be a triple similar to this: (claim_text, paragraph_list, figure_file_path).
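As a rough illustration of the text-processing steps in the plan above, here is a minimal Python sketch. It assumes reference numerals appear in parentheses in the claim, and it simplifies the "consecutive paragraphs" threshold to a per-paragraph count. The function names and regular expressions are my own placeholders, not part of any existing library.

```python
import re

# Assumption: numerals sit in parentheses in the claim, e.g. "(12)" or "(12, 14a)".
NUMERAL_GROUP = re.compile(r"\(([^()]*\d[^()]*)\)")
NUMERAL = re.compile(r"\b\d+[a-z]?\b")
# Assumption: Figures are referred to as "Figure 1", "FIG. 1", "Fig 1", etc.
FIGURE = re.compile(r"\bFIG(?:URE|\.)?\s*(\d+)", re.IGNORECASE)


def extract_numerals(claim_text):
    """Return the set of reference numerals found in parentheses in a claim."""
    numerals = set()
    for group in NUMERAL_GROUP.findall(claim_text):
        numerals.update(NUMERAL.findall(group))
    return numerals


def select_paragraphs(paragraphs, numerals, threshold=2):
    """Keep paragraphs that mention at least `threshold` distinct numerals."""
    selected = []
    for paragraph in paragraphs:
        found = set(NUMERAL.findall(paragraph)) & numerals
        if len(found) >= threshold:
            selected.append(paragraph)
    return selected


def candidate_figure(paragraphs):
    """Return the most frequently mentioned Figure number, or None."""
    counts = {}
    for paragraph in paragraphs:
        for number in FIGURE.findall(paragraph):
            counts[number] = counts.get(number, 0) + 1
    return max(counts, key=counts.get) if counts else None
```

This is deliberately crude: a bare "1" in a paragraph will match a reference numeral "1" as well as a Figure number, so real data would likely need extra filtering.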

We might want some way to clean any results – or at least view them easily so that a “gold standard” dataset can be built. This would lend itself to a Mechanical Turk exercise.

We could break down the text data further – the claim text into clauses or “features” (e.g. based on semi-colon placement) and the paragraphs into clauses or sentences.
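A minimal sketch of the semi-colon split for claim text might look like this. The function name is hypothetical, and real claims will need more care (e.g. nested lists or "wherein" clauses).

```python
import re


def split_features(claim_text):
    """Split a claim into rough "features" at semi-colons (dropping a leading "and")."""
    parts = re.split(r";\s*(?:and\s+)?", claim_text)
    # Trim whitespace and the final full stop, and drop empty fragments.
    return [part.strip(" .\n") for part in parts if part.strip(" .\n")]
```

For example, "An apparatus comprising: a processor; a memory; and a display." splits into three features.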

The image data is black and white, so we could resize and resave each TIFF file as a binary matrix of a common size. We could also use any OCR data from the file.

What do we need to do?

We need to code up a script to run the algorithm above. If we are downloading large chunks of text and images we need to be careful of exceeding the EPO’s terms of use limits. We may need to code up some throttling and download monitoring. We might also want to carefully cache our requests, so that we don’t download the same data twice.
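One way to sketch the throttling, monitoring and caching is a small wrapper around whatever function does the actual download. Everything here is an assumption for illustration: the class name, the one-second default interval and the in-memory dictionary (a real run would probably want an on-disk cache and limits taken from the EPO's terms of use).

```python
import time


class ThrottledCache:
    """Wrap a fetch function with a minimum delay between calls and a cache."""

    def __init__(self, fetch, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch              # any callable: key -> data (hypothetical)
        self.min_interval = min_interval
        self.clock = clock              # injectable so the class can be tested
        self.sleep = sleep
        self.cache = {}
        self.last_call = None
        self.downloads = 0              # crude download monitoring

    def get(self, key):
        # Never download the same data twice.
        if key in self.cache:
            return self.cache[key]
        # Wait if the previous download was too recent.
        if self.last_call is not None:
            wait = self.min_interval - (self.clock() - self.last_call)
            if wait > 0:
                self.sleep(wait)
        data = self.fetch(key)
        self.last_call = self.clock()
        self.downloads += 1
        self.cache[key] = data
        return data
```

Because the fetch function is passed in, the same wrapper could sit in front of the EPO OPS or a bulk data collection.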

Initially we could start with a smaller dataset of say 10 or 100 examples. Get that working. Then scale out to many more.

If the EPO OPS is too slow or our downloads are too large, we could use (i.e. buy access to) a bulk data collection. We might want to design our algorithm so that the processing may be performed independently of how the data is obtained.

Another Option

Another option is that front page images of patent publications are often available. The Figure published with the abstract is often that which the patent examiner or patent drafter thinks best illustrates the invention. We could try to match this with an independent claim. The Figure image supplied is smaller, though. This may be a backup option if our main plan fails.

Wrapping Up

So. We now have a plan for building a dataset of claim text, description text and patent drawings. If the text data is broken down into clauses or sentences, this would not be a million miles away from the COCO dataset, but for patents. This would be a great resource for experimenting with self-drafting systems.


Can robots have needs?

A patent attorney digresses.

I recently read an article by Professor Margaret Boden on “Robot Needs”. While I agree with much of what Professor Boden says, I feel we can be more precise with our questions and understanding. The answer is more “maybe” than “no”.

Warning: this is only vaguely IP related.

Definitions & Embodiment

First, some definitions (patent attorneys love debating words). The terms “robot”, “AI” and “computer” are used interchangeably in the article. This is one of the problems of the piece, especially when discussing “needs”. If a “computer” is simply a processor, some memory and a few other bits, then yes, a “computer” does not have “needs” as commonly understood. However, it is more of an open question as to whether a computing system, containing hardware and software, could have those same “needs”.

AI

This brings us to “AI”. The meaning of this term has changed in the last few years, best seen perhaps in recent references to an “AI” rather than “AI” per se.

  • In the latter half of the twentieth century, “AI” was mainly used in a theoretical sense to refer to non-organic intelligence. The ambiguity arises with the latter half of the term. “Intelligence” means many different things to many different people. Is playing chess or Go “intelligent”? Is picking up a cup “intelligent”? I think the closest we come to agreement is that it generally relates to higher cortical functions, especially those demonstrated by human beings.
  • Since the “deep learning” revival broke into public consciousness (2015+?) “AI” has taken on a second meaning: an implementation of a multi-layer neural network architecture. You can download an “AI” from Github. “AI” here could be used interchangeably with “chatbot” or a control system for a driverless car. On the other hand, I don’t see many people referring to SQL or DBpedia as an “AI“.
  • “AI” tends to be used to refer more to the software aspects of “intelligent” applications rather than a combined system of server and software. There is a whiff of Descartes: “AI” is the soul to the server “body”.

Based on that understanding, do I believe an “AI” as exemplified by today’s latest neural network architecture on Github has “needs”? No. This is where I agree with Professor Boden. However, do I believe that a non-organic intelligence could ever have “needs”? I think the answer is: Yes.

Robots

This leads us to robots. A robot is more likely to be seen as having “needs” than “AI” or a “computer”. Why is this?

Robots have a presence in the physical world – they are “embodied“. They have power supplies, motors, cameras, little robotic arms, etc. (Although many forget that your normal rack servers share a fair few components.) They clearly act within the world. They make demands on this world, they need to meet certain requirements in order to operate. A simple one is power; no battery, no active robot. I think most people could understand that, in a very simple way, the robot “needs” power.

Let’s take the case where a robot is powered by a software control system. Now we have a “full house“: a “robot” includes a “computer” that executes an “AI“. But where does the “need” reside? Again, it feels wrong to locate it in the “computer” – my laptop doesn’t really “need” anything. Saying an “AI” “needs” something is like saying a soul “needs” food (regardless of whether you believe in souls). We then fall back on the “robot“. Why does the robot feel right? Because it is the most inclusive abstract entity that encompasses an independent agent that acts in the world.

Needs, Goals & Motivation

Before we take things further let’s go on a detour to look at “needs” in more detail. In the article, “needs” are described together with “goals” and “motivation”. Maslow’s famous pyramid features. In this way, a lot is packaged into the term.

Maslow’s Pyramid – By Factoryjoe on WikiCommons

Can we have “needs” without “goals”? Possibly. A quick google shows several articles on “What Bacteria Need to Live” (clue: raw chicken and your kitchen). I think we can relatively safely say that bacteria “need” food and water and a benign environment. Do bacteria have “goals”? Most would say: No. “Goals”, especially as used to describe human behaviour, suggest the complex planning and goal-seeking machinery of the human brain (e.g. as a crude generalisation: the frontal lobes and corpus striatum amongst others). So we need to be careful mixing these – we have a term that may be applied to the lowest level of life, and a term that possibly only applies to the highest levels of life. While robots could relatively easily have “needs”, it would be much more difficult to construct one with “goals”. We would also stumble into “motivation” – how does a robot transform a “need” into a “goal” to pursue it?

Now, as human beings we instinctively know what “motivation” feels like. It is that feeling in the bladder that drives you off your chair to the toilet; it is the itchy uneasiness and dull empty abdominal ache that propels you to the crisp packet before lunch; it is the parched feeling in the throat and the awareness that your eyes are scanning for a chiller cabinet. It is harder to put it into words, or even to know where it starts or ends. Often we just do. Ask why we are doing what we do and the brain makes up a story. Sometimes there is a vague correlation between the two.

Now this is interesting. Let’s have a look at brains for more insight.

Brains

Nature is great. She has evolved at least the Earth’s most efficient data processing device (ignore that the “she” here also doesn’t really exist). Looking at how she has done this allows us to cheat a little when building robots.

A first thing to note is that nature is lazy and stupid (hurray!). She recycles, duplicates, always takes the easy option. This paradoxically means we have arrived at efficiency through inefficiency. Brains started out as chemical gradients, then rudimentary cellular architecture to control these gradients, then multi-cellular architectures, nervous passageways, spinal cords, brain stems, medullas, pons, mid-brains, limbic structures and cortex. Structures are built on top of structures and wired up in ways that would give an electrician a heart attack. Plus structures are living – they change and grow over time within an environment.

In the brain “needs“, at least those near the bottom of the Maslowian pyramid, map fairly nicely onto lower brain structures: the brain stem, medulla, pons, and mid-brain. The thalamus helps to bridge the gap between body and cortex. The cortex then stores representations of these “needs“, and maps them to and from sensory representations. Another crude and incorrect generalisation, but those lower structures are often called the “lizard brain“, as those bits of neural hardware are shared with our reptilian cousins. The raw feeling of “needs” such as hunger, thirst, sexual desire, escape and attack is possibly similar across many animals. What does differ is the behaviour and representations triggered in response to those needs, as well as the top down triggering (e.g. what makes a human being fear abstract nouns).

Lower Brain Structure – Cancer Research UK / Wikimedia Commons

To quote from this article on cortical evolution:

Comparative studies of brain structure and development have revealed a general bauplan that describes the fundamental large-scale architecture of the vertebrate brain and provides insight into its basic functional organization. The telencephalon not only integrates and stores multimodal information but is also the higher center of action selection and motor control (basal ganglia). The hypothalamus is a conserved area controlling homeostasis and behaviors essential for survival, such as feeding and reproduction. Furthermore, in all vertebrates, behavioral states are controlled by common brainstem neuromodulatory circuits, such as the serotonergic system. Finally, vertebrates harbor a diverse set of sense organs, and their brains share pathways for processing incoming sensory inputs. For example, in all vertebrates, visual information from the retina is relayed and processed to the pallium through the tectum and the thalamus, whereas olfactory input from the nose first reaches the olfactory bulb (OB) and then the pallium.

“Needs” near the middle or even the top of Maslow’s pyramid are generally mammalian needs. These include love, companionship, acceptance and social standing. Consensus is forming that nature hijacked parental bonds, especially those that arise from and encourage breast feeding, to build societies. An interesting question is whether this requires the increase in cortical complexity that is seen in mammals. These “needs” mainly arise from the structures that surround the thalamus and basal ganglia, as well as mediators such as oxytocin. So that pyramid does actually have a vague neural correlate; we build our social lives on top of a background of other more essential drives.

The Limbic Lobe – Illustration from Anatomy & Physiology, Connexions Web site. http://cnx.org/content/col11496/1.6/, Jun 19, 2013.

The top of Maslow’s pyramid is contentious. What the hell is self-actualisation? Being the best you you can be? What does that mean? The realisation of talents and potentialities? What if my talent is organising people to commit genocide? Rants aside, Wikipedia gives us something like:

Expressing one’s creativity, quest for spiritual enlightenment, pursuit of knowledge, and the desire to give to and/or positively transform society are examples of self-actualization.

What these seem to be are human qualities that are generally not shared with other animals. Creativity, spirituality, knowledge and morality are all enabled by the more developed cortical areas found in human beings, as coordinated by the frontal lobes, where these cortical areas feed back to both the mammalian and lower brain structures.

A person may thus be likened to a song. The beat and bass provided by the lower brain structures, lead guitar and vocals by the mammalian structures, and the song itself (in terms of how these are combined in time) by the enlarged cortex.

Back to Needs

We can now understand some of the problems that arise when Professor Boden refers to “needs”. Human “needs” arise at a variety of levels, where higher levels are interconnected with and feed back to lower levels. Hence, you can talk about “needs” such as hunger relatively independently of social needs, but social needs only arise in systems that experience hunger. There is thus a question of whether we can talk about social needs independent of lower needs.

We can also see how the answer to the question: “can robots ever have needs?” ignores this hierarchy. It is easier to see how a robot could experience a “need” equivalent to hunger than it is to see it experience a “need” equivalent to acceptance within a social group. It is extremely difficult to see how we could have a “self-actualised” robot.

Environment

Before we look at whether robots care we also need to introduce “the environment”. Not even human beings have “needs” in isolation. Indeed, a “need” implies something is missing; if an environment fulfils the requirement of a need, is it still a “need”?

Additionally, behaviour that is not suited to an environment would fall outside most lay definitions of “intelligence“. “Intelligence” is thus to a certain extent a modelling of the world that enables environmental adaptation.

aerial photo of mountain surrounded by fog
Photo by icon0.com on Pexels.com

The environment comes into play in two areas: 1) human “needs” have evolved within a particular “environment“; and 2) a “need” is often expressed as behaviour that obtains a requirement from the “environment” that is not immediately present.

Food, water, a reasonable temperature range (10 to 40 degrees Celsius), and an absence of harmful substances are fairly fundamental for most life; but these are actually a mirror image of the physical reality in which life on Earth evolved. If our planet had an ambient temperature of 50 to 100 degrees Celsius, would we require warmth? Can non-hydrogen-based life exist without water? Could you feed off cosmic rays?

These are not ancillary points. If we do create complex information processing devices that act in the world, where behaviour is statistical and environment-dependent, would their low-level needs overlap with ours? At present it appears that a source of electrical power is a fairly fundamental “robot” or “AI” need. If that electrical power is generated from urine, do we have a “need” for power or for urine? If urine is correlated with over-indulging on cider at a festival, does the “AI” have a “need” for inebriated festival goers?

The sensory environment of robots also differs from human beings. Animals share evolutionary pathways for sensory apparatus. We have similar neuronal structure to process smell, sight, sound, motor-feedback, touch and visceral sensations, at least at lower levels of processing complexity. In comparison, robots often have simple ultrasonic transceivers, infra-red signalling, cameras and microphones. Raw data is processed using a stack of libraries and drivers. What would evolution in this computing “environment” look like? Can robots evolve in this environment?

Do robots have “needs”?

So back to “robots“. It is easier to think about “robots” than “AI“, as they are embodied in a way that provides an implicit reference to the environment. “AI” in this sense may be used much as we use “brain” and “mind” (it being difficult with deep learning to separate software structure from function).

Do robots have “needs”? Possibly. Could robots have “needs”? Yes, fairly plausibly.

Given a device with a range of sensory apparatus, a range of actuators such as motors, and modern reinforcement learning algorithms (see here and here) you could build a fairly autonomous self-learning system.

The main problem would not be “needs” but “goals“. All the reinforcement learning algorithms I am aware of require an explicit representation of “good“, normally in the form of a “score“. What is missing is a mapping between the environment and the inner state of the “AI“. This is similar to the old delineation between supervised and unsupervised learning. It doesn’t help that roboticists skilled at representing physical hardware states tend to be mechanical engineers, whereas AI researchers tend to be software engineers. It requires a mirroring of the current approach, so that we can remove scores altogether (this is an aim of “inverse reinforcement learning“). While this appears to be a lacuna in most major research efforts, it does not appear insurmountable. I think the right way to go is for more AI researchers to build physical robots. Physical robots are hard.

Do robots care?

Do most “robots” as currently constructed “care”? I’d agree with Professor Boden and say: No.

accident black and white care catastrophe
Photo by Snapwire on Pexels.com

“Care” suggests a level of social processing that the majority of robot and AI implementations currently lack. Being the self-regarding species that we are, most “social” robots as currently discussed refer to robots that are designed to interact with human beings. Expecting this to naturally result in some form of social awareness or behaviour is nonsensical: it is similar to asking why flies don’t care about dogs. One reason human beings are successful at being social is we have a fairly sophisticated model of human beings to go on: ourselves. This model isn’t exact (or even accurate), and is largely implemented below our conscious awareness. But it is one up from the robots.

A better question is possibly: do ants care? I don’t know the answer. On one hand: No, it is difficult to locate compassion or sympathy within an ant. On the other hand: Yes, they have complex societies where different ants take on different roles, and they often act in a way that benefits those societies, even to the detriment of themselves. Similarly, it is easier to design a swarm of social robots that could be argued to “care” about each other than it is to design a robot that “cares” about a human being.

Also, I would hazard to guess that a caring robot would first need to have some form of autonomy; it would need to “care” about itself first. An ant that cannot acquire its own food and water is not an ant that can help in the colony.

Could future “robots” “care“? Yes – I’d argue that it is not impossible. It would likely require a complex representation of human social needs but maybe not the complete range of higher human capabilities. There would always be the question of: does the robot *truly* care? But then this question can be raised of any human being. It is also a fairly pertinent question for psychopaths.

Getting Practical

Despite the hype, I agree with Professor Boden that we are a long way away from any “robot” and “AI” operating in a way that is seen as nearing human. Much of the recent deep learning success involves models that appear cortical; we seem to have ignored the mammalian areas and the lower brain structures. In effect, our rationality is trying to build perfectly rational machines. But because they skip the lower levels that tie things together, and ignore the submerged subconscious processes that mainly drive us, they fall short. If “needs” are seen as an expression of these lower structures and processes, then Professor Boden is right that we are not producing “robots” with “needs”.

As explained above though, I don’t think creating robots with “needs” is impossible. There may even be some research projects where this is the case. We do face the problem that so far we are coming at things backwards, from the top-down instead of the bottom-up. Using neural network architectures to generate representations of low-level internal states is a first step. This may be battery levels, voltages, currents, processor cycles, memory usage, and other sensor signals. We may need to evolve structural frameworks in simulated space and then build upon these. The results will only work if they are messy.

Quick Post – Machine Readable Patents Act

I’ve finally found out how to access UK legislation in XML format – http://www.legislation.gov.uk/developer/uris – you just add /data.xml to the end of the statute URI!

E.g. – https://www.legislation.gov.uk/ukpga/1977/37/data.xml .

If anyone wants to play with the legislation you can use the requests and Beautiful Soup libraries in Python to parse the XML. If you want a bit more power you can use lxml.
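As a starting point, here is a small sketch that builds the XML URL and pulls the text out of the result. To keep it dependency-free I have used the standard library (urllib and ElementTree) rather than requests and Beautiful Soup; the function names are my own, and the parser deliberately ignores tag names because I am not assuming anything about the legislation schema.

```python
import urllib.request
import xml.etree.ElementTree as ET


def data_xml_url(statute_uri):
    """Build the XML URL by adding /data.xml to the end of the statute URI."""
    return statute_uri.rstrip("/") + "/data.xml"


def fetch_xml(statute_uri):
    """Download the XML for a statute (needs network access)."""
    with urllib.request.urlopen(data_xml_url(statute_uri)) as response:
        return response.read()


def text_chunks(xml_bytes):
    """Return all non-empty text fragments in document order, ignoring tag names."""
    root = ET.fromstring(xml_bytes)
    return [text.strip() for text in root.itertext() if text.strip()]
```

Calling text_chunks(fetch_xml("https://www.legislation.gov.uk/ukpga/1977/37")) should then give you the raw text of the Act to play with, although I would expect some clean-up to be needed.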

Patent Search as a Deep Learning Problem

This article will look into how the process of obtaining a patent could be automated using deep learning approaches. A possible pipeline for processing a patent application will be discussed. It will be shown how current state of the art natural language processing techniques could be applied.

Brief Overview of Patent Prosecution

First, let’s briefly look at how a patent is obtained. A patent application is filed. The patent application includes a detailed description of the invention, a set of figures, and a set of patent claims. The patent claims define the proposed legal scope of protection. A patent application is searched and examined by a patent office. Relevant documents are located and cited against the patent application. If an applicant can show that their claimed invention is different from each citation, and that any differences are also not obvious over the group of citations, then they can obtain a granted patent. Often, patent claims will be amended by adding extra features to clearly show a difference over the citations.

Patent Data

For a deep learning practitioner the first question is always: what data do I have? If you are lucky enough to have labelled datasets then you can look at applying supervised learning approaches.

It turns out that the large public database of patent publications is such a dataset. All patent applications need to be published to proceed to grant. This will be seen as a serendipitous gift for future generations.

Search Process

In particular, a patent search report can be thought of as the following processes:

img_0179

A patent searcher locates a set of citations based on the language of a particular claim.

img_0178

Each located citation is labelled as being in one of three categories:

– X: relevant to the novelty of the patent claim.
– Y: relevant to the inventive step of the patent claim. (This typically means the citation is relevant in combination with another Y citation.)
– A: relevant to the background of the patent claim. (These documents are typically not cited in an examination report.)

In reality, these two processes often occur together. For our ends, we may wish to add a further category: N – not cited.

Problem Definition

Thinking as a data scientist, we have the following data records:

(Claim text, citation detailed description text, search classification)

This data may be retrieved (for free) from public patent databases. This may need some intelligent data wrangling. The first process may be subsumed into the second process by adding the “not cited” category. If we move to a slightly more mathematical notation, we have as data:

(c, d, s)

Where c and d are based on a (long) string of text and s is a label with 4 possible values. We then want to construct a model for:

P(s | c, d)

I.e. a probability model for the search classifications given the claim text and citation detailed description. If we have this we can do many cool things. For example, for a fixed c, we can iterate over a set of d and select the documents with the highest X and Y probabilities.
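To make the "iterate over a set of d" idea concrete, here is a hedged sketch. The model argument is a hypothetical callable standing in for P(s | c, d), and toy_model is only a word-overlap stand-in so the example runs; neither is a real trained model.

```python
def rank_citations(claim, documents, model, top_k=3):
    """Rank documents by the probability of being an X or Y citation for a claim.

    `model(claim, document)` is a hypothetical callable returning a dict of
    probabilities over the four labels: {"X": ..., "Y": ..., "A": ..., "N": ...}.
    """
    scored = []
    for doc_id, text in documents.items():
        probs = model(claim, text)
        scored.append((probs["X"] + probs["Y"], doc_id))
    scored.sort(reverse=True)  # highest combined X + Y probability first
    return [doc_id for _, doc_id in scored[:top_k]]


def toy_model(claim, document):
    """Stand-in for a trained model: word overlap drives the X probability."""
    claim_words = set(claim.lower().split())
    overlap = len(claim_words & set(document.lower().split())) / len(claim_words)
    return {"X": overlap, "Y": 0.0, "A": 0.0, "N": 1.0 - overlap}
```

Summing the X and Y probabilities is just one choice; you could equally rank on X alone if you only care about novelty.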

Representations for c and d

Machine learning algorithms operate on real-valued tensors (n*m-dimensional arrays). More than that, the framework for many discriminative models maps data in the form of a large tensor X to a set of labels in the form of a tensor Y. For example, each row in X and Y may relate to a different data sample. The question then becomes how do we map (c, d, s) to (X, Y)?

Mapping s to Y is relatively easy. Each row of Y may be an integer value corresponding to one of the four labels (e.g. 0 to 3). In some cases, each row may need to represent the integer label as a “one hot” encoding, e.g. a value of [2] > [0, 0, 1, 0].
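The label mapping can be sketched in a few lines. The ordering of the labels below is an arbitrary assumption; any fixed ordering works as long as it is used consistently.

```python
LABELS = ["X", "Y", "A", "N"]  # assumed ordering, giving integers 0 to 3


def to_index(label):
    """Map a search classification to its integer value."""
    return LABELS.index(label)


def one_hot(index, size=len(LABELS)):
    """Represent an integer label as a "one hot" list, e.g. 2 -> [0, 0, 1, 0]."""
    return [1 if i == index else 0 for i in range(size)]
```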

Mapping c and d to X is harder. There are two sub-problems: 1) how do we combine c and d? and 2) how do we represent each of c and d as sets of real numbers?

There is an emerging consensus on sub-problem 2). A great explanation may be found in Matthew Honnibal’s post Embed, Encode, Attend, Predict. Briefly summarised, we embed words from the text using a word embedding (e.g. based on Word2Vec or GloVe). This outputs a real-valued float vector for each word (e.g. vectors of length ~300), giving a sequence of vectors. We then encode this sequence of vectors into a document matrix, e.g. where each row of the matrix represents a sentence encoding. One common way to do this is to apply a bidirectional recurrent neural network (RNN – such as an LSTM or GRU), where outputs of a forward and backward network are concatenated. An attention mechanism is then applied to reduce the matrix to a vector. The vector then represents the document.

img_0180

A simple way to address sub-problem 1) is to simply concatenate c and d (in a similar manner to the forward and backward passes of the RNN). A more advanced approach might use c as an input to the attention mechanism for the generation of the document vector for d.
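Both options can be sketched without any deep learning framework. The code below is a toy, pure-Python illustration: it uses plain dot-product attention with no trained parameters, which is a simplification of what a real model would learn, and all function names are my own.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(sentence_vectors, query):
    """Reduce a document matrix (one row per sentence) to a single vector.

    Each row is weighted by softmax(row . query). Here the query is the
    claim vector, so sentences resembling the claim dominate the result.
    """
    weights = softmax([dot(row, query) for row in sentence_vectors])
    size = len(sentence_vectors[0])
    return [sum(w * row[i] for w, row in zip(weights, sentence_vectors))
            for i in range(size)]


def combine(claim_vector, document_vector):
    """Simple combination of c and d: concatenate the two vectors."""
    return claim_vector + document_vector
```

In practice the query, the encoders and the weighting would all be learned during training; the point here is only the shape of the computation.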

Obtain the Data

To get our initial data records – (Claim text, citation detailed description text, search classification) – we have several options. For a list of patent publications, we can obtain details of citation numbers and search classifications using the European Patent Office’s Open Patent Services RESTful API. We can also obtain a claim 1 for each publication. We can then use the citation numbers to look up the detailed descriptions, either using another call to the OPS API or using the USPTO bulk downloads.

I haven’t looked in detail at the USPTO examination datasets but the information may be available there as well. I know that the citations are listed in the XML for a US grant (but without the search classifications). Most International (PCT / WO) publications include the search report, so at a push you could OCR and regex the search report text to extract a (claim number, citation number, search category) tuple.
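To give a flavour of the regex route, here is a sketch that assumes (and it is a big assumption) that each OCR'd row of the search report comes out as one line of the form "category, citation, claim numbers". Real OCR output will be messier, so treat the pattern and function names as hypothetical.

```python
import re

# Assumed OCR line layout (hypothetical): category, citation, then claim numbers,
# e.g. "X  US 2010/0123456 A1 (ACME CORP) 20 May 2010  1-3,5"
ROW = re.compile(
    r"^(?P<category>[XYA])\s+"
    r"(?P<citation>[A-Z]{2}\s?[\d/]+\s?[A-Z]\d?)\b.*?"
    r"(?P<claims>\d+(?:\s*[-,]\s*\d+)*)\s*$"
)


def expand_claims(spec):
    """Turn a claim specification such as "1-3,5" into [1, 2, 3, 5]."""
    numbers = []
    for part in re.sub(r"\s+", "", spec).split(","):
        if "-" in part:
            start, end = part.split("-")
            numbers.extend(range(int(start), int(end) + 1))
        else:
            numbers.append(int(part))
    return numbers


def parse_search_report(ocr_text):
    """Return (claim number, citation number, search category) tuples."""
    tuples = []
    for line in ocr_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            for claim in expand_claims(match["claims"]):
                tuples.append((claim, match["citation"], match["category"]))
    return tuples
```

Lines that do not fit the assumed layout are simply skipped, so some manual checking of the output would be sensible.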

Training

Once you have a dataset consisting of X and Y from c, d, s, the process then just becomes designing, training and evaluating different deep learning architectures. You can start with a simple feed forward network and work up in complexity.

I cannot guarantee your results will be great or useful, but hey if you don’t try you will never know!

What are you waiting for?

Can you protect Artificial Intelligence inventions at the European Patent Office?

In recent years there has been a resurgence of interest in machine learning and so-called “artificial intelligence” systems. Much of this resurgence is based on advances in so-called “deep learning”, neural networks with multiple layers of connections. For example, convolutional neural networks now provide state-of-the-art performance in many image recognition tasks and recurrent neural networks have been used to increase the accuracy of many commercial machine translation systems. Machine learning may be considered a subdiscipline of “artificial intelligence” that deals with algorithms that are trained to perform tasks such as classification based on collections of data. This recent resurgence has meant that more companies wish to protect innovations in this field. This quickly brings them into the realm of computer-implemented inventions, and the nuances of protection at the European Patent Office.

Obligatory “Terminator” Patent Attorney Stock Image

Computer-Implemented Inventions

“Computer-implemented invention” is the European Patent Office term for a software invention. Claims that specify machine learning and artificial intelligence systems are almost certain to be considered “computer-implemented inventions”. The innovation in such systems occurs in the design of the algorithms and/or software architectures. Claims for new hardware to implement machine learning and artificial intelligence systems, such as new graphical processing unit configurations, would not be classed as computer-implemented inventions and would be considered in the same manner as conventional computer devices.


What Do We Have To Go On?

As key advances in the field have only been seen since 2010, there are few Board of Appeal cases that explicitly consider these inventions. It is likely we will see many Board of Appeal decisions in this field, but it is unlikely these will filter through the system much before 2020. However, applications in the field are being filed and examined. The following review is based on knowledge of these applications, evaluated in the context of existing Board of Appeal cases.


Prior Art

A first issue regarding machine learning and artificial intelligence systems is that many of the underlying techniques are public knowledge, given the rapid turn-over of publications and repositories of electronic pre-prints such as arXiv. Hence, many applicants may face novelty and inventive step objections if the invention involves the application of known techniques to new domains or problems. For patent attorneys who are drafting new applications, it is recommended to perform a pre-filing search of such publication sources and ensure that the inventors provide a full appraisal of what is public knowledge.


Domain of Invention

A second issue is the domain of the invention. This may be seen as the context of the invention as presented in the claims and patent description.

Inventions that apply machine learning approaches to fields in engineering are generally considered more positively by the European Patent Office. These fields will typically either operate on low-level data that represents physical properties or have some form of actuation or change in the physical world. For example, the following domains are less likely to have features excluded from an inventive step evaluation for being “non-technical”: navigating a robot within a three-dimensional space; dynamic adaptive change of a Field Programmable Gate Array; audio signal analysis in speech processing; and controlling a power supply to a data centre.

On the other hand, inventions that apply machine learning approaches within a business or “enterprise” domain are likely to be analysed more closely. These inventions have a greater chance of claim features being excluded for being “non-technical”. These domains typically have an aim of increasing profit. The more this aim is explicit in the patent application, the more likely a “non-technical” objection will be raised. For example, the following inventions are more likely to have features excluded from an inventive step evaluation for being “non-technical”: intelligent organisation of playlists in a music streaming service; adaptive electronic trading of securities; automated provision of electronic information in a company hierarchy; and automated negotiation of online advertising auctions.


Exclusions from Patentability

A third issue that arises is that individual features of the claims may fall within the exclusions of Article 52(2) EPC. In the field of machine learning and artificial intelligence systems, there is an increased risk of claim features being considered to fall into one of the following categories: mathematical methods; schemes, rules and methods for performing mental acts or doing business; and presentations of information. These will briefly be considered in turn below.

Mathematical Methods

The field of machine learning is closely linked to the field of statistics. Indeed many machine learning algorithms are an application of statistical methods. Academic researchers in the field are trained to describe their contributions mathematically, and this is required for publication in an academic journal. However, the practice of the European Patent Office, as directed by the Boards of Appeal, typically regards statistical methods as mathematical methods. In their pure, unapplied form they are considered “non-technical”.

Schemes, Rules and Methods for Performing Mental Acts

A claim feature is likely to be considered part of schemes, rules and methods for performing mental acts when the scope of the feature is too broad or abstract. For example, if a claimed method step also covers a human being performing the step manually, it is likely that the scope is too broad.

Schemes, Rules and Methods for Doing Business

Claim features are likely to be considered schemes, rules and methods for doing business when the information processing relates to a business aim or goal. This is especially the case where the information processing is dependent on the content of the data being processed, and that content does not relate to a low-level recording or capture of a physical phenomenon.

For example, processing of a digital sound recording to clean the recording of noise would be considered “technical”; processing row entries in a database of information technology assets to remove duplicates for licensing purposes would likely be considered “non-technical”.

Presentation of Information

Objections that features relate to the presentation of information may occur when the innovation relates to user experience (UX) or user interface (UI) features.

For example, a machine learning algorithm that adaptively arranges icons on a smartphone according to use may receive objections on the grounds that features relate to mathematical methods (the algorithm) and presentation of information (the arrangement of icons on the graphical user interface). As per Guideline G-II, 3.7.1, grant is unlikely if information is simply displayed to a user and any improvement occurs in the mind of the user. However, it is possible to argue for a technical effect if the output provides information on an internal state of operation of a device (at the operating system level or below, e.g. battery level, processing unit utility etc.) or if the output improves a sequence of interactions with a user (e.g. provides a new way of operating a device). Again, a technical problem needs to be demonstrated and the machine learning algorithm needs to be a tool to solve this problem.


Subfields of ML and AI

In certain subfields of machine learning and artificial intelligence, there is a tendency for Boards of Appeal and Examining Divisions to consider inventions more or less “technical”. This is often for a combination of factors, including field of operation of appellants, the history of research and traditional applications, and the background and public policy preferences of staff of the European Patent Office.

For example, machine learning and artificial intelligence systems in the field of image, video and audio processing are more likely to be found to have “technical” features that can contribute to an inventive step under Article 56 EPC. A convolutional neural network architecture applied to image processing is more likely to be considered a “technical” contribution than the same architecture applied to text processing. Similarly, it may be argued that machine learning and artificial intelligence systems in the field of medicine and biochemistry have “technical” characteristics, e.g. if they operate on data originating from mass spectrometry or medical imaging.

However, advances in search, classification and natural language processing are more likely to be found to have “non-technical” features that cannot contribute to an inventive step under Article 56 EPC. These areas of machine learning and artificial intelligence systems are often felt to be “technical” by the engineers and developers building such systems. However, it is a nuance of European case law that these areas are often deemed to have claim features that fall into an excluded “business”, “mathematical” or “administrative” category.

A recent example may be found in case T 1358/09. The claim in this case comprised “text documents, which are digitally represented in a computer, by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document”. The Board agreed with the appellant that the steps in the claim were different to those applied by a human being performing classification. However, the Board concluded that the algorithm underlying the method of the claim did not “go beyond a particular mathematical formulation of the task of classifying documents”. They were of the opinion that the skilled person would have been given the (“non-technical”) text classification algorithm and simply be tasked with implementing it on a computer.


What Should We Not Do?

Managers and executives of commercial enterprises are often habituated into selling innovations to a non-technical audience. This means that invention disclosures often describe the invention at an abstract “marketing” level. When an invention is described in a patent application at this level, inventive step objections are likely.

The fact that mathematical formulae may comprise excluded “non-technical” features is difficult for inventors and practitioners to grasp. Often equations at an academic-publication level are included in patent specifications in an attempt to add technical character. This often backfires. While such equations may be deemed “technical” according to a standard definition of the term, they are often not deemed “technical” according to the definition applied by European case law.

In general, objections are more likely in this area when the scope of the claim is broad and attempts to cover applications of a particular algorithm in all industries. Applicants should be advised that trying to cover everything will likely lead to refusal.


What Should We Do?

Chances of grant may be increased by ensuring an examiner or Board of Appeal member can clearly see the practical application of the algorithm to a specific field or low-level technical area.

Patent attorneys drafting patent applications for machine learning and artificial intelligence systems should carefully consider the framing and description of the invention in the patent specification. In-depth discussions with the engineers and developers that are implementing the systems often enable innovations to be described more precisely. Given this precision, innovations may be framed as a “technical” or engineering innovation, i.e. a technical solution to a technical problem. This increases the chance of a positive opinion from the European Patent Office.

Often features of an invention will have both a business advantage and a “technical” advantage. For example, a machine learning system that learns how to dynamically route data over a network may help an online merchant more successfully route traffic to their website; however, this improved method may involve manipulation of data packets within a router that also improves network security. A patent specification describing the latter advantage will have a greater chance of grant than the former, regardless of the actual provenance of the invention. A practitioner may work with an inventor to ensure that initial business advantages are distilled to their proximate “technical” advantages and effects. For cases where data does not relate to a low-level recording or capture of a physical phenomenon, it is recommended to ensure that any described technical effect applies regardless of the content of the data.

When considering exclusion for “mental acts”, a risk of a “non-technical” objection may be reduced by ensuring that your method steps exclude a manual implementation. Note that this exclusion does not necessarily prevent other objections being raised (see T 1358/09 above).

When drafting patent applications, it is also important to describe the implementation of any mathematical method. In this manner, pseudo-code is often more useful than equations. It is also important to clearly define how attributes of the physical world are represented within the computer. Good questions to ask include: “What data structures and function routines are used to implement the elements of any equation?”, “How is data initially recorded, e.g. are documents a scanned image such as a bitmap or a markup file using a Unicode encoding?”, “What programming languages and libraries are being used?”, or “What application programming interfaces are important?”.

Practitioners do need to be concerned with including overly limiting definitions within the claims; however, a positive opinion is more likely when specific implementation examples are described in the patent specification, followed by possible generalisations, than when specific implementation examples are omitted and the description only presents a generalised form of the invention along with more detailed mathematical equations.

To be successful in search, classification and natural language processing, one approach is to determine whether features relating to a non-obvious technical implementation may be claimed. This approach often goes hand in hand with a knowledge of academic publications in the field. While such publications may disclose a version of an algorithm being used, they often gloss over the technical implementation (unless the underlying source code is released on GitHub). For example, is there any feature of the data, ignoring its content, which makes implementation of a given equation problematic? If inventors have managed to reduce the dimensionality of a neural network using clever string pre-processing or quantisation then there may be an argument that the resultant solution is implementable on mobile and embedded devices. Reducing the size of a model from 3 GB to 300 KB by intelligent selection of pipeline stages may enable you to argue for a technical effect.


Do Not Believe The Hype?

Despite the hype, machine learning and artificial intelligence systems are just another form of software solution. As such, all the general guidance and case law on computer-implemented inventions continues to apply. A benefit of the longer timescales of patent prosecution is that you ride out the waves of Gartner’s hype cycle. In fact, I still sometimes prosecute cases from the end of the dotcom boom…

Your Patent Department in 2030

Natural Language Processing and Deep Learning have the potential to overhaul patent operations for large patent departments. Jobs that used to cost hundreds of dollars / pounds per hour may cost cents / pence. This post looks at where I would be investing research funds.

The Path to Automation

In law, the path to automation is typically as follows:

Qualified Legal Professional > Associate > Paralegal > Outsourcing > Automation

Work is standardised and commoditised as we move down the chain. Today we will be looking at the last stage in the chain: automation.

[Insert generic public domain image of future.]

Potential Applications

At a high level, here are some potential applications of deep learning models that have been trained on a large body of patent publications:

  • Invention Disclosure > Patent Specification +/ Claims (Drafting)
  • Patent Claims + Citation > Amended Claims (Amendment)
  • Patent Claims > Corpus > Citations (Patent Search)
  • Invention Disclosure > Citations (Patent Search)
  • Patent Specification + Claims > Cleaned Patent Specification + Claims (Proof Reading)
  • Figures > Patent Description (Drafting)
  • Claims > Figures +/ Patent Description (Drafting)
  • Product Description (e.g. Manual / Website) > Citation (Infringement)
  • Group of Patent Documents > Summary Clusters (Text or Image) (Landscaping)
  • Official Communication > Response Letter Text (Prosecution)

Caveat

I know there is a lot of hype out there and I don’t particularly want to be responsible for pouring oil on the flames of ignorance. I have tried to base these thoughts on widely reviewed research papers. The aim is to provide more of a piece of informed science fiction and to act as a guide as to what may be. (I did originally call it “Your Patent Department 2020” :).

Many of the things discussed below are still a long way off, and will require a lot of hard work. However, the same was said 10 years ago of many amazing technologies we now have in production (such as facial tagging, machine translation, virtual assistants, etc.).

Examples

Let’s dive into some examples.

Search

At the moment, patent drafting typically starts as follows: receive invention disclosure, commission search (in-house or external), receive search results, review by attorney, commission patent draft. This can take weeks.

Instead, imagine a world where your inventors submit an invention disclosure and within minutes or hours you receive a report that tells you the most relevant existing patent publication, highlights potentially novel and inventive features and tells you whether you should proceed with drafting or not.

The techniques already exist to do this. You can download all US patent publications onto a hard disk that costs $75. You can convert high-dimensionality documents into lower-dimensionality real vectors (see https://radimrehurek.com/gensim/wiki.html or https://explosion.ai/blog/deep-learning-formula-nlp). You can then compute distance metrics between your decomposed invention disclosure and the corpus of US patent publications. Results can be ranked. You can use a Long Short Term Memory (LSTM) decoder (see https://www.tensorflow.org/tutorials/seq2seq) on any difference vector to indicate novel and possibly inventive features. A neural network classifier trained on previous drafting decisions can provide a probability of proceeding based on the difference results.
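As a rough illustration of the "compute distance metrics and rank" step, here is a minimal sketch using only the Python standard library. It swaps the learned lower-dimensionality vectors for plain word counts and cosine similarity, which is a much cruder technique than the ones linked above. The three "publications" and the disclosure text are made-up toy strings, and the function names are my own.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count word occurrences (a crude bag of words)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_corpus(disclosure, corpus):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    query = term_counts(disclosure)
    scores = [(doc_id, cosine_similarity(query, term_counts(text)))
              for doc_id, text in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus; real input would be the text of patent publications.
corpus = {
    "doc_a": "a card reader that decodes signals from a magnetic stripe",
    "doc_b": "a method of brewing coffee using pressurised steam",
    "doc_c": "a mobile device decoding card signals into bits",
}
disclosure = "decoding signals from a payment card on a mobile device"

ranking = rank_corpus(disclosure, corpus)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Raw counts let common words such as "a" dominate the score, so a real system would at least weight terms (e.g. TF-IDF) before moving on to dense vectors.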

Drafting

A draft patent application in a complicated field such as computing or electronics may take a qualified patent attorney 20 hours to complete (including iterations with inventors). This process can take 4-6 weeks.

Now imagine a world where you can generate draft independent claims from your invention disclosure and cited prior art at the click of a button. This is not pie-in-the-sky science fiction. State of the art systems that combine natural language processing, reinforcement learning and deep learning can already generate fairly fluid document summaries (see https://metamind.io/research/your-tldr-by-an-ai-a-deep-reinforced-model-for-abstractive-summarization). Seeding a summary based on located prior art, and the difference vector discussed above, would generate a short set of text with similar language to that art. Even if the process wasn’t able to generate a perfect claim off the bat, it could provide a rough first draft to an attorney who could quickly iterate a much improved version. The system could learn from this iteration (https://deepmind.com/blog/learning-through-human-feedback/) allowing it to improve over time.

Or another option: how about your patent figures are generated automatically based on your patent claims and then your detailed description is generated automatically based on your figures and the invention disclosure? Prototype systems already exist that perform both tasks (see https://arxiv.org/pdf/1605.05396.pdf and http://cs.stanford.edu/people/karpathy/deepimagesent/).

Prosecution

In the old days, patent prosecution involved receiving a letter from the patent office and a bundle of printed citations. These would be processed, stamped, filed, carried around on an internal mail wagon and placed on a desk. More letters would be written culminating in, say, a written response and a set of amendments.

From this, imagine that your patent office post is received electronically, then automatically filed and docketed. Citations are also automatically retrieved and filed. Objection categories are extracted automatically from the text of the office action and the office action is categorised with a percentage indicating the chance of obtaining a granted patent. Additionally, the text of the citations is read and a score is generated indicating whether the citations remove novelty from your current claims (this is similar to the search process described above, only this time you know what documents you are comparing). If the score is lower than a given threshold, a set of amendment options is presented, along with a percentage chance of success for each. You select an option, maybe iterate the amendment, and then the system generates your response letter. This includes inserting details of the office action you are replying to (specifically addressing each objection that is raised), automatically generating passages indicating basis in the text of your application, explaining the novel features, generating a problem-solution analysis that has a basis in the text of your application, and providing pointers for why the novel features are not obvious. Again you iterate then file online.

Parts of this are already in place at major law firms (e.g. electronic filing and docketing). I have played with systems that can extract the text from an office action PDF and automatically retrieve and file documents via our document management application programming interface. With a set of labelled training data, it is easy to build an objection classification system that takes as input a simple bag of words. Companies such as Lex Machina (see https://lexmachina.com/) already crunch legal data to provide chances of litigation success; parsing legal data from, say, the USPTO and EPO would enable you to build a classification system that maps the full text of your application, and bibliographic data, to a chance of prosecution success based on historic trends (e.g. in your field since the 1970s). Vector-space representations of documents allow distance measures in n-dimensional space to be calculated, and decoder systems can translate these into the language of your specification. The lecture here explains how to create a question answering system using natural language processing and deep learning (http://media.podcasts.ox.ac.uk/comlab/deep_learning_NLP/2017-01_deep_NLP_11_question_answering.mp4). You could adapt this to generate technical problems based on document text, where the answer is bound to the vector-space distance metric. Indeed, patent claim space is relatively restricted (it is, at heart, a long sentence, where amendments are often additional sub-phrases of the sentence that are consistent with the language of the claimset); the nature of patent prosecution and added subject matter naturally produces a closed-form style problem.
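To give a feel for the bag-of-words objection classifier mentioned above, here is a minimal sketch of a multinomial naive Bayes classifier in plain Python. Naive Bayes is my choice for illustration, not necessarily what a production system would use. The training sentences and the three labels are invented toy examples, not text from real office actions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lower-case word and number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BagOfWordsClassifier:
    """Multinomial naive Bayes with add-one smoothing over word counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocabulary = set()

    def train(self, examples):
        for text, label in examples:
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocabulary.update(words)

    def predict(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the log likelihood of each word given the label.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data; real labels would come from past office actions.
TRAINING = [
    ("claim 1 lacks novelty over document D1", "novelty"),
    ("the subject matter of claim 1 is not new in view of D1", "novelty"),
    ("claim 1 does not involve an inventive step having regard to D1 and D2",
     "inventive_step"),
    ("the skilled person would arrive at the claimed solution in an obvious manner",
     "inventive_step"),
    ("the term substantially renders claim 3 unclear", "clarity"),
    ("claim 2 is not clear and lacks support in the description", "clarity"),
]

classifier = BagOfWordsClassifier()
classifier.train(TRAINING)
print(classifier.predict("claim 4 is not new and lacks novelty over D3"))
```

With six training sentences this is only a toy; the point is that the whole thing fits in a page once you have labelled data.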

 

Imagining Reality is the First Stage to Getting There

There is no doubt that some of these scenarios will be devilishly hard to implement. It took nearly two decades to go from paper to properly online filing systems. However, prototypes of some of these solutions could be hacked up in a few months using existing technology. The low hanging fruit alone offers the potential to shave hundreds of thousands of dollars from patent prosecution budgets.

I also hope that others are aiming to get there too. If you are, please get in touch!

Modelling Claim Language

Playing around with natural language processing has given me the confidence to attempt some claim language modelling. This may be used as a claim drafting tool or to process patent publication data. Here is a short post describing the work in progress.

Background Reading:

Here, a caveat: this modelling will be imperfect. There will be claims that cannot be modelled. However, our aim is not a “perfect” model but a model whose utility outweighs its failings. For example, a model may be used to present suggestions to a human being. If useful output is provided 70% of the time, then this may prove beneficial to the user.

To start we will keep it simple. We will look at system or apparatus claims. As an example we can take Square’s payment dongle:

1. A decoding system, comprising:

a decoding engine running on a mobile device, the decoding engine in operation decoding signals produced from a read of a buyer’s financial transaction card, the decoding engine in operation accepting and initializing incoming signals from the read of the buyer’s financial transaction card until the signals reach a steady state, detecting the read of the buyer’s financial transaction card once the incoming signals are in a steady state, identifying peaks in the incoming signals and digitizing the identified peaks in the incoming signals into bits;
and
a transaction engine running on the mobile device and coupled to the decoding engine, the transaction engine in operation receiving as its input decoded buyer’s financial transaction card information from the decoding engine and serving as an intermediary between the buyer and a merchant, so that the buyer does not have to share his/her financial transaction card information with the merchant.

Let’s say a claim consists of “entities”. These are roughly the subjects of claim clauses, i.e. the things in our claim. They may appear as noun phrases, where the head word of the phrase is modelled as the core “entity”. They may be thought of as “objects” from an object-oriented perspective, or “nodes” in a graph-based approach.

In the above claim, we have core entities of:
  • “a decoding system”
  • “a decoding engine”
  • “a transaction engine”

An entity may have “properties” (i.e. “is” something) or may have other entities (i.e. “have” something).

In our example, the “decoding system” has the “decoding engine” and the “transaction engine” as child entities. Or put another way, the “decoding engine” and the “transaction engine” have the “decoding system” as a parent entity.

In the example, the properties of the entities are more complex. The “decoding system” does not have any. It just has the child entities. The “decoding engine” “is”:
    • “running on a mobile device”
    • “in operation decoding signals produced from a read of a buyer’s financial transaction card”
    • “in operation accepting and initializing incoming signals from the read of the buyer’s financial transaction card until the signals reach a steady state”
    • “detecting the read of the buyer’s financial transaction card once the incoming signals are in a steady state”
    • “identifying peaks in the incoming signals and digitizing the identified peaks in the incoming signals into bits”
 
In these “is” properties, we have a number of implicit entities. These are not in our claim but are referred to by the claim. They are basically the other nouns in our claim. They include:
    • “mobile device”
    • “read”
    • “buyer’s financial transaction card”
    • “signals”
    • “peaks”
    • “bits”
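The entity model described above can be sketched as a small data structure. This is only one possible representation, written with Python dataclasses; the class name `Entity` and its fields are my own naming. The "transaction engine" properties are taken from the claim text rather than from the lists above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """A claim entity: a noun phrase with "is" properties and "has" children."""
    name: str
    properties: List[str] = field(default_factory=list)
    children: List["Entity"] = field(default_factory=list)
    # Excluded from repr and comparison to avoid parent/child recursion.
    parent: Optional["Entity"] = field(default=None, repr=False, compare=False)

    def add_child(self, child):
        """Attach a child entity and record this entity as its parent."""
        child.parent = self
        self.children.append(child)
        return child

    def outline(self, depth=0):
        """Render the entity tree as indented text."""
        lines = ["  " * depth + self.name]
        lines += ["  " * (depth + 1) + "is: " + p for p in self.properties]
        for child in self.children:
            lines += child.outline(depth + 1).splitlines()
        return "\n".join(lines)


system = Entity("decoding system")
decoder = system.add_child(Entity("decoding engine", properties=[
    "running on a mobile device",
    "in operation decoding signals produced from a read of a buyer’s "
    "financial transaction card",
    "in operation accepting and initializing incoming signals from the read "
    "of the buyer’s financial transaction card until the signals reach a "
    "steady state",
    "detecting the read of the buyer’s financial transaction card once the "
    "incoming signals are in a steady state",
    "identifying peaks in the incoming signals and digitizing the identified "
    "peaks in the incoming signals into bits",
]))
transaction = system.add_child(Entity("transaction engine", properties=[
    "running on the mobile device",
    "coupled to the decoding engine",
]))

print(system.outline())
```

The implicit entities (“mobile device”, “signals” and so on) could be added later as further `Entity` objects referenced from the property strings; this sketch leaves them as plain text.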

[When modelling, the part of speech tagger is mostly there but will probably require human tweaking and confirmation.]

Mapping to Natural Language Processing

To extract noun phrases, we need the following processing pipeline:

claim_text > [1. Word Tokenisation] > list_of_words > [2. Part of Speech Tagging] > labelled_words > [3. Chunking] > tree_of_noun_phrases

Now, the NLTK toolkit provides default functions for 1) and 2). For 3) we have the options of a RegexpParser, for which we need to supply noun phrase patterns, or classifier-based chunkers. Both need a little extra work but there are tutorials on the Net.
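To make the three stages concrete without any downloads, here is a standard-library sketch of the same pipeline. It is a stand-in, not the NLTK version: the tokeniser is a simple regular expression, the part of speech tags come from a tiny hand-written lookup table (an assumption purely for illustration), and the chunker hard-codes one noun phrase pattern, an optional determiner, any adjectives, then one or more nouns. It also returns a flat list of phrases rather than a tree. In practice NLTK's `word_tokenize`, `pos_tag` and `RegexpParser` would take the place of these functions.

```python
import re

# Hand-assigned Penn Treebank style tags for a few words (toy lookup only).
TOY_TAGS = {
    "a": "DT", "the": "DT", "decoding": "NN", "engine": "NN",
    "transaction": "NN", "device": "NN", "mobile": "JJ",
    "running": "VBG", "coupled": "VBN", "on": "IN", "to": "TO", "and": "CC",
}


def tokenize(text):
    """Step 1: split claim text into word and punctuation tokens."""
    pattern = r"[A-Za-z0-9]+(?:['’/-][A-Za-z0-9]+)*|[^\sA-Za-z0-9]"
    return re.findall(pattern, text)


def tag(tokens):
    """Step 2: label each token; unknown words get the tag 'UNK'."""
    return [(token, TOY_TAGS.get(token.lower(), "UNK")) for token in tokens]


def chunk_noun_phrases(tagged):
    """Step 3: group tokens matching DT? JJ* NN+ into noun phrases."""
    phrases, current, has_noun = [], [], False
    for word, pos in tagged:
        if pos.startswith("NN"):
            current.append(word)
            has_noun = True
        elif pos in ("DT", "JJ") and not has_noun:
            if pos == "DT":
                current = []  # a determiner always starts a fresh phrase
            current.append(word)
        else:
            if has_noun:
                phrases.append(" ".join(current))
            # A determiner or adjective after a noun starts the next phrase.
            current = [word] if pos in ("DT", "JJ") else []
            has_noun = False
    if has_noun:
        phrases.append(" ".join(current))
    return phrases


claim_text = "a decoding engine running on a mobile device,"
print(chunk_noun_phrases(tag(tokenize(claim_text))))
```

Tagging “decoding” as a noun modifier is a judgment call; a trained tagger may label it differently, which is exactly where the human tweaking mentioned above comes in.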

Noun phrases should be used consistently throughout claim sentences – this can be used to resolve ambiguity.

Resources for (Legal) Deep Learning

This post sets out a number of resources to get you started with deep learning, with a focus on natural language processing for legal applications.

A Bit of Background

Deep learning is a bit of a buzz word. Basically, it relates to recent advances in neural networks. In particular, it relates to the number of layers that can be used in these networks. Each layer can be thought of as a mathematical operation. In many cases, it involves a multidimensional extension of drawing a line, y = ax + b, to separate a space into multiple parts.
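To make the "each layer is a mathematical operation" point concrete, here is a minimal sketch of a single layer in plain Python. The function names and the example weights are my own; the layer computes y = Wx + b, the multidimensional counterpart of y = ax + b, and a simple threshold reports which side of the resulting line an input falls on.

```python
def dense_layer(weights, bias, inputs):
    """Compute y = W x + b, the multidimensional version of y = ax + b."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def step(values):
    """Report which side of each line (hyperplane) the input falls on."""
    return [1 if v > 0 else 0 for v in values]


# One output unit: the line x1 + x2 - 1 = 0 splits the plane in two.
weights = [[1.0, 1.0]]
bias = [-1.0]

print(step(dense_layer(weights, bias, [0.2, 0.3])))  # a point below the line
print(step(dense_layer(weights, bias, [0.9, 0.8])))  # a point above the line
```

A "deep" network stacks several such layers, feeding the output of one into the next with a non-linear function in between, and learns the weights and biases from data rather than having them written in by hand.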

I find it strange that when I studied machine learning in 2003/4, neural networks had gone out of fashion. The craze then was for support vector machines. Neural networks were seen as a bit of a dead end. While there was nothing wrong theoretically, in practice it wasn’t possible to train a network with more than a couple of layers. This limited their application.

What changed?

Computers and software improved. Memory increased. Researchers realised they could co-opt the graphical processing units of beefy graphics cards of hardcore gamers to perform matrix and vector multiplication. The Internet improved access to large scale data sets and enabled the fast propagation of results. Software tool kits and standard libraries arrived. You could now program in Python for free rather than pay large licence fees for Matlab. Python made it easy to combine functionality from many different areas. Software became good at differentiating and incorporating advanced mathematical optimisation techniques. Google and Facebook poured money into the field. Etc.

This all led to researchers being able to build neural networks with more and more layers that could be trained efficiently. Hence, “deep” means more than two layers and “learning” refers to neural network approaches.

Deep Natural Language Processing

Deep learning has a number of different application areas. One big split is between image processing and natural language processing. The former has seen big success with the use of convolutional neural networks (CNNs), while natural language processing has tended to focus on recurrent neural networks (RNNs), which operate on sequences within time.

Image processing has also typically considered supervised learning problems. These are problems where you have a corpus of labelled data (e.g. ‘ImageX’ – ‘cat’) and you want a neural network to learn the classifications.

Natural language processing on the other hand tends to work with unsupervised learning problems. In this case, we have a large body of unlabelled data (see the data sources below) and we want to build models that provide some understanding of the data, e.g. that model in some way syntactic or semantic properties of text.

That said, there are crossovers – there are several highly-cited papers that apply CNNs to sentence structures, and document classification can be performed on the basis of a corpus of labelled documents.

Introductory Blog Posts

A good place to start are these blog posts and tutorials. I’m rather envious of the ability of these folks to write so clearly about such a complex topic.

Courses

After you’ve read those blog articles, a next step is to dive into the Udacity free Deep Learning course. This is taught in collaboration with Google Brain and is a great introduction to Logistic Regression, Neural Networks, Data Wrangling, CNNs and a form of RNNs called Long Short Term Memory networks (LSTMs). It includes a number of interactive Jupyter/IPython Notebooks, which follow a similar path to the Tensorflow tutorials.

Udacity Deep Learning Course – https://www.udacity.com/course/deep-learning--ud730

Their Data Science, GitHub, Programming and Web Development courses are also very good if you need to get quickly up to speed.

Once you’ve completed that, a next step is working through the lecture notes and exercises for these Stanford and Oxford courses.

 Stanford Deep Learning for Natural Language Processing – http://cs224d.stanford.edu/syllabus.html

Oxford Deep NLP (with special guests from Deepmind & Nvidia) – https://github.com/oxford-cs-deepnlp-2017/lectures

Data Sources

Once you’ve got your head around the theory, and have played around with some simple examples, the next step is to get building on some legal data. Here’s a selection of useful text sources with a patent slant:

USPTO bulk data – https://bulkdata.uspto.gov/ – download all the patents!

Some of this data will require cleaning / sorting / wrangling to access the text. There is an (experimental) USPTO project in Java to help with this. This can be found here: https://github.com/USPTO/PatentPublicData . I have also been working on some Python wrappers to access the XML in (zipped) situ – https://github.com/benhoyle/patentdata and https://github.com/benhoyle/patentmodels.

Wikipedia bulk data – https://dumps.wikimedia.org/enwiki/latest/ – download all the knowledge!

The file you probably want here is enwiki-latest-pages-articles.xml.bz2. This clocks in at 13 GB compressed and ~58 GB uncompressed. It is supplied as a single XML file. Again I need to work on some Python helper functions to access the XML and return text.

 (Note: this is the same format as recent USPTO grant data – a good XML parser that doesn’t read the whole file into memory would be useful.)
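As a starting point for the kind of streaming parser described in the note above, here is a minimal sketch using `iterparse` from the standard library, which processes elements as they are read rather than loading the whole file. The sample XML is a made-up, simplified stand-in: as far as I am aware the real Wikipedia dump qualifies its tag names with an XML namespace, so the tag names here would need adjusting. The function name is my own.

```python
import io
import xml.etree.ElementTree as ET


def iter_page_text(source):
    """Yield (title, text) for each <page>, clearing elements as we go."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title", default="")
            text = elem.findtext("revision/text", default="")
            yield title, text
            elem.clear()  # drop this page's children to free memory


# Simplified, invented sample in the general shape of a MediaWiki export.
SAMPLE = b"""<mediawiki>
  <page>
    <title>Patent</title>
    <revision><text>A patent is a form of intellectual property.</text></revision>
  </page>
  <page>
    <title>Claim</title>
    <revision><text>A claim defines the scope of protection.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_page_text(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, "->", text)
```

For the compressed dump, a file object from `bz2.open(path, "rb")` could be passed in place of the `BytesIO` object, so the file never needs to be decompressed to disk. Note that `elem.clear()` empties each page but leaves a small empty element attached to the root, which may still add up over millions of pages.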

WordNet.

The easiest way to access this data is probably via the NLTK toolkit indicated below. However, you can download the data for WordNet 3 here – https://wordnet.princeton.edu/wordnet/download/current-version/.

BAILII – http://www.bailii.org/ – a free online database of British and Irish case law & legislation, European Union case law, Law Commission reports, and other law-related British and Irish material.

There is no bulk download option for this data – it is accessed as a series of HTML pages. It would not be too difficult to build a Python tool to bulk download various datasets.

UK Legislation – Legislation.gov.uk.

 This data is available via a web interface. Unfortunately, there does not appear to be a bulk download option or an API for supplying machine readable data.

On the to-do list is a Python wrapper for supplying structured or unstructured versions of UK legislation from this site (e.g. possibly downloading with requests then parsing the returned HTML).

European Patent Office Board of Appeal Case Law database – https://www.epo.org/law-practice/case-law-appeals/advanced-search.html.

Although there is no API or bulk download option as of yet, it is possible to set up an RSS feed link based on search parameters. This RSS feed link can be processed to access links to each decision page. These pages can then be accessed and converted into text using a few Python functions (I have some scripts to do this I will share soon).
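Until I share my own scripts, here is a minimal sketch of the RSS step using only the standard library. It assumes the feed is ordinary RSS 2.0 with `<item>` elements containing `<title>` and `<link>`; I have not checked the exact shape of the EPO feed, and the sample feed below (titles and example.org links) is invented. Fetching the feed and the decision pages is left out.

```python
import xml.etree.ElementTree as ET


def decision_links(rss_xml):
    """Return (title, link) for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


# Invented sample feed; a real one would come from a saved search.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example search feed</title>
    <item>
      <title>Decision A</title>
      <link>https://example.org/decisions/a.html</link>
    </item>
    <item>
      <title>Decision B</title>
      <link>https://example.org/decisions/b.html</link>
    </item>
  </channel>
</rss>"""

links = decision_links(SAMPLE_FEED)
for title, link in links:
    print(title, link)
```

Each link could then be fetched and the returned HTML converted to text, e.g. with Beautiful Soup from the tools list below.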

UK Intellectual Property Office Hearing Database – https://www.ipo.gov.uk/p-challenge-decision-results.htm.

Again a human accessible resource. However, the decisions are accessible by year in fairly easy to parse tables of data (I again have some scripts to do this that I will share with you soon).

Your Document / Case Management System.

Many law firms use some kind of document and/or case management system. If available online, there may be an API to access documents and data stored in these systems. Tools like Textract (see below) can be used to extract text from these documents. If available as some form of SQL database, you can often access the data using ODBC drivers.

Tools

Once you have some data the hard work begins. Ideally what you want is a nice text string per document or article. However, none of the data sources listed above enable you to access this easily. Hence, you need to start building some wrappers in Python to access and parse the data and return an output that can be easily processed by machine learning libraries. Here are some tools for doing this, and then to build your deep learning networks. For more details just Google the name.

NLTK

– brilliant for many natural language processing functions such as stemming, tokenisation, part of speech tagging and many more.

SpaCy

– an advanced set of NLP functions.

Gensim

– another brilliant library for processing big document libraries – particularly good for lazy functions that do not store all the data in memory.

Tensorflow

– for building your neural networks.

Keras

– a wrapper for Tensorflow or Theano that allows rapid prototyping.

Scikit-Learn

– provides implementations for most of the major machine learning techniques, such as Bayesian inference, clustering, regression and more.

Beautiful Soup

– great for easy parsing of semi-structured data such as websites (HTML) or patent documents (XML).

Textract

– a very simple wrapper over a number of different Linux libraries to extract text from a large variety of files.

Pandas

– think of this as a command line Excel, great for manipulating large lists of data.

Numpy

– numerical analysis in Python, used, amongst other things, for multidimensional arrays.

Jupyter Notebooks

– great for prototyping and research, the engineer’s squared-paper notebook of the 21st century, plus they can be easily shared on GitHub.

Docker

– many modern toolkits require a bundle of libraries, so it can be easier to set up a Docker image (a form of virtualised container).

Flask

– for building web servers and APIs.

Now go build, share on GitHub and let me know what you come up with.

Patent Economics

Often you are faced with the question: should I patent my invention? A quick, back-of-the-envelope calculation can help with this decision.

CAVEAT: these are all roughly sketched out figures. This post is written in my spare time between cooking, cleaning, childcare and work. It does not constitute legal or financial advice. The figures are rough generalisations that allow you to work out whether it’s worth investigating further but may vary considerably for each individual case. Always get professional help with the details.

Patenting Costs

Obtaining a patent is not a cheap process. As of 2017, my very rough rule-of-thumb is to budget £50k (~$75k) per country over the 20 year lifetime, excluding taxes.

This is based on, for a typical case:

  • ~£10k for initial work (e.g. searching), drafting an application and the costs of first (i.e. priority) filing.
  • ~£10k for developing strategy after an initial patent office search (e.g. UKIPO or in the International phase) and for filing an International patent application within a year of the first filing.
  • ~£5k per country to enter the national or regional phase after the end of the International phase for the International application. This is about right for a simple US and European entry; countries requiring translations may be up to £10k per country.
  • ~£15k per country for prosecution and grant. This is likely the most variable figure, with variance typically being on the upside (i.e. more expensive) if you are unlucky with prior art or a particularly obstinate examiner.
  • ~£10k per country for renewal fees over 20 years. Again, this varies per country.

In terms of the distribution with time, this breaks down to:

  • ~£10k / year for first 3-4 years.
  • ~£0.5-1k / year for next 16-17 years.

Hence, most of the costs are front-loaded to the first 3-4 years: you need ~£30k over this period to properly take part in the patenting process.
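The arithmetic behind these figures can be sketched in a few lines of Python. The stage costs are the rough figures from the list above; the function lifetime_spend and its parameters are made up for illustration.

```python
# Rough per-country stage costs in GBP, taken from the list above.
STAGE_COSTS = {
    "initial work and first filing": 10_000,
    "strategy and International filing": 10_000,
    "national or regional phase entry": 5_000,
    "prosecution and grant": 15_000,
    "renewal fees": 10_000,
}

total = sum(STAGE_COSTS.values())
print(f"Lifetime cost per country: £{total:,}")


def lifetime_spend(early_years, late_rate, early_rate=10_000, lifetime=20):
    """Total spend: a few expensive early years, then cheaper later years."""
    return early_years * early_rate + (lifetime - early_years) * late_rate


# Bracket the total using the ranges from the distribution over time.
low = lifetime_spend(early_years=3, late_rate=500)
high = lifetime_spend(early_years=4, late_rate=1_000)
print(f"Spend over 20 years: £{low:,} to £{high:,}")
```

The two views bracket the £50k rule-of-thumb, which is about as much precision as these figures deserve.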

Return on Investment

For a decent return, you want the patent’s value over its 20 year life to be at least 3x its cost (excluding inflation). Say this is £150k.

This works out as a real return of at least 5-6% per year over the lifetime of the patent.

The value of a patent is unlikely to be gained evenly over its lifetime. Statistics show that much of a patent’s value is realised towards the end of its life, e.g. 10-15 or 15-20 years post filing.

Anything less than this and your business would be better off just investing in the stock market.
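The 5-6% figure can be checked with a quick compound growth calculation. This is a simplification: it treats the patent as a single investment of £50k that is worth 3x after 20 years, ignoring the timing of the costs and of the value. The function name annualised_return is made up for this sketch.

```python
def annualised_return(multiple, years):
    """Yearly compound rate that turns 1 into `multiple` over `years`."""
    return multiple ** (1 / years) - 1


cost = 50_000
target_value = 3 * cost

rate = annualised_return(target_value / cost, years=20)
print(f"Target value: £{target_value:,}")
print(f"Annualised real return: {rate:.1%}")
```

Changing the multiple or the number of years shows how sensitive the return is to each assumption.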

How to Determine Value

This is normally the hard part. However, there are a few short-cuts.

Patent Box

For a UK patent this may be an easy calculation.

Under the UK Patent Box scheme https://www.gov.uk/guidance/corporation-tax-the-patent-box, you can claim for a reduction in corporation tax (to 10%) for profits associated with a patented product or service.

Looking at the statistics for the period 1 April 2013 to 31 March 2014, we see that an average patent box claim was ~£500k, with the average claim for small businesses being £17k.

Most of the claims, understandably, were made by large companies. As such, the £500k / year average claim may include a number of different patented products. However, small businesses often only have one or two patents or products. Hence, the small business claim may be closer to a lower bound on yearly value per patent.

Of course, you can perform your own calculations. For a very rough upper bound on the benefit, simply add up the profits derived from each of your main products or services and multiply by 0.1. (This does assume you are making a profit.) For a lower bound, multiply this 10% saving by 0.5.

Now remember this is a yearly saving. The total saving will thus depend on the lifetime of your product.

Assuming a rough product lifetime of 10 years, and a lower bound on the tax claim of £15k / year, this means that an average UK patent provides a saving of £150k over its lifetime. This just happens to be the number we came up with above for a decent return.

From these rough calculations we see a couple of things:

  • To justify a UK patent’s value based on a Patent Box claim, you need to be making around £150k / year in profit for at least one product or service.
  • If this applies, a UK patent covering the product or service will pay for its costs and make a decent return.
  • Patenting can thus be economically justified in this case.
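Here is a rough sketch of the Patent Box bounds described above. It uses the rule-of-thumb multipliers from this post (0.1 of profit as an upper bound, half of that as a lower bound) and is not a real tax calculation; the function names and parameters are hypothetical.

```python
def patent_box_saving(annual_profit, saving_rate=0.10, attributable_share=1.0):
    """Rough yearly tax saving using the multipliers from this post."""
    return annual_profit * saving_rate * attributable_share


def lifetime_saving(annual_profit, product_years, **kwargs):
    """Yearly saving multiplied by the product lifetime in years."""
    return patent_box_saving(annual_profit, **kwargs) * product_years


profit = 150_000  # yearly profit from the patented product, in GBP
upper = lifetime_saving(profit, product_years=10)
lower = lifetime_saving(profit, product_years=10, attributable_share=0.5)

print(f"Upper bound over 10 years: £{upper:,.0f}")
print(f"Lower bound over 10 years: £{lower:,.0f}")
```

Plugging in your own profit and product lifetime gives a first feel for whether the £150k target is within reach.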

Licensing

Another way a patent can provide a return is through licensing. (Someone pays you for your permission to use the technology of the patent.)

Looking at our rough figures, you would need licence fees of ~£150k over 20 years, or approximately £7.5k / year.

Hence, if you feel that you can get one or more companies to pay £10k / year for the technology, patenting is worthwhile.

The recent case of Unwired Planet v Huawei https://www.judiciary.gov.uk/wp-content/uploads/2017/04/unwired-planet-v-huawei-20170405.pdf provides some useful information on industry licensing rates that can inform these calculations.

In this case, an average worldwide FRAND licence rate for major markets for mobile equipment and infrastructure for a portfolio of 2G, 3G and 4G patents was deemed to be 0.05%. Now Unwired Planet have around 2,500 patents. Some googling indicates total infrastructure and handset sales to be around $150 billion (split 1:2). If everyone licensed at this rate, the annual licensing revenue would be $75 million (0.05% of $150 billion), which divided by 2,500 patents gives you an average licensing income of $30k (~£20k) per patent per year.

Of course this is an upper-upper-bound estimate: you won’t get a licensing fee from each sale and this may be time-limited (e.g. the value of 2G technology not used in current handsets is falling). However, it does show that a licensing revenue of £200k or more per patent over its lifetime is not completely pie-in-the-sky and may be relevant if you are lucky and patent a subsequent core technology.

We can do another quick cross check using IBM. Figures circulating (and seen personally in talks by IBM) are that it takes in about $1 billion in patent licensing revenue per year (see here – https://www.forbes.com/sites/chuckjones/2016/01/19/if-patents-are-so-valuable-why-does-ibms-intellectual-property-revenue-continue-to-decline/#335ebe9f1433). IBM has around 200,000 granted patents (see here – http://www.patsnap.com/resources/company-innovation-reports/ibm). This works out at ~$5k / patent / year in licensing revenue (~£4k). Extended for 20 years, this gives us a figure of £80k per patent in licensing revenue over its lifetime.

In this case, IBM covers its patenting costs, but there is only a small real return from licensing alone. Hence, for IBM licensing is a useful aspect to cover costs, but must form only a portion of the value of a given patent.
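The IBM cross check is simple enough to script, which makes it easy to rerun with other companies. The sketch below uses only the rough IBM figures quoted above, plus the £150k target and £50k cost from earlier in this post; the function name is made up.

```python
def revenue_per_patent(annual_licensing_revenue, patent_count):
    """Average yearly licensing revenue per patent."""
    return annual_licensing_revenue / patent_count


# IBM figures quoted above (USD), then the post's rough GBP conversion.
ibm_usd_per_year = revenue_per_patent(1_000_000_000, 200_000)
ibm_gbp_per_year = 4_000  # ~£4k, the rounded conversion used in the text
ibm_gbp_lifetime = ibm_gbp_per_year * 20

# Licence fee needed each year to hit the £150k target over 20 years.
required_per_year = 150_000 / 20

print(f"IBM: ${ibm_usd_per_year:,.0f} per patent per year")
print(f"IBM lifetime: £{ibm_gbp_lifetime:,} against a £50,000 cost")
print(f"Needed for a decent return: £{required_per_year:,.0f} per year")
```

On these numbers licensing alone covers costs but falls short of the 3x target, matching the conclusion above.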

Selling Patents

Valuing individual patents is tricky. This article here is interesting – http://www.hayes-soloway.com/patent-valuation . It suggests a lower bound on patent transactions of around $90,000 (£70k), a median of around $200k (£150k) and an average of around $400k (£300k). Each of Kodak’s patents was valued at around $500k when recently sold in 2012.

These valuations are consistent with the numbers discussed so far. The lower bound on the value of patents when sold is a little above cost (but not below cost). The median amount provides the magical £150k figure discussed above, i.e. a real return of around 5-6%. If you are lucky and/or skilled (delete depending on your political persuasion), a value of around £300k provides a decent market-beating return of around 10%. The higher figures also compensate for the fact that average patent grant rates are around 50% – hence, there is a certain amount of survivor bias and each of these sales would need to factor in the sunk costs of their unsuccessful brethren.

Another caveat here – patents tend to be very illiquid and most patent transactions involve large companies with large patent portfolios. Hence, while these figures may be applicable to similar sized entities, they may not apply as much to small and medium sized businesses. The distribution of values is also likely to be a power law distribution, with a few patents having astronomical valuations, and a long tail of patents with low valuations.

Here, we see that if you are a large company, it is worth patenting for the value you realise if you sell your patents.

Access to Market

We now move into the more hand-wavy aspects of patent valuation.

Underlying all this discussion is the fact that patents allow you to sue those who are providing products without your permission that fall within your patent claims. Licensing is one way to realise this value by providing permission for cash.

Another way patents can provide value is by allowing you access to a market at a low cost through cross-licensing. This is where another entity has at least one patent that covers your product or service. They could thus prevent you from accessing the market by either refusing permission or demanding high licensing fees. However, you have a patent that covers their product or service. Hence, each side has a potential weapon they can deploy and the sensible outcome is to come to an agreement to provide permission to use each other’s technology.

The problem with cross-licensing is that these deals are typically performed in confidence. There is thus little data to quantify the transaction. Standard public licensing rates provide some indication of the value. Hence, the licensing figures from above may be used here.

Average licensing rates can vary from 0.01% to 30% depending on the technology, product and market. Most are probably below 5-10%, with higher rates for low volume, high profit products (e.g. software services) and lower rates for commodity items (e.g. phone handsets).

One (very rough) way you can value access to a market is thus to:

  • determine the size of the potential market for your product;
  • determine an average revenue for you for this market over a 20-year period; and
  • multiply this by 10%.

Working backwards from our figures above, this gives us an average revenue of £150k / 0.1 = £1.5 million over 20 years (which may be £300k / year for a 5 year lifespan, £150k / year for a 10 year lifespan, and £75k / year for a 20 year lifespan etc.).

If you are not selling your product yet, you can look at figures for the size of the potential market by dividing these figures by an estimated, percentage market share. For example, if you believe you can gain 10% of a market, the market needs to be worth £15 million over the 20 years (e.g. £3 million / year for a 5 year lifespan, £1.5 million / year for a 10 year lifespan, and £750k / year for a 20 year lifespan etc.).
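These back-calculations can be sketched as follows. The 10% licence rate, 10% market share and £150k target are the rough figures used above; the function names are invented for this example.

```python
def required_revenue(target_value, licence_rate):
    """Revenue needed so that revenue * licence_rate reaches the target."""
    return target_value / licence_rate


def required_market(revenue, market_share):
    """Total market size needed to earn `revenue` at a given share."""
    return revenue / market_share


target = 150_000  # the "decent return" figure from earlier
revenue = required_revenue(target, licence_rate=0.10)
market = required_market(revenue, market_share=0.10)

print(f"Your revenue over the period: £{revenue:,.0f}")
print(f"Total market over the period: £{market:,.0f}")
for lifespan in (5, 10, 20):
    print(
        f"{lifespan:>2} year lifespan: "
        f"£{revenue / lifespan:,.0f} / year revenue, "
        f"£{market / lifespan:,.0f} / year market"
    )
```

Swapping in a lower licence rate (say 1% for a commodity product) quickly shows how much larger the market then needs to be.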

The flip-side to this is to look at the cost of litigation. If cross-licensing can avoid the costs of litigation then this also provides value. If we say an average court case costs between £1 million and £3 million, then the value of your patent depends on the likelihood of litigation. Taking the lower £1 million figure, if the chance of litigation is above 15% (i.e. £150k / £1 million), patenting is cost effective. Here, you can also ask for a quote for litigation insurance in your market and use that to determine the value of any patent on a competitor’s product or service.
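The litigation threshold is an expected-value calculation: the patent pays for itself when the chance of litigation multiplied by the cost avoided reaches the £150k target. A minimal sketch, with a made-up function name and the rough £1-3 million cost range from above:

```python
def break_even_probability(target_value, litigation_cost):
    """Chance of litigation at which avoided cost equals the target value."""
    return target_value / litigation_cost


target = 150_000
for cost in (1_000_000, 2_000_000, 3_000_000):
    p = break_even_probability(target, cost)
    print(f"Case costing £{cost:,}: break-even chance {p:.1%}")
```

The 15% figure corresponds to the cheaper end of the range; a more expensive case lowers the threshold.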

These simple calculations mean that, for a product with a 5 year lifespan and a potential market of only £100k per year, patenting may not be cost effective if looking at access to market.

Getting Investment

One reason why small businesses obtain patents is to gain investment.

Likewise, one reason venture capitalists invest in small businesses with patents is because they perform similar calculations to those above (although with prettier and more accurate spreadsheets) and realise they can obtain an above market return (or a market return for a given risk – 90% of small businesses fail, folks).

Now venture capitalists have requests for funding from many small startups (understatement). Most of these will be refused. One way you can cut through the noise as a company is to show you have at least a strong chance of obtaining a patent. Hence, a patent application may provide an immediate effect by enabling leverage – i.e. the patenting costs may facilitate a much larger amount of funding.

Of course, there are many different factors that influence funding, and most of these may be more important than a patent portfolio (such as founders / founder experience, market proposition, existing capital raised, and existing profit). Let’s say, conservatively, that having a patent increases your chance of funding from 0% to 10%. In this case, funding of £200k plus would justify an initial £20k patent spend (e.g. initial filing and International application).

Another way of looking at this may be to compare patenting costs and engineer costs. Say an engineer is paid £50k / year, which costs the company £75k once on-costs are included (i.e. actual cost to company is 1.5x salary). An initial £20k patent spend is then comparable to around 4 months of that engineer’s cost (~£25k). The question to then ask is: what would increase your chances of funding more: 4 months of that engineer’s time or having a patent application?

If the answer is that, at your current stage of development, 4 months engineer time would greatly enhance your offering and increase your chance of funding by 50%, then limited funds may be better spent on that rather than patenting.

If you are at a stage where development has been kept confidential, and 4 months of engineer time would make only small incremental improvements to attract funding, then patenting becomes a better choice.
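The funding comparison can be sketched as an expected-value calculation. The figures are the illustrative ones used above (£200k funding, £20k patent spend, £50k salary at 1.5x on-costs); the function names and the simple "funding times uplift" model are assumptions made for this example.

```python
def expected_gain(funding, uplift):
    """Expected extra funding from raising the chance of success by `uplift`."""
    return funding * uplift


def engineer_cost(salary, months, on_cost_multiplier=1.5):
    """Cost to the company of an engineer for a number of months."""
    return salary * on_cost_multiplier * months / 12


funding_sought = 200_000

patent_spend = 20_000
patent_gain = expected_gain(funding_sought, uplift=0.10)

engineer_spend = engineer_cost(salary=50_000, months=4)
engineer_gain = expected_gain(funding_sought, uplift=0.50)

print(f"Patent:   spend £{patent_spend:,.0f}, expected gain £{patent_gain:,.0f}")
print(f"Engineer: spend £{engineer_spend:,.0f}, expected gain £{engineer_gain:,.0f}")
```

With a 10% uplift the patent spend only just breaks even at £200k of funding, which is why the answer depends so heavily on your stage of development.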

You can also run similar arguments with consultant costs and other areas such as marketing.

Marketing

Patented products make for good marketing.

This may only be a small proportion of a patent’s value but should not be overlooked.

For example, an average marketing budget may be 10% of sales. If a patent replaces a tenth of that (i.e. has the same effect as marketing spend worth 1% of sales), then a patent could start to make a decent return if revenues are £15 million or more over 10 years (i.e. £1.5 million / year).

What Have We Learnt

Often it is difficult to provide an answer to the question: should I get a patent?

Patent attorneys typically err on the side of saying “yes”, as that is what they do day-in-day-out. It can be like asking a decorator: should I paint my house? (I decided not to say it may be like asking a car salesman: should I buy a car? :))

In certain businesses the answer is often “yes”, but the reason is “because that’s what we do”. Similarly, in other businesses (I’m looking at you software), the answer is often “no”, with the reason being “because we don’t do that here”.

Hopefully, the discussion above has explained some of the areas and conditions where there may be an economic justification for obtaining a patent.

In particular, assuming a product with a 10 year lifespan, patenting may be cost effective:

  • if you are paying UK corporation tax and your product will earn £150k / year in profit;
  • if your market is worth more than £1.5 million per year and you can capture at least 10% of this;
  • if the patented technology is of interest to one or more acquirers;
  • if the chance of litigation is above 15% in your market;
  • if it increases your chance of funding from 0 to 10%; or
  • if it increases sales by 1% for products with revenues of more than £1.5 million / year.
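The checklist above can be turned into a small script for running the numbers on your own business. The thresholds are the rough figures from the list (for a 10 year product lifespan); the Business class, its field names and the reasons_to_patent function are all made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Business:
    """Rough inputs for the checklist; all money figures are GBP per year."""
    uk_annual_profit: float = 0.0
    market_per_year: float = 0.0
    market_share: float = 0.0
    acquirer_interest: bool = False
    litigation_chance: float = 0.0
    funding_chance_uplift: float = 0.0
    product_revenue_per_year: float = 0.0
    sales_uplift: float = 0.0


def reasons_to_patent(b):
    """Return the checklist items that apply to this business."""
    checks = {
        "patent box": b.uk_annual_profit >= 150_000,
        "access to market": (
            b.market_per_year > 1_500_000 and b.market_share >= 0.10
        ),
        "sale to acquirer": b.acquirer_interest,
        "litigation risk": b.litigation_chance > 0.15,
        "funding": b.funding_chance_uplift >= 0.10,
        "marketing": (
            b.sales_uplift >= 0.01 and b.product_revenue_per_year > 1_500_000
        ),
    }
    return [name for name, applies in checks.items() if applies]


example = Business(uk_annual_profit=200_000, litigation_chance=0.2)
print(reasons_to_patent(example))
```

This treats each factor as a hard threshold, which is cruder than the discussion that follows: factors that each fall short alone may still add up.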

Some of these value factors may be gained independently. For example, a patent may allow you to reduce UK corporation tax, increase sales, provide access to a market and reduce litigation risk. The more the factors apply cumulatively, the lower the figures above need to be.

By sketching these numbers out on the back of an envelope, say over 30 minutes, you can get a feel for how relevant patenting is for your company.

If you look at these figures and gasp, then patenting may not be right for you. Although patenting is open to anyone, practically you need to be a business with actual or projected revenues of hundreds of thousands of pounds for the system to work properly.

If you are close to break-even thresholds, there need to be other good reasons to patent, or prospects for future growth need to be good, otherwise patenting may not be worthwhile economically.

If you are way over the thresholds, and you do not have a patenting strategy, then this provides a strong basis for an argument to your Board of Directors to get one. It may justify spending a few thousand pounds on professional advice to fill in the details of feasibility.

If you have an existing patenting strategy, running these calculations once a year or so may enable you to make decisions on maintaining patents and patent applications, and provide justification to support existing budgets (or even to ask for more funds).