
We Tried LayoutLM (Layout Language Model) and Accuracy Went Way Up

Hello. This is the Cinnamon AI public relations team.

Cinnamon AI develops products using natural language processing technology. One of them, Aurora Clipper, provides functions such as extracting dates with a specific context (event dates, contract dates, etc.) and person names (for example, the parties to a contract), extracting key points from long texts, and classifying text. It is used for a wide range of purposes.

In this post, Mr. Fujii, who leads the development of Aurora Clipper, introduces the results of experiments using an algorithm called LayoutLM as the base model for Aurora Clipper.

What is LayoutLM, which uses text position as a feature?

LayoutLM (Layout Language Model) is a new natural language processing algorithm proposed by Microsoft Research in 2020.

In natural language processing, Transformer-type algorithms such as BERT are well known. Their main feature is that they are pre-trained on a large amount of text and then adapted through transfer learning to the purpose of each development project. They are known for dramatically improving accuracy compared to earlier CNN/RNN-type algorithms.

A long-standing challenge in natural language processing has been understanding not only the words in a document but also their context. Against this background, many methods for capturing context from documents have been studied, and with the arrival of Transformer-type algorithms, accuracy improved by capturing the word order of sentences as a feature.

We are also continuing to research Transformer-type algorithms.

For example, in May 2020 we released a Japanese version of ELECTRA, a model proposed by Google Brain whose generator and discriminator setup resembles a GAN (Generative Adversarial Network). This was a world first (details here).

Through various projects, we have confirmed that such Transformer-type algorithms produce highly accurate results. On the other hand, Transformer-type algorithms treat only word order (the words before and after) as a feature, so they cannot capture where on the page a word is written.

“LayoutLM” was born to solve these problems.

A major feature of this algorithm is that it encodes not only simple word order but also layout information such as the placement of text fields within a document, and learns the probabilities of their occurrence as an expanded language model.

For this reason, in addition to a large amount of text data, the model is pre-trained on a large number of forms (invoices, resumes, etc.) together with the coordinates of their text.
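To make the idea of layout information concrete, here is a minimal sketch of how a word's bounding box can be turned into a layout feature alongside its position in the word order. It assumes the convention used in the publicly released LayoutLM implementation, where box coordinates are scaled to integers in the 0 to 1000 range regardless of page size. The function names `normalize_bbox` and `build_layout_features`, the page size, and the sample words are hypothetical and are not taken from our data or pipeline.

```python
from typing import Dict, List, Tuple

# (x0, y0, x1, y1) in pixels, with the origin at the top-left of the page.
BBox = Tuple[int, int, int, int]


def normalize_bbox(box: BBox, page_width: int, page_height: int, scale: int = 1000) -> BBox:
    """Scale a pixel box to the 0..scale range so pages of any size are comparable."""
    x0, y0, x1, y1 = box
    return (
        int(scale * x0 / page_width),
        int(scale * y0 / page_height),
        int(scale * x1 / page_width),
        int(scale * y1 / page_height),
    )


def build_layout_features(
    words: List[Tuple[str, BBox]], page_width: int, page_height: int
) -> List[Dict]:
    """Pair each word with its word-order index and its normalized box."""
    features = []
    for index, (text, box) in enumerate(words):
        features.append(
            {
                "text": text,
                "order": index,  # what a plain Transformer already sees
                "bbox": normalize_bbox(box, page_width, page_height),  # extra layout signal
            }
        )
    return features


# Hypothetical OCR output for an invoice page of 1240 x 1754 pixels.
sample_words = [
    ("Invoice", (100, 50, 300, 90)),
    ("Total", (100, 1500, 220, 1540)),
    ("12,000", (900, 1500, 1100, 1540)),
]

features = build_layout_features(sample_words, page_width=1240, page_height=1754)
for feature in features:
    print(feature)
```

In this sketch, "Total" and "12,000" end up with nearly identical vertical coordinates, which is the kind of cue (same row of a form) that word order alone does not expose.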

As a result, on the SROIE dataset (a dataset for the task of extracting information from receipts), accuracy (F-measure) rose from 92.00% with BERT to 95.24% with LayoutLM, an improvement of 3.24 points.

*The above image is taken from Xu et al, 2020
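For readers less familiar with the metric, the F-measure is the harmonic mean of precision and recall. The sketch below computes it from counts of correct and incorrect extractions, and then takes the difference between the two receipt-dataset scores quoted above. The function name `f_measure` and the counts in the example are made up for illustration; only the 92.00 and 95.24 figures come from the paper.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when it is undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Illustrative counts: 8 fields extracted correctly, 2 wrong extractions, 2 missed fields.
example_score = f_measure(true_positives=8, false_positives=2, false_negatives=2)
print(f"example F-measure: {example_score:.2f}")

# Scores quoted in the article (percent).
bert_f = 92.00
layoutlm_f = 95.24
gain_in_points = round(layoutlm_f - bert_f, 2)
print(f"gain: {gain_in_points} points")
```

The gain here is an absolute difference of 3.24 percentage points, which is how "more than 3%" should be read.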

Summary ①
✔ Transformer-type algorithms have brought great progress to natural language processing technology.
✔ Unlike conventional Transformer-type algorithms, LayoutLM can use not only word order but also text position information as a feature.
✔ As a result, in the task of extracting information from receipts, accuracy (F-measure) improved by more than 3 points compared to a conventional Transformer-type algorithm.

When I actually tried LayoutLM, I was amazed...!

As described above, LayoutLM delivers a large improvement. However, the work at Microsoft Research focuses mainly on English documents.
As a result, there are no research results using Japanese documents yet.

So this time, we pre-trained LayoutLM ourselves using approximately 17,000 Japanese documents that we hold.

This data consists of a large volume of documents that we have accumulated as research data, including order-related documents such as invoices and delivery notes, insurance-related documents such as medical bills and medical fee statements (receipts), and manufacturing-related documents such as product specifications.

Naturally, these forms come with coordinates and text labels attached, which makes them ideal data for building LayoutLM.

Now, let's take a look at the results of actually testing LayoutLM.

The task is "information extraction from Japanese documents of 1 to 3 pages".
*The expression "Japanese documents" is deliberately vague, because this is valuable research data.

As shown in the table above, it was found that LayoutLM achieved 11% higher accuracy than the conventional Transformer type algorithm.

Furthermore, the LayoutLM that we localized to Japanese ourselves (the version pre-trained on our proprietary data) was 8% more accurate than the LayoutLM pre-trained by Microsoft (the original version from the paper).

In this way, we found that the LayoutLM built with our research data achieves quite high accuracy.

Now, there is one more thing to consider if you want to use the latest algorithms in a production environment: processing speed.
In particular, some Transformer-type algorithms require a GPU, and for on-premises operation, algorithms that drive up infrastructure costs are best avoided.

Therefore, we investigated the processing speed of LayoutLM and found that one file can be processed within 5 seconds even with a CPU, as shown below.

In our experience, this processing speed is acceptable on a CPU in many cases, and throughput can be improved further by using a distributed configuration.
From these results, we found that our proprietary LayoutLM has considerable potential for use in production development.
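As a rough back-of-the-envelope sketch of the throughput point: if one file takes up to 5 seconds on a CPU, the number of files processed per hour grows with the number of workers. The function name `files_per_hour` and the worker counts are hypothetical, and the calculation assumes ideal linear scaling with no queueing or I/O overhead, which a real distributed setup will not fully reach.

```python
def files_per_hour(seconds_per_file: float, workers: int = 1) -> float:
    """Estimate hourly throughput, assuming each worker processes files independently."""
    if seconds_per_file <= 0:
        raise ValueError("seconds_per_file must be positive")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return workers * 3600 / seconds_per_file


# 5 seconds per file is the upper bound reported in the article; worker counts are examples.
for workers in (1, 4, 16):
    estimate = files_per_hour(5.0, workers)
    print(f"{workers:>2} worker(s): about {estimate:.0f} files per hour")
```

Under these assumptions, a single CPU worker handles at least 720 files per hour, so whether a GPU is needed depends mainly on the volume and latency requirements of the project.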

This algorithm has already been incorporated into our natural language processing product Aurora Clipper and form recognition product Flax, and we are working to improve its functionality. In the future, we would like to expand this to include images, audio, etc., and utilize our multimedia recognition engine know-how to develop a multimodal analysis engine!

Summary ②
✔ We built a Japanese-specific LayoutLM using approximately 17,000 documents from our research data, and accuracy improved by 19% compared to the conventional Transformer-type algorithm.
✔ The LayoutLM we built processes files quickly and, depending on speed requirements, is fully usable in production development.
✔ Going forward, we will deploy this algorithm not only to our natural language processing product Aurora Clipper but also to our AI-OCR product FLAX Scanner, aiming to improve the accuracy of information extraction overall.

For inquiries regarding this article or to request a business meeting, please contact us here.

Cinnamon AI also holds seminars regularly.