
Papers by the Cinnamon AI development team were accepted at the international conference on data science “DI@KDD 2021” and the international conference on character recognition and document analysis “ICDAR 2021.”

In 2021, Cinnamon AI's research on artificial intelligence was highly regarded at international academic conferences. In this blog post, we introduce two of the papers behind those results.

DI@KDD 2021 Best Paper

The paper "HYCEDIS: Hybrid Confidence Engine for Deep Document Intelligence System" presented by the Cinnamon AI research team was selected as the Best Paper at "DI@KDD 2021" held in Singapore in June this year.

DI@KDD (Document Intelligence Workshop at Knowledge Discovery and Data Mining) is a workshop specializing in document recognition and understanding. It is held within KDD, an international conference on data science, machine learning, big data, and artificial intelligence sponsored by the Association for Computing Machinery (ACM). The workshop is regarded as a top international venue in this field.

The "HYCEDIS" paper addresses a common issue in AI business: because humans still have to check AI results, costs cannot be reduced. It tackles this by assigning a "confidence" score, an index that numerically indicates how reliable the output of an AI model is. The method improves the reliability of that confidence score by appropriately extracting the likelihood of the results from the complex AI-OCR process and, at the same time, detecting which inputs have an affinity with the model.

Figure 1: HYCEDIS architecture
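
To give a feel for why a confidence score helps reduce checking costs, here is a minimal Python sketch. It is not the HYCEDIS method itself and does not show how the confidence is computed. The `FieldPrediction` class, the `route_by_confidence` function, the sample values, and the 0.9 threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldPrediction:
    name: str          # hypothetical field label, e.g. "invoice_total"
    text: str          # value read by the AI-OCR model
    confidence: float  # score in [0, 1]; how it is computed is out of scope here


def route_by_confidence(predictions, threshold=0.9):
    """Split predictions into auto-accepted ones and ones needing human review."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    review = [p for p in predictions if p.confidence < threshold]
    return accepted, review


# Made-up sample data, for illustration only.
predictions = [
    FieldPrediction("invoice_total", "12,300", 0.98),
    FieldPrediction("issue_date", "2021-06-01", 0.95),
    FieldPrediction("vendor_name", "Cinnam0n", 0.42),
]

accepted, review = route_by_confidence(predictions, threshold=0.9)
review_rate = len(review) / len(predictions)

print([p.name for p in review])  # fields a human still has to check
print(round(review_rate, 2))     # share of fields sent to review
```

In this sketch, only the low-confidence field is sent to a person. The more reliable the confidence score, the more safely the threshold can be set so that fewer results need manual checking.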

DI@KDD 2021 https://www.kdd.org/kdd2021/ 

ICDAR 2021 Best Paper

Furthermore, at ICDAR 2021 held in Switzerland in September, the Cinnamon AI Vietnam research team presented “A Span Extraction Approach for Information Extraction on Visually-Rich Documents.” ICDAR (International Conference on Document Analysis and Recognition) is the top international conference on character recognition and document analysis held once every two years. Our paper was selected as the Best Paper here as well. 

In this paper, we showed that two technologies can be applied to Flax Scanner, which extracts data from non-standard documents such as forms and receipts. The first is LayoutLM, a recently proposed model that applies language model techniques to recognizing documents that include not only text but also objects such as tables. The second is the technology behind Aurora Clipper, which extracts desired information from text documents based on context. In particular, the paper applies a span-based QA model, which is powerful for information extraction tasks, and proposes a method for extracting multiple answers to a single question, which is particularly important in Flax. It also proposes a general method for speeding up QA algorithms. We showed that these methods can improve the accuracy and speed of form recognition.

Figure 2: Recursive span extraction mechanism 
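
As a rough illustration of extracting multiple answers to one question, the Python sketch below shows one possible recursive scheme. It is an assumption for illustration and not necessarily the mechanism proposed in the paper. The function `toy_predict` is a hypothetical stand-in for a span-based QA model: it simply returns the first numeric token as the answer span. `extract_all`, the sample tokens, and the `min_score` threshold are likewise made up.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int, float]  # (start, end_exclusive, score)


def toy_predict(tokens: List[str]) -> Optional[Span]:
    """Hypothetical stand-in for a QA model: first numeric token, or None."""
    for i, tok in enumerate(tokens):
        if tok.isdigit():
            return (i, i + 1, 0.9)
    return None


def extract_all(
    tokens: List[str],
    predict: Callable[[List[str]], Optional[Span]],
    offset: int = 0,
    min_score: float = 0.5,
) -> List[Span]:
    """Take the best span, then recurse on the text to its left and right."""
    span = predict(tokens)
    if span is None:
        return []
    start, end, score = span
    # Stop on low-scoring or empty spans so the recursion always terminates.
    if score < min_score or end <= start:
        return []
    left = extract_all(tokens[:start], predict, offset, min_score)
    right = extract_all(tokens[end:], predict, offset + end, min_score)
    # Report positions relative to the original token list.
    return left + [(offset + start, offset + end, score)] + right


tokens = ["Coffee", "300", "Tea", "250", "Total", "550"]
spans = extract_all(tokens, toy_predict)
answers = [" ".join(tokens[s:e]) for s, e, _ in spans]
print(answers)
```

Here a model that returns only one span per call still yields every matching value, because each call sees only the part of the document that has not been answered yet.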

ICDAR 2021 https://icdar2021.org/

In the future, Cinnamon will use these technologies as a starting point to integrate Flax Scanner and Aurora Clipper and work on research into an intelligent document analyzer that can flexibly handle more complex documents.

We help you connect AI to competitive strategy through consulting, workshops, and solutions. Please feel free to contact us.

Click here to contact us => Inquiry form