WO2024055864A1 - Training method and apparatus for implementing IA classification model using RPA and AI - Google Patents

Training method and apparatus for implementing IA classification model using RPA and AI

Info

Publication number
WO2024055864A1
Authority
WO
WIPO (PCT)
Prior art keywords
page
target
training
classification model
word
Application number
PCT/CN2023/116770
Other languages
French (fr)
Chinese (zh)
Inventor
段沛宸
Original Assignee
北京来也网络科技有限公司
来也科技(北京)有限公司
Application filed by 北京来也网络科技有限公司 and 来也科技(北京)有限公司
Publication of WO2024055864A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques

Definitions

  • The present disclosure relates to the technical fields of robotic process automation (RPA) and artificial intelligence (AI), and in particular to a training method for a classification model that combines RPA and AI to realize intelligent automation (IA), a document classification method and device, electronic equipment, a vehicle, a computer-readable storage medium, a computer program product, and a computer program.
  • Robotic process automation (RPA) uses specific "robot software" to simulate human operations on a computer and automatically execute process tasks according to rules.
  • AI is the abbreviation of artificial intelligence.
  • Intelligent automation (IA) is a general term for a series of technologies ranging from robotic process automation to artificial intelligence. It combines RPA with AI technologies such as optical character recognition (OCR), intelligent character recognition (ICR), process mining, deep learning (DL), machine learning (ML), natural language processing (NLP), automatic speech recognition (ASR), text-to-speech (TTS), and computer vision (CV) to create end-to-end business processes that can think, learn, and adapt, covering the entire process from process discovery and process automation to automatic and continuous data collection, understanding the meaning of data, and using data to manage and optimize business processes.
  • A complex business process may involve processing several types of documents, and different types of documents need different information extraction models to extract information; follow-up business processing, such as information entry and bill reimbursement, is then performed based on the extracted key information.
  • For example, an email attachment may contain documents such as contracts and invoices.
  • In this case, the RPA robot needs to call a contract extraction model to extract information from the contract documents and call a general multi-bill model to extract information from the invoice documents, and then perform subsequent processing based on the extracted information.
  • Embodiments of the present disclosure provide a training method and a document classification method that combine RPA and AI to implement IA, as well as corresponding devices, electronic equipment, vehicles, computer-readable storage media, computer program products, and computer programs, to solve the technical problems in the related art that model training methods for document classification need a large amount of training data and that the training time of the classification model is long.
  • An embodiment of the first aspect of the present disclosure provides a training method for a classification model that combines RPA and AI to implement IA.
  • The method includes: obtaining the position coordinates of at least one word included in each sample page among a plurality of sample pages, and obtaining the category to which each sample page belongs; inputting the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtaining the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and using the encoding vector corresponding to each sample page and its category as training data to train an initial classification model to obtain a target classification model for document classification.
  • In some embodiments, using the encoding vector corresponding to each sample page and its category as training data to train the initial classification model to obtain a target classification model for document classification includes: dividing the training data into a training set and a verification set, where the training set includes the encoding vectors corresponding to a plurality of first pages, the verification set includes the encoding vectors corresponding to a plurality of second pages, and each first page and each second page is labeled with the category to which it belongs; performing multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and its category to obtain a candidate classification model after each round of training; and selecting the target classification model for document classification from the candidate classification models after each round of training based on the encoding vector corresponding to each second page and its category.
  • In some embodiments, selecting the target classification model for document classification from the candidate classification models after each round of training includes: for the candidate classification model after each round of training, inputting the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to each of a plurality of preset categories, and determining the loss value corresponding to the candidate classification model based on the predicted confidences and the category to which each second page belongs; and determining, from the candidate classification models after each round of training, the candidate classification model with the smallest corresponding loss value as the target classification model.
  • In some embodiments, obtaining the position coordinates of at least one word included in each of the multiple sample pages includes: obtaining multiple sample documents sent by the RPA robot; for each sample page, obtaining optical character recognition (OCR) recognition information of the sample page; obtaining the text content of at least one text fragment in the sample page based on the OCR recognition information; segmenting the text content of each text fragment to obtain at least one word included in each text fragment; obtaining the position coordinates of the area occupied by each text fragment; and obtaining the position coordinates of each word based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
  • According to the training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the position coordinates of at least one word included in each of multiple sample pages are obtained, and the category to which each sample page belongs is obtained; the position coordinates of each word in each sample page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; the encoding vector corresponding to each sample page is obtained based on the encoding vectors corresponding to the words in that page; and the encoding vector corresponding to each sample page and its category are used as training data to train the initial classification model to obtain the target classification model for document classification.
  • In this way, training of the classification model for document classification is achieved, the training speed of the classification model is improved, and the amount of data required during training is reduced.
  • An embodiment of the second aspect of the present disclosure provides a document classification method that combines RPA and AI to implement IA.
  • The method includes: obtaining a target document sent by an RPA robot, where the target document includes at least one target page; obtaining the position coordinates of at least one word included in each target page; inputting the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; obtaining the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that target page; inputting the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories, where the target classification model is trained by the method described in the embodiment of the first aspect; for each preset category, determining the average value of the confidences that the target pages belong to that preset category; and determining the category to which the target document belongs from the preset categories based on the average value corresponding to each preset category.
  • According to the document classification method that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the target document sent by the RPA robot is obtained, where the target document includes at least one target page; the position coordinates of at least one word included in each target page are obtained; the position coordinates of each word in each target page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; the encoding vector corresponding to each target page is obtained based on the encoding vectors corresponding to the words in that page; the encoding vector corresponding to each target page is input into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories; for each preset category, the average value of the confidences that the target pages belong to that preset category is determined; and the category to which the target document belongs is determined from the preset categories based on the average value corresponding to each preset category.
  • An embodiment of the third aspect of the present disclosure provides a training device for a classification model that combines RPA and AI to implement IA, including: a first acquisition module, used to acquire the position coordinates of at least one word included in each sample page among multiple sample pages and to acquire the category to which each sample page belongs; a first processing module, used to input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; a second acquisition module, used to obtain the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and a training module, used to use the encoding vector corresponding to each sample page and its category as training data to train the initial classification model to obtain the target classification model for document classification.
  • In some embodiments, the training module includes: a dividing unit, used to divide the training data into a training set and a verification set, where the training set includes the encoding vectors corresponding to a plurality of first pages, the verification set includes the encoding vectors corresponding to a plurality of second pages, and each first page and each second page is annotated with the category to which it belongs; a training unit, used to perform multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and its category, so as to obtain a candidate classification model after each round of training; and a selection unit, used to select the target classification model for document classification from the candidate classification models after each round of training based on the encoding vector corresponding to each second page and its category.
  • In some embodiments, the selection unit includes: a processing subunit, configured to input, for the candidate classification model after each round of training, the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to each of multiple preset categories.
  • In some embodiments, the first acquisition module includes: a first acquisition subunit, used to acquire multiple sample documents sent by the RPA robot; a second acquisition subunit, used to acquire, for each sample page, the optical character recognition (OCR) recognition information of the sample page; a third acquisition subunit, used to obtain the text content of at least one text fragment in the sample page based on the OCR recognition information of the sample page; a word segmentation unit, used to segment the text content of each text fragment to obtain at least one word included in each text fragment; a fourth acquisition subunit, used to obtain the position coordinates of the area occupied by each text fragment; and a fifth acquisition subunit, used to obtain the position coordinates of each word based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
  • According to the training device for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the position coordinates of at least one word included in each of multiple sample pages are obtained, and the category to which each sample page belongs is obtained; the position coordinates of each word in each sample page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; the encoding vector corresponding to each sample page is obtained based on the encoding vectors corresponding to the words in that page; and the encoding vector corresponding to each sample page and its category are used as training data to train the initial classification model to obtain the target classification model for document classification.
  • In this way, training of the classification model for document classification is achieved, the training speed of the classification model is improved, and the amount of data required during training is reduced.
  • An embodiment of the fourth aspect of the present disclosure provides a document classification device that combines RPA and AI to implement IA.
  • The device includes: a third acquisition module, used to acquire the target document sent by the RPA robot, where the target document includes at least one target page; a fourth acquisition module, used to obtain the position coordinates of at least one word included in each target page; a second processing module, used to input the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; a fifth acquisition module, used to obtain the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that target page; a third processing module, used to input the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories, where the target classification model is trained by the method described in the embodiment of the first aspect; a first determination module, used to determine, for each preset category, the average value of the confidences that the target pages belong to that preset category; and a second determination module, used to determine the category to which the target document belongs from the preset categories based on the average value corresponding to each preset category.
  • According to the document classification device that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the target document sent by the RPA robot is obtained, where the target document includes at least one target page; the position coordinates of at least one word included in each target page are obtained; the position coordinates of each word in each target page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; the encoding vector corresponding to each target page is obtained based on the encoding vectors corresponding to the words in that page; the encoding vector corresponding to each target page is input into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories; for each preset category, the average value of the confidences that the target pages belong to that preset category is determined; and the category to which the target document belongs is determined from the preset categories based on the average value corresponding to each preset category.
  • An embodiment of the fifth aspect of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the method described in the embodiment of the first aspect of the present disclosure, or the method described in the embodiment of the second aspect of the present disclosure, is implemented.
  • An embodiment of the sixth aspect of the present disclosure provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the method described in the embodiment of the first aspect of the present disclosure, or the method described in the embodiment of the second aspect of the present disclosure, is implemented.
  • An embodiment of the seventh aspect of the present disclosure provides a computer program product, including a computer program. When the computer program is executed by a processor, the method described in the embodiment of the first aspect of the present disclosure, or the method described in the embodiment of the second aspect of the present disclosure, is implemented.
  • An embodiment of the eighth aspect of the present disclosure provides a computer program, including computer program code. When the computer program code is run on a computer, the computer performs the method described in the embodiment of the first aspect of the present disclosure, or the method described in the embodiment of the second aspect of the present disclosure.
  • Figure 1 is a schematic flowchart of a training method for a classification model that combines RPA and AI to implement IA according to the first embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of a training method for a classification model that combines RPA and AI to implement IA according to the second embodiment of the present disclosure
  • Figure 3 is a schematic flowchart of a document classification method for implementing IA by combining RPA and AI according to the third embodiment of the present disclosure
  • Figure 4 is a schematic structural diagram of a training device that combines RPA and AI to implement an IA classification model according to the fourth embodiment of the present disclosure
  • Figure 5 is a schematic structural diagram of a document classification device that combines RPA and AI to implement IA according to the fifth embodiment of the present disclosure
  • FIG. 6 is a block diagram of an electronic device used to implement a classification model training method that combines RPA and AI to implement IA or a document classification method that combines RPA and AI to implement IA according to an embodiment of the present disclosure.
  • In the related art, pre-trained document understanding models such as the LayoutLM model are usually used to understand documents, and a classification model is then used to classify the documents based on the understanding results.
  • The pre-trained document understanding model and the classification model are usually jointly trained based on training data related to the business scenario.
  • However, the structure of the pre-trained document understanding model is complex, and a relatively large amount of training data is required to fine-tune it, so the entire training process takes a long time.
  • Embodiments of the present disclosure provide a training method for a classification model that combines RPA and AI to implement IA.
  • With this method, a classification model for document classification can be obtained without fine-tuning the pre-trained document understanding model.
  • The method includes: obtaining the position coordinates of at least one word included in each sample page among the plurality of sample pages, and obtaining the category to which each sample page belongs; inputting the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtaining the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and using the encoding vector corresponding to each sample page and its category as training data to train the initial classification model to obtain the target classification model for document classification.
  • In this way, training of the classification model for document classification is achieved, the training speed of the classification model is improved, and the amount of data required during training is reduced.
  • RPA robot refers to a software robot that can combine AI technology and RPA technology to automatically perform business processing.
  • RPA robots have two characteristics: “connector” and “non-intrusion”. By simulating human operation methods, they can extract, integrate and connect data from different systems in a non-intrusive way without changing the information system.
  • "document” is an electronic document, which can be a document in PDF (Portable Document Format, Portable Document Format) format obtained by scanning a paper document, or it can also be a document stored on a computer, mobile phone, etc.
  • the embodiment of the present disclosure does not limit the documents edited and formed in the smart device.
  • "Target document” is the document to be classified.
  • "Page” is a page included in the document.
  • an electronic contract document can have one or more pages.
  • Sample pages are the pages included in the sample documents used for model training. The "first page” refers to each page included in the training set after the training data is divided into a training set and a verification set.
  • “Second page” refers to each page included in the verification set after dividing the training data into a training set and a verification set.
  • “Target page” is a page included in the target document to be classified.
  • a "text fragment” is a fragment composed of part of the content on the page.
  • The text fragment may be one line or less than one line of text arranged horizontally, or one column or less than one column of text arranged vertically; the embodiments of the present disclosure do not limit this.
  • the "encoding vector corresponding to a word” is a vector used to represent the feature information of the word, where the feature information of the word includes, for example, the position of the word on the page.
  • "Encoding vector corresponding to the sample page” is a vector used to characterize the characteristic information of the sample page.
  • the characteristic information of the sample page includes, for example, the positions of all words included in the sample page on the page.
  • "Encoding vector corresponding to the target page” is a vector used to characterize the characteristic information of the target page, where the characteristic information of the target page includes, for example, the positions of all words included in the target page in the page, etc.
  • The pre-trained document understanding model refers to a pre-trained model for understanding documents, such as the LayoutLM model (a pre-trained model that processes multi-modal information, i.e., text and layout information), the LayoutLM 2.0 model, and the like.
  • The embodiments of the present disclosure do not limit the pre-trained document understanding model, as long as it can be used to encode the pages in a document and obtain the encoding vector corresponding to each word in a page.
  • the "preset category” refers to the category to which documents created in advance may belong according to needs. For example, it can be set to a bill category, a contract category, etc.
  • “Category of the target document” refers to the category of the target document to be obtained by predicting the category of the target document to be classified using the trained target classification model.
  • “Category of the sample page” refers to the category to which the sample page actually belongs, such as bill category, contract category, etc.
  • the "classification model” is an AI neural network model used for document classification, and its structure can be set as needed.
  • the input of the classification model is the coding vector corresponding to the page in the document, and the output of the classification model is the predicted category of the corresponding page, which may be the confidence that the page belongs to one or more preset categories.
  • confidence can indicate the likelihood that a certain page belongs to a certain preset category.
  • the confidence level that the target page belongs to the preset category A indicates the probability that the target page belongs to the preset category A.
  • the average value of the confidence that each target page belongs to the preset category is the value obtained by averaging the confidence that each target page belongs to the preset category.
  • the "document processing platform” is an intelligent automation platform for intelligently processing documents.
  • intelligent document processing is one of the core capabilities of the intelligent automation platform.
  • Intelligent document processing is based on AI technologies such as optical character recognition (OCR), computer vision (CV), natural language processing (NLP), and knowledge graphs (KG); it can identify, classify, extract elements from, verify, compare, and correct errors in various types of documents, and is a new generation of automation technology that helps enterprises realize the intelligence and automation of document processing.
  • The following describes the training method for a classification model that combines RPA and AI to implement IA, the document classification method and device that combine RPA and AI to implement IA, the electronic device, the vehicle, the computer-readable storage medium, the computer program product, and the computer program according to embodiments of the present disclosure.
  • Figure 1 is a flow chart of a training method for a classification model that combines RPA and AI to implement IA according to the first embodiment of the present disclosure. As shown in Figure 1, the method may include steps 101 to 104.
  • Step 101 Obtain the position coordinates of at least one word included in each sample page among the plurality of sample pages, and obtain the category to which each sample page belongs.
  • The training method for the classification model that combines RPA and AI to implement IA in the embodiments of the present disclosure can be executed by a training device for a classification model that combines RPA and AI to implement IA.
  • Hereinafter, this training device for the classification model that combines RPA and AI to implement IA is referred to simply as the training device.
  • the training device may be implemented by software and/or hardware, and the training device may be an electronic device, or may be configured in an electronic device to implement training of a classification model for document classification.
  • the electronic device may include but is not limited to a terminal device, a server, etc., and this embodiment does not specifically limit the electronic device.
  • the words included in the sample page are words (that is, tokens) obtained by segmenting the text fragments in the sample page.
  • Text fragments can be segmented based on preset vocabulary lists and rules. For example, Chinese text can be segmented word by word: for the text fragment "1 23", the words obtained by segmentation are "1" and "23"; for the text fragment "Zhang San's", the words obtained by segmentation are "Zhang San" and "'s". English text can be divided into sub-words consisting of stems and affixes: for the text fragment "working", the words obtained by segmentation are "work" and "ing".
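  • As an illustrative sketch only (the patent does not prescribe a specific tokenizer), segmentation against a preset vocabulary could look like the following; the tiny vocabulary and the greedy longest-match strategy are assumptions introduced here for illustration:

```python
# Illustrative sketch only: greedy longest-match segmentation against a preset
# vocabulary. The vocabulary contents and the matching strategy are assumptions,
# not part of the disclosed method.
def segment(text_fragment: str, vocabulary: set) -> list:
    words = []
    i = 0
    while i < len(text_fragment):
        # try the longest vocabulary entry starting at position i
        for j in range(len(text_fragment), i, -1):
            piece = text_fragment[i:j]
            if piece in vocabulary or j == i + 1:  # fall back to a single character
                words.append(piece)
                i = j
                break
    return [w for w in words if not w.isspace()]

# Example: a tiny assumed vocabulary reproducing the examples in the text
vocab = {"23", "Zhang San", "'s", "work", "ing"}
print(segment("1 23", vocab))     # ['1', '23']
print(segment("working", vocab))  # ['work', 'ing']
```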
  • the position coordinates of the word are used to represent the position of the word in the page (referring to the sample page in this embodiment).
  • the position coordinates of a word may include the x-axis coordinate and y-axis coordinate of the word in a coordinate system with the upper left corner of the sample page as the origin.
  • For each sample page, if the sample page includes multiple words, in order to reduce the amount of calculation, only the position coordinates of a limited number of words may be obtained for subsequent model training. For example, assuming the preset number of words is 128, for each sample page the position coordinates of up to 128 words can be obtained.
  • Step 102 Input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
  • the position coordinates of the word can be input into the pre-trained document understanding model, and the pre-trained document understanding model can output the encoding vector of the word, so that the training device can obtain the corresponding word encoding vector.
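  • As a non-authoritative sketch, and assuming the Hugging Face transformers implementation of LayoutLM is used as the pre-trained document understanding model (the patent only names LayoutLM as one example), obtaining per-word encoding vectors from word texts and position coordinates could look like this; the model name, the 0-1000 bbox normalization, and the 128-word truncation are assumptions:

```python
# Illustrative sketch, not the disclosed implementation: encode a page with a
# pre-trained LayoutLM model from Hugging Face transformers.
import torch
from transformers import LayoutLMTokenizer, LayoutLMModel

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

def encode_page(words, boxes, page_w, page_h, max_words=128):
    """words: list of word strings; boxes: list of (x1, y1, x2, y2) in pixels."""
    words, boxes = words[:max_words], boxes[:max_words]
    # LayoutLM expects coordinates normalized to a 0-1000 grid
    norm = [[int(1000 * x1 / page_w), int(1000 * y1 / page_h),
             int(1000 * x2 / page_w), int(1000 * y2 / page_h)]
            for x1, y1, x2, y2 in boxes]
    token_ids, token_boxes = [], []
    for word, box in zip(words, norm):
        ids = tokenizer.encode(word, add_special_tokens=False)
        token_ids.extend(ids)
        token_boxes.extend([box] * len(ids))
    input_ids = torch.tensor([token_ids])
    bbox = torch.tensor([token_boxes])
    with torch.no_grad():
        out = model(input_ids=input_ids, bbox=bbox)
    return out.last_hidden_state[0]  # one encoding vector per token
```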
  • Step 103 Obtain the encoding vector corresponding to each sample page based on the encoding vector corresponding to each word in each sample page.
  • If a sample page includes only one word, the encoding vector corresponding to that word can be determined as the encoding vector corresponding to the sample page; if a sample page includes multiple words, the average of the encoding vectors corresponding to the multiple words can be determined and used as the encoding vector corresponding to the sample page.
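  • A minimal sketch of this page-level pooling, assuming the word encoding vectors are stacked into a matrix (names are illustrative):

```python
# Illustrative sketch: the page encoding vector is the mean of its word
# encoding vectors (or the single word vector if the page has one word).
import numpy as np

def page_encoding(word_vectors: np.ndarray) -> np.ndarray:
    """word_vectors: array of shape (num_words, hidden_size)."""
    if word_vectors.shape[0] == 1:
        return word_vectors[0]
    return word_vectors.mean(axis=0)
```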
  • Step 104 Use the coding vector corresponding to each sample page and its category as training data to train the initial classification model to obtain a target classification model for document classification.
  • an initial classification model may be pre-constructed.
  • the initial classification model can be an L*M-dimensional matrix.
  • L is an integer greater than 1
  • M is an integer greater than 0.
  • the encoding vector corresponding to each sample page can be used as the input of the classification model, and the category of each sample page can be used as the label, and the initial classification model can be supervised and trained to obtain the target classification model.
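  • As an illustrative sketch only, and assuming the L*M matrix is realized as a single linear layer trained with PyTorch (the patent does not prescribe a framework), the supervised training of the initial classification model on page encoding vectors could look like this; the hyperparameters are assumptions:

```python
# Illustrative sketch, not the disclosed implementation: train a simple linear
# classifier (an L x M weight matrix, here hidden_size x num_categories) on
# page encoding vectors with category labels.
import torch
import torch.nn as nn

def train_classifier(page_vectors, labels, num_categories, epochs=20, lr=1e-3):
    """page_vectors: tensor (num_pages, hidden_size); labels: tensor (num_pages,)."""
    model = nn.Linear(page_vectors.shape[1], num_categories)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = model(page_vectors)   # per-category confidences (logits)
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
    return model
```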
  • The target classification model can be used to classify documents. Therefore, in actual business scenarios, the target classification model can first be used to classify a document, the information extraction model of the corresponding category can then be called to extract information from the document, and further business processing can be implemented based on the extracted information.
  • In the embodiments of the present disclosure, the pre-trained document understanding model is used alone as a general encoder in different business scenarios to obtain the encoding vector corresponding to each sample page; the pre-trained document understanding model is not trained during the training process, and only the classification model is trained. Because the classification model does not need to generate the encoding vectors corresponding to the sample pages and has a simple structure, only a small amount of training data is needed to obtain a classification model that can accurately classify documents, and the training process is short. As a result, the training speed of the classification model can be increased and the amount of data required during training can be reduced without affecting the accuracy of document classification. In addition, the classification model has a simple structure and occupies little space, making it easy to deploy.
  • In summary, embodiments of the present disclosure provide a training method for a classification model that combines RPA and AI to implement IA: the position coordinates of at least one word included in each of multiple sample pages are obtained, and the category to which each sample page belongs is obtained; the position coordinates of each word in each sample page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; the encoding vector corresponding to each sample page is obtained based on the encoding vectors corresponding to the words in that page; and the encoding vector corresponding to each sample page and its category are used as training data to train the initial classification model to obtain the target classification model for document classification.
  • In this way, training of the classification model for document classification is achieved, the training speed of the classification model is improved, and the amount of data required during training is reduced.
  • the training method of the classification model that combines RPA and AI to implement IA provided by the embodiment of the present disclosure will be further described below with reference to FIG. 2 .
  • Figure 2 is a flow chart of a training method for a classification model that combines RPA and AI to implement IA according to the second embodiment of the present disclosure. As shown in Figure 2, the method includes steps 201 to 206.
  • Step 201 Obtain the position coordinates of at least one word included in each sample page among the plurality of sample pages, and obtain the category to which each sample page belongs.
  • For each sample page, if the sample page includes multiple words, in order to reduce the amount of calculation, only the position coordinates of a limited number of words may be obtained for subsequent model training. For example, assuming the preset number of words is 128, for each sample page the position coordinates of up to 128 words can be obtained.
  • In some embodiments, the position coordinates of at least one word included in each of the multiple sample pages can be obtained in the following manner; that is, step 201 can include: for each sample page, obtaining the optical character recognition (OCR) recognition information of the sample page; obtaining the text content of at least one text fragment in the sample page based on the OCR recognition information of the sample page; segmenting the text content of each text fragment to obtain at least one word included in each text fragment; obtaining the position coordinates of the area occupied by each text fragment; and obtaining the position coordinates of each word based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
  • the area occupied by each text fragment is usually a rectangle.
  • the position coordinates of the area occupied by the text fragment may include the position coordinates of the upper left corner vertex and the lower right corner vertex of the area occupied by the text fragment, or the position coordinates of the upper right corner vertex and the lower left corner vertex.
  • the position coordinates of a word may also include the position coordinates of the upper left corner vertex and the lower right corner vertex of the area occupied by the word, or the position coordinates of the upper right corner vertex and the lower left corner vertex.
  • For each sample page, OCR recognition technology can be used in advance to recognize the page and obtain the OCR recognition information of the sample page; the text content of at least one text fragment is then obtained from the OCR recognition information of the sample page, and the text content of each text fragment is segmented based on the preset vocabulary list and rules to obtain at least one word included in each text fragment.
  • For Chinese, the text can be segmented word by word; for English, it can be segmented into sub-words consisting of stems and affixes.
  • In some embodiments, the width and height of the sample page, as well as the position of each text fragment in the sample page, can be obtained; then, based on the width and height of the sample page and the position of each text fragment in the sample page, the position coordinates of the area occupied by each text fragment are determined, such as the position coordinates of the upper left corner vertex and the lower right corner vertex (or the position coordinates of the upper right corner vertex and the lower left corner vertex).
  • the position coordinates of each word can be obtained based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
  • the position coordinates of the area occupied by the text fragment include the x-axis coordinates and y-axis coordinates (x1, y1) of the upper left corner vertex, and the x-axis coordinates of the lower right corner vertex and y-axis coordinate (x2,y2).
  • the position coordinates of the word include the x-axis coordinate and y-axis coordinate (x3, y3) of the upper-left corner vertex of the area occupied by the word, and the x-axis coordinate and y-axis coordinate (x4, y4) of the lower-right corner vertex.
  • the text fragment may be one line or less than one line of text arranged horizontally, or one or less than one column of text arranged vertically
  • A can be set as needed, for example, set to 1.5.
  • For a text fragment arranged horizontally, the width of each word can be obtained based on the proportion of the word's length in the text fragment and the value of x2 - x1. Then, for the first word on the left in the text fragment, x1 is used as the x-axis coordinate x3 of the upper-left corner vertex of the word, and x3 plus the width of the first word is used as the x-axis coordinate x4 of the lower-right corner vertex of the first word. For each subsequent word, the cumulative width of the words to its left is added to x1 to obtain the x-axis coordinate x3 of its upper-left corner vertex, and x3 plus the width of the word is used as the x-axis coordinate x4 of its lower-right corner vertex.
  • y1 can be regarded as the y-axis coordinate y3 of the upper-left corner vertex of each word
  • y2 can be regarded as the y-axis coordinate y4 of the lower-right corner vertex of each word.
  • For a text fragment arranged vertically, the height of each word can be obtained based on the proportion of the word's length in the text fragment and the value of y2 - y1. Then, for the first word at the top of the text fragment, y1 is used as the y-axis coordinate y3 of the upper-left corner vertex of the word, and y3 plus the height of the first word is used as the y-axis coordinate y4 of the lower-right corner vertex of the first word. For other words in the text fragment, y1 plus the cumulative height of all words above the word is used as the y-axis coordinate y3 of the upper-left corner vertex of the word, and y3 plus the height of the word is used as the y-axis coordinate y4 of the lower-right corner vertex of the word.
  • x1 can be regarded as the x-axis coordinate x3 of the upper-left corner vertex of each word
  • x2 can be regarded as the x-axis coordinate x4 of the lower-right corner vertex of each word.
  • For example, for the text fragment "1 23", the words obtained by segmenting the text content of the text fragment are "1" and "23".
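  • A minimal sketch of the horizontal-text case described above, deriving per-word boxes from the fragment box (x1, y1, x2, y2) by length proportion; the length-proportion rule follows the description, while the function name and the handling of whitespace are assumptions:

```python
# Illustrative sketch: derive word position coordinates from the bounding box of
# a horizontally arranged text fragment, splitting the fragment width in
# proportion to each word's character length.
def word_boxes_horizontal(words, x1, y1, x2, y2):
    total_len = sum(len(w) for w in words)
    boxes, cursor = [], x1
    for w in words:
        width = (x2 - x1) * len(w) / total_len
        # each word keeps the fragment's y-range; x-range accumulates left to right
        boxes.append((cursor, y1, cursor + width, y2))
        cursor += width
    return boxes

# Example: fragment "1 23" occupying x in [0, 30], y in [0, 10]
print(word_boxes_horizontal(["1", "23"], 0, 0, 30, 10))
# [(0, 0, 10.0, 10), (10.0, 0, 30.0, 10)]
```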
  • The multiple sample documents obtained by the training device may be sent by the RPA robot. That is, for each sample page, before the optical character recognition (OCR) recognition information of the sample page is obtained, the method may further include: obtaining multiple sample documents sent by the RPA robot.
  • For example, the training device can be configured in the document processing platform, and the document processing platform can provide an upload interface, so that when a user needs to train and generate a target classification model, each sample document can be uploaded through the upload interface by the RPA robot, and the training device in the document processing platform can thereby obtain the multiple sample documents uploaded by the RPA robot.
  • the training device can automatically obtain the sample pages in combination with the RPA robot, thereby reducing the labor cost of training the classification model.
  • Step 202 Input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
  • Step 203 Based on the encoding vector corresponding to each word in each sample page, obtain the encoding vector corresponding to each sample page.
  • Step 204 Divide the training data into a training set and a verification set.
  • the training set includes encoding vectors corresponding to multiple first pages
  • the verification set includes encoding vectors corresponding to multiple second pages.
  • Each first page and each second page is labeled with the category to which it belongs.
  • the number ratio of the first page included in the training set to the second page included in the verification set can be set arbitrarily as needed, for example, 4:1, and the embodiment of the present disclosure does not limit this.
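  • A minimal sketch of this division, assuming the 4:1 ratio mentioned above and a random shuffle (both are only examples):

```python
# Illustrative sketch: divide (page_vector, category) training data into a
# training set and a verification set at an assumed 4:1 ratio.
import random

def split_training_data(samples, train_ratio=0.8, seed=0):
    """samples: list of (page_encoding_vector, category) pairs."""
    samples = samples[:]
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_ratio)
    # training set (first pages), verification set (second pages)
    return samples[:cut], samples[cut:]
```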
  • Step 205 Perform multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and its category to obtain a candidate classification model after each round of training.
  • the number of rounds for training the initial classification model can be set arbitrarily as needed, and the embodiments of the present disclosure do not limit this.
  • For example, the training set can be divided into N sub-training sets. Based on the first sub-training set, one round of iterative training is performed on the initial classification model to obtain the candidate classification model after the first round of training; then, based on the next sub-training set, one round of iterative training is performed on the candidate classification model after the first round of training to obtain the candidate classification model after the second round of training; then, based on the next sub-training set, one round of iterative training is performed on the candidate classification model after the second round of training to obtain the candidate classification model after the third round of training; and so on, so that the candidate classification models after N rounds of training are obtained based on the N sub-training sets.
  • Alternatively, one round of iterative training can be performed on the initial classification model based on the entire training set to obtain the candidate classification model after the first round of training; then, based on the same training set, one round of iterative training is performed on the candidate classification model after the first round of training to obtain the candidate classification model after the second round of training; then one round of iterative training is performed on the candidate classification model after the second round of training; and so on, so that the candidate classification models after N rounds of training are obtained based on the training set.
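  • A sketch of the second variant (one round over the full training set per round, keeping a snapshot of the candidate model after each round); the use of PyTorch and the deep-copy snapshotting are assumptions:

```python
# Illustrative sketch: multi-round training that keeps the candidate
# classification model produced after every round.
import copy
import torch
import torch.nn as nn

def train_rounds(model, train_vectors, train_labels, num_rounds, lr=1e-3):
    """Returns the candidate classification model saved after each round."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    candidates = []
    for _ in range(num_rounds):
        optimizer.zero_grad()
        loss = loss_fn(model(train_vectors), train_labels)
        loss.backward()
        optimizer.step()
        candidates.append(copy.deepcopy(model))  # candidate after this round
    return candidates
```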
  • Step 206 Based on the coding vector corresponding to each second page and its category, select a target classification model for document classification from the candidate classification models after each round of training.
  • In some embodiments, step 206 can be implemented in the following manner: for the candidate classification model after each round of training, the encoding vector corresponding to each second page is input into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to each of the multiple preset categories; the loss value corresponding to the candidate classification model is determined based on these confidences and the category to which each second page belongs; and the target classification model for document classification is selected from the candidate classification models according to their corresponding loss values.
  • For example, the confidences, predicted by a certain candidate classification model, that each second page belongs to the multiple preset categories, together with the category to which each second page belongs, can be substituted into a cross-entropy loss function to determine the loss value corresponding to that candidate classification model.
  • the cross-entropy loss function can be as shown in formula (1).
  • L_ce represents the loss value.
  • N represents the number of second pages included in the validation set.
  • C represents the number of preset categories, also called the number of categories.
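  • Formula (1) itself is not reproduced in this text. A standard multi-class cross-entropy consistent with the definitions above, with the assumed notation that y_{i,c} is an indicator that second page i belongs to category c and p_{i,c} is the confidence predicted for that category, would be:

```latex
L_{ce} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \, \log p_{i,c}
```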
  • the candidate classification model with the lowest corresponding loss value among the candidate classification models after each round of training may be determined as the target classification model. Therefore, among the candidate classification models after each round of training, the model with the highest prediction accuracy can be determined as the target classification model.
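  • A sketch of this selection step, continuing the PyTorch assumption from the earlier sketches:

```python
# Illustrative sketch: evaluate each candidate on the verification set and keep
# the candidate with the lowest cross-entropy loss as the target model.
import torch
import torch.nn as nn

def select_target_model(candidates, val_vectors, val_labels):
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    with torch.no_grad():
        for candidate in candidates:
            losses.append(loss_fn(candidate(val_vectors), val_labels).item())
    return candidates[losses.index(min(losses))]
```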
  • In summary, the training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure obtains the position coordinates of at least one word included in each of multiple sample pages and obtains the category to which each sample page belongs; inputs the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtains the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that page; divides the training data into a training set and a verification set, where the training set includes the encoding vectors corresponding to multiple first pages, the verification set includes the encoding vectors corresponding to multiple second pages, and each first page and each second page is labeled with its category; performs multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and its category to obtain a candidate classification model after each round of training; and selects the target classification model for document classification from the candidate classification models after each round of training based on the encoding vector corresponding to each second page and its category.
  • In this way, training of the classification model for document classification is achieved, the training speed of the classification model is improved, and the amount of data required during training is reduced.
  • embodiments of the present disclosure also provide a document classification method that combines RPA and AI to implement IA.
  • the document classification method for implementing IA by combining RPA and AI provided by the embodiment of the present disclosure will be described below with reference to FIG. 3 .
  • Figure 3 is a flow chart of a document classification method that combines RPA and AI to implement IA according to the third embodiment of the present disclosure. As shown in Figure 3, the method includes steps 301 to 307.
  • Step 301 Obtain the target document sent by the RPA robot.
  • the target document includes at least one target page.
  • the document classification method that combines RPA and AI to implement IA in the embodiment of the present disclosure can be executed by a document classification device that combines RPA and AI to implement IA.
  • the document classification device that combines RPA and AI to implement IA can be implemented by software and/or hardware.
  • The document classification device that combines RPA and AI to implement IA can be an electronic device, or can be configured in an electronic device, to classify documents.
  • the electronic device may include but is not limited to a terminal device, a server, etc., and this embodiment does not specifically limit the electronic device.
  • For example, the document classification device that combines RPA and AI to implement IA can be configured in a document processing platform, and the document processing platform can provide an upload interface. Therefore, when a user needs to classify a certain target document, the target document can be uploaded through the upload interface by the RPA robot, so that the document classification device that combines RPA and AI to implement IA in the document processing platform can obtain the target document uploaded by the RPA robot.
  • Step 302 Obtain the position coordinates of at least one word included in each target page.
  • the words included in the target page are words (that is, tokens) obtained by segmenting the text fragments in the target page.
  • text fragments can be segmented based on preset vocabulary lists and rules.
  • the position coordinates of the word are used to represent the position of the word in the page (in this embodiment, the target page).
  • the position coordinates of a word may include the x-axis coordinate and y-axis coordinate of the word in a coordinate system with the upper left corner of the target page as the origin.
  • For each target page, in order to reduce the amount of calculation, only the position coordinates of a limited number of words may be obtained for subsequent document classification. For example, assuming the preset number of words is 128, for each target page the position coordinates of up to 128 words can be obtained.
  • the method of obtaining the position coordinates of at least one word included in each target page may refer to the method of obtaining the position coordinates of at least one word included in the sample page in the above embodiment, which will not be described again here.
  • Step 303 Input the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page.
  • For each word, the position coordinates of the word can be input into the pre-trained document understanding model, and the pre-trained document understanding model can output the encoding vector of the word, so that the document classification device that combines RPA and AI to implement IA can obtain the encoding vector corresponding to the word.
  • Step 304 Obtain the encoding vector corresponding to each target page based on the encoding vector corresponding to each word in each target page.
  • If a target page includes only one word, the encoding vector corresponding to that word can be determined as the encoding vector corresponding to the target page; if a target page includes multiple words, the average of the encoding vectors corresponding to the multiple words can be determined and used as the encoding vector corresponding to the target page.
  • Step 305 Input the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to multiple preset categories.
  • the target classification model is trained through the training method of the classification model that combines RPA and AI to implement IA as shown in any of the above embodiments.
  • the confidence that the target page belongs to multiple preset categories can indicate the probability that the target page belongs to each preset category.
  • the trained target classification model can be deployed in a document processing platform to implement IA by combining RPA and AI.
  • the document classification device can input the encoding vector corresponding to each target page into the target classification model deployed in the document processing platform to obtain the confidence that each target page belongs to multiple preset categories.
  • Step 306 For each preset category, determine the average value of the confidence that each target page belongs to the preset category.
  • the confidence scores of each target page belonging to the same preset category may be summed and then averaged, thereby obtaining an average value of the confidence scores of each target page belonging to the same preset category.
  • Step 307 Determine the category to which the target document belongs from each preset category based on the average value corresponding to each preset category.
  • the preset category with the largest corresponding average value among each preset category may be determined as the category to which the target document belongs.
  • For example, assume the preset categories include category 1 and category 2.
  • the target document includes 10 target pages.
  • the confidence that each of the 10 target pages belongs to category 1 and the confidence that each of the 10 target pages belongs to category 2 can be obtained.
  • the average confidence level of 10 target pages belonging to category 1 is greater than the average confidence level of 10 target pages belonging to category 2
  • it can be determined that the category to which the target document belongs is category 1.
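  • A minimal sketch of steps 305 to 307 (per-page confidences, per-category averaging, and the final document category); the function and variable names and the softmax step are assumptions:

```python
# Illustrative sketch: classify a document from its target-page encoding
# vectors by averaging per-page confidences for each preset category.
import torch

def classify_document(target_model, page_vectors, category_names):
    """page_vectors: tensor (num_target_pages, hidden_size)."""
    with torch.no_grad():
        # confidence that each target page belongs to each preset category
        confidences = torch.softmax(target_model(page_vectors), dim=1)
    avg_per_category = confidences.mean(dim=0)   # step 306
    best = int(torch.argmax(avg_per_category))   # step 307
    return category_names[best], avg_per_category
```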
  • the pre-trained document understanding model in the embodiment of the present disclosure is universal for various business scenarios and does not require training. It can be used alone as a general encoder in different business scenarios to obtain the encoding vector corresponding to the target page.
  • The target classification model does not need to generate the encoding vectors corresponding to the target pages itself. It has a simple structure and only needs a small amount of training data; the training process is short, the prediction performance of the trained target classification model is not adversely affected, and accurate classification of documents can still be achieved.
  • In summary, the document classification method that combines RPA and AI to implement IA obtains the target document sent by the RPA robot, where the target document includes at least one target page; obtains the position coordinates of at least one word included in each target page; inputs the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word; obtains the encoding vector corresponding to each target page based on the encoding vectors of its words; inputs the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to multiple preset categories; determines, for each preset category, the average confidence that the target pages belong to that preset category; and, based on the average value corresponding to each preset category, determines the category to which the target document belongs. This reduces the labor cost required for document classification and improves the efficiency of document classification.
  • embodiments of the present disclosure also propose a training device that combines RPA and AI to implement an IA classification model.
  • Figure 4 is a schematic structural diagram of a training device for implementing an IA classification model by combining RPA and AI according to the fourth embodiment of the present disclosure.
  • the training device 400 that combines RPA and AI to implement an IA classification model includes: a first acquisition module 401, a first processing module 402, a second acquisition module 403, and a training module 404.
  • The first acquisition module 401 is used to obtain the position coordinates of at least one word included in each sample page among the plurality of sample pages, and to obtain the category to which each sample page belongs.
  • the first processing module 402 is used to input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
  • the second acquisition module 403 is used to obtain the encoding vector corresponding to each sample page based on the encoding vector corresponding to each word in each sample page.
  • the training module 404 is used to use the encoding vector corresponding to each sample page and its category as training data to train the initial classification model to obtain a target classification model for document classification.
  • the training device 400 of the classification model that combines RPA and AI to implement IA in the embodiment of the present disclosure can perform the training method of the classification model that combines RPA and AI to implement IA provided in the above embodiments.
  • the training device 400 that combines RPA and AI to implement the classification model of IA can be implemented by software and/or hardware.
  • The training device that combines RPA and AI to implement the IA classification model can itself be an electronic device, or can be configured in an electronic device, so as to implement the training of a classification model for document classification.
  • the electronic device may include but is not limited to a terminal device, a server, etc., and this embodiment does not specifically limit the electronic device.
  • the training module 404 includes a dividing unit, a training unit and a selection unit.
  • a dividing unit used to divide the training data into a training set and a verification set.
  • the training set includes encoding vectors corresponding to multiple first pages
  • the verification set includes encoding vectors corresponding to multiple second pages.
  • Each first page and each second page is labeled with the category to which it belongs.
  • the training unit is used to perform multiple rounds of training on the initial classification model based on the coding vector corresponding to each first page and its category, so as to obtain a candidate classification model after each round of training.
  • the selection unit is used to select a target classification model for document classification from the candidate classification models after each round of training based on the encoding vector corresponding to each second page and its category.
  • the selection unit includes a processing subunit and a selection subunit.
  • The processing subunit is used to, for the candidate classification model after each round of training, input the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to the multiple preset categories, and to determine the loss value corresponding to the candidate classification model based on that confidence and the category to which each second page belongs. The selection subunit is used to select the target classification model for document classification from the candidate classification models after each round of training, based on the loss values corresponding to the candidate classification models.
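  • A minimal sketch of the procedure performed by the training unit and the selection unit is given below, assuming a PyTorch linear classifier head and a cross-entropy loss; both choices are assumptions, since the disclosure only specifies that the candidate with the best verification-set loss is selected.

```python
import copy
import torch
from torch import nn

def train_target_classifier(train_vecs, train_labels, val_vecs, val_labels,
                            num_categories, rounds=50, lr=1e-3):
    """train_vecs/val_vecs: encoding vectors of the first/second pages;
    train_labels/val_labels: their annotated categories (long tensors).
    Returns the candidate classification model whose verification-set loss
    is smallest across all training rounds."""
    model = nn.Linear(train_vecs.shape[1], num_categories)   # simple head
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    best_loss, best_model = float("inf"), None
    for _ in range(rounds):
        model.train()
        optimizer.zero_grad()
        loss_fn(model(train_vecs), train_labels).backward()
        optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(val_vecs), val_labels).item()
        if val_loss < best_loss:                              # selection rule
            best_loss, best_model = val_loss, copy.deepcopy(model)
    return best_model
```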
  • the first acquisition module 401 includes a first acquisition sub-unit, a second acquisition sub-unit, a third acquisition sub-unit, a word segmentation unit, a fourth acquisition sub-unit and a fifth acquisition sub-unit.
  • the first acquisition subunit is used to acquire multiple sample documents sent by the RPA robot.
  • the second acquisition subunit is used to acquire the optical character recognition (OCR) identification information of the sample page for each sample page.
  • the third acquisition subunit is used to acquire the text content of at least one text fragment in the sample page based on the OCR recognition information of the sample page.
  • the word segmentation unit is used to segment the text content of each text segment to obtain at least one word included in each text segment.
  • the fourth acquisition subunit is used to acquire the position coordinates of the area occupied by each text fragment.
  • the fifth acquisition subunit is used to obtain the position coordinates of each word based on the position coordinates of the area occupied by each text segment and the position of each word in the corresponding text segment.
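  • The fifth acquisition subunit can be illustrated with the following sketch, which estimates each word's box by splitting the fragment's box in proportion to the word's character span; this proportional split, and the assumption that fragments are laid out horizontally, are illustrative assumptions.

```python
def word_boxes(fragment_text, fragment_box, words):
    """fragment_box: [x0, y0, x1, y1] of a horizontally laid-out text fragment.
    words: the words obtained by segmenting fragment_text."""
    x0, y0, x1, y1 = fragment_box
    width = x1 - x0
    n = max(len(fragment_text), 1)
    boxes, cursor = [], 0
    for word in words:
        start = fragment_text.index(word, cursor)   # word's character span
        end = start + len(word)
        boxes.append([x0 + width * start / n, y0, x0 + width * end / n, y1])
        cursor = end
    return boxes
```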
  • the training device for the classification model of IA implemented by combining RPA and AI in the embodiment of the present disclosure obtains the position coordinates of at least one word included in each sample page of multiple sample pages, and obtains the category to which each sample page belongs; the position coordinates of each word in each sample page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; based on the encoding vector corresponding to each word in each sample page, the encoding vector corresponding to each sample page is obtained; the encoding vector corresponding to each sample page and the category to which it belongs are used as training data to train the initial classification model to obtain the target classification model for document classification.
  • the training of the classification model for document classification is realized, the training speed of the classification model is improved, and the amount of data required in the training process is reduced.
  • embodiments of the present disclosure also propose a document classification device that combines RPA and AI to implement IA.
  • Figure 5 is a schematic structural diagram of a document classification device that implements IA by combining RPA and AI according to the fifth embodiment of the present disclosure.
  • the document classification device 500 that combines RPA and AI to implement IA includes: a third acquisition module 501, a fourth acquisition module 502, a second processing module 503, a fifth acquisition module 504, and a third processing module 505 , the first determination module 506 and the second determination module 507.
  • the third acquisition module 501 is used to acquire the target document sent by the RPA robot, where the target document includes at least one target page.
  • the fourth obtaining module 502 is used to obtain the position coordinates of at least one word included in each target page.
  • the second processing module 503 is used to input the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page.
  • the fifth acquisition module 504 is used to acquire the encoding vector corresponding to each target page based on the encoding vector corresponding to each word in each target page.
  • The third processing module 505 is used to input the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to multiple preset categories, where the target classification model is trained using the method described in the embodiments of the first aspect.
  • the first determination module 506 is configured to determine, for each preset category, the average value of the confidence that each target page belongs to the preset category.
  • the second determination module 507 is configured to determine the category to which the target document belongs from among the preset categories based on the average values corresponding to the preset categories.
  • The document classification device that combines RPA and AI to implement IA in the embodiments of the present disclosure obtains the target document sent by the RPA robot, where the target document includes at least one target page; obtains the position coordinates of at least one word included in each target page; inputs the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; obtains the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that page; inputs the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to multiple preset categories; determines, for each preset category, the average value of the confidence that the target pages belong to that preset category; and, based on the average value corresponding to each preset category, determines the category to which the target document belongs from among the preset categories.
  • embodiments of the present disclosure also provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the computer program, the training method of a classification model that combines RPA and AI to implement IA as described in any of the foregoing method embodiments, or the document classification method that combines RPA and AI to implement IA as described in any of the foregoing method embodiments, is implemented.
  • embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the training method of a classification model that combines RPA and AI to implement IA, or the document classification method that combines RPA and AI to implement IA, as described in any of the foregoing method embodiments, is implemented.
  • embodiments of the present disclosure also provide a computer program product.
  • When the instructions in the computer program product are executed by a processor, the training method of a classification model that combines RPA and AI to implement IA, or the document classification method that combines RPA and AI to implement IA, as described in any of the foregoing method embodiments, is realized.
  • embodiments of the present disclosure also provide a computer program, including computer program code.
  • When the computer program code is run on a computer, the computer is caused to execute the training method of a classification model that combines RPA and AI to implement IA as described in any of the foregoing method embodiments, or to execute the document classification method that combines RPA and AI to implement IA as described in any of the foregoing method embodiments.
  • FIG. 6 illustrates a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure.
  • the electronic device 10 shown in FIG. 6 is only an example and should not bring any limitations to the functions and scope of use of the embodiments of the present disclosure.
  • electronic device 10 is embodied in the form of a general computing device.
  • the components of electronic device 10 may include, but are not limited to: one or more processors or processing units 16, system memory 28, and a bus 18 connecting various system components, including memory 28 and processing unit 16.
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics accelerated port, a processor, or a local bus using any of a variety of bus structures.
  • By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • electronic device 10 includes a variety of computer system readable media. These media may be any available media that can be accessed by electronic device 10, including volatile and nonvolatile media, removable and non-removable media.
  • the memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (Random Access Memory; hereinafter referred to as: RAM) 30 and/or cache memory 32.
  • Electronic device 10 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 34 may be used to read and write to non-removable, non-volatile magnetic media (not shown in Figure 6, commonly referred to as a "hard drive").
  • A magnetic disk drive may be provided for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive may be provided for reading from and writing to a removable non-volatile optical disk (e.g., a CD-ROM or DVD-ROM).
  • each drive may be connected to bus 18 through one or more data media interfaces.
  • Memory 28 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of embodiments of the present disclosure.
  • A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28; each of these examples, or some combination thereof, may include an implementation of a networking environment.
  • Program modules 42 generally perform functions and/or methods in the embodiments described in this disclosure.
  • Electronic device 10 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with electronic device 10, and/or with any device (e.g., a network card, a modem, etc.) that enables electronic device 10 to communicate with one or more other computing devices. Such communication may occur through the input/output (I/O) interface 22.
  • The electronic device 10 can also communicate, through the network adapter 20, with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet.
  • network adapter 20 communicates with other modules of electronic device 10 via bus 18 .
  • It should be understood that, although not shown in Figure 6, other hardware and/or software modules may be used in conjunction with electronic device 10, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
  • the processing unit 16 executes programs stored in the memory 28 to perform various functional applications and data processing, such as implementing the methods mentioned in the previous embodiments.
  • Reference throughout this specification to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, the schematic expressions of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples, and the features of different embodiments or examples, described in this specification, provided they are not inconsistent with each other.
  • The terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature.
  • “plurality” means at least two, such as two, three, etc., unless otherwise expressly and specifically limited.
  • a "computer-readable medium” may be any device that can contain, store, communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • A non-exhaustive list of computer-readable media includes the following: an electrical connection with one or more wires (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM).
  • The computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the paper or other medium may, for example, be optically scanned and then edited, interpreted, or otherwise processed in a suitable manner if necessary to obtain the program electronically, after which it is stored in computer memory.
  • various parts of the embodiments of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
  • various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, it can be implemented by any one of the following technologies known in the art, or a combination thereof: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
  • the program can be stored in a computer-readable storage medium.
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing module, each unit may exist physically alone, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
  • the storage media mentioned above can be read-only memory, magnetic disks or optical disks, etc.


Abstract

The present application relates to a training method and apparatus for implementing an IA classification model using RPA and AI. The training method comprises: obtaining the position coordinates of at least one word comprised in each sample page and the category to which each sample page belongs; inputting the position coordinates of each word into a pre-trained document understanding model to obtain a corresponding encoding vector; on the basis of the encoding vector corresponding to each word in each sample page, obtaining an encoding vector corresponding to each sample page; and taking the encoding vector corresponding to each sample page and the category to which each sample page belongs as training data, and training an initial classification model to obtain a target classification model for document classification. The speed of training the classification model is increased, and the amount of data required in the training process is reduced. The present application further provides a method for implementing IA document classification using RPA and AI, in which a target document sent by an RPA robot undergoes IA classification using the target classification model. The labor cost required by document classification is reduced, and document classification efficiency is improved.

Description

Training method and apparatus for a classification model that combines RPA and AI to implement IA
Cross-reference to related applications
The present disclosure claims priority to Chinese Patent Application No. 2022111259565, filed in China on September 16, 2022, the entire content of which is incorporated herein by reference.
技术领域Technical field
本公开涉及机器人流程自动化及人工智能技术领域,特别涉及一种结合RPA和AI实现IA的分类模型的训练方法和文档分类方法及装置、电子设备、车辆、计算机可读存储介质、计算机程序产品和计算机程序。The present disclosure relates to the technical fields of robotic process automation and artificial intelligence, and in particular to a training method and document classification method and device for classifying a classification model that combines RPA and AI to realize IA, electronic equipment, vehicles, computer-readable storage media, computer program products, and Computer program.
背景技术Background technique
机器人流程自动化(Robotic Process Automation,简称RPA),是通过特定的“机器人软件”,模拟人在计算机上的操作,按规则自动执行流程任务。Robotic Process Automation (RPA) uses specific "robot software" to simulate human operations on a computer and automatically execute process tasks according to rules.
人工智能(Artificial Intelligence,简称AI)是研究、开发用于模拟、延伸和扩展人的智能的理论、方法、技术及应用系统的一门技术科学。Artificial Intelligence (AI for short) is a technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
智能自动化(Intelligent Automation,简称IA)是一系列从机器人流程自动化到人工智能的技术总称,将RPA与光学字符识别(Optical Character Recognition,OCR)、智能字符识别(Intelligent Character Recognition,ICR)、流程挖掘(Process Mining)、深度学习(Deep Learning,DL)、机器学习(Machine Learning,ML)、自然语言处理(Natural Language Processing,NLP)、语音识别(Automatic Speech Recognition,ASR)、语音合成(Text To Speech,TTS)、计算机视觉(Computer Vision,CV)等多种AI技术相结合,以创建能够思考、学习及自适应的端到端的业务流程,涵盖从流程发现、流程自动化,到通过自动而持续的数据收集、理解数据的含义,使用数据来管理和优化业务流程的整个历程。Intelligent Automation (IA) is a general term for a series of technologies from robotic process automation to artificial intelligence. It combines RPA with Optical Character Recognition (OCR), Intelligent Character Recognition (ICR), and process mining. (Process Mining), Deep Learning (DL), Machine Learning (ML), Natural Language Processing (NLP), Speech Recognition (Automatic Speech Recognition, ASR), Speech Synthesis (Text To Speech) , TTS), Computer Vision (CV) and other AI technologies are combined to create end-to-end business processes that can think, learn and adapt, covering from process discovery, process automation, to automatic and continuous The entire process of data collection, understanding the meaning of data, and using data to manage and optimize business processes.
In the business scenario of intelligent document processing, a complex business process may involve processing several categories of documents, and different categories of documents require different information extraction models to extract information; subsequent business processing, such as information entry or bill reimbursement, is then performed based on the extracted key information. For example, when an RPA robot is used to automatically process an email in which customer A orders products from supplier B, the email attachments may contain documents such as contracts and invoices. The RPA robot needs to call a contract extraction model to extract information from the contract documents and a general multi-bill model to extract information from the invoice documents, and then performs subsequent processing based on the extracted information. This requires first using a classification model to classify the documents, then calling the information extraction model of the corresponding category to extract information from the documents, and then carrying out further business processing. How to quickly train the classification model with less training data, while ensuring the accuracy of the classification model, has therefore become an urgent problem to be solved.
Summary of the invention
Embodiments of the present disclosure provide a training method and a document classification method and apparatuses for a classification model that combines RPA and AI to implement IA, as well as an electronic device, a vehicle, a computer-readable storage medium, a computer program product, and a computer program, to solve the technical problems of model training methods for document classification in the related art, namely that a large amount of training data is required for training and that the training time of the classification model is long.
An embodiment of the first aspect of the present disclosure provides a training method for a classification model that combines RPA and AI to implement IA. The method includes: obtaining the position coordinates of at least one word included in each sample page among a plurality of sample pages, and obtaining the category to which each sample page belongs; inputting the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtaining the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and training an initial classification model using the encoding vector corresponding to each sample page and the category to which it belongs as training data, to obtain a target classification model for document classification.
In some embodiments, training the initial classification model using the encoding vector corresponding to each sample page and the category to which it belongs as training data, to obtain a target classification model for document classification, includes: dividing the training data into a training set and a verification set, where the training set includes encoding vectors corresponding to a plurality of first pages, the verification set includes encoding vectors corresponding to a plurality of second pages, and each first page and each second page is labeled with the category to which it belongs; performing multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and the category to which it belongs, to obtain a candidate classification model after each round of training; and selecting a target classification model for document classification from the candidate classification models after each round of training, based on the encoding vector corresponding to each second page and the category to which it belongs.
In some embodiments, selecting the target classification model for document classification from the candidate classification models after each round of training, based on the encoding vector corresponding to each second page and the category to which it belongs, includes: for the candidate classification model after each round of training, inputting the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to a plurality of preset categories, and determining the loss value corresponding to the candidate classification model based on that confidence and the category to which each second page belongs; and selecting the target classification model for document classification from the candidate classification models after each round of training based on the loss values corresponding to the candidate classification models after each round of training.
In some embodiments, obtaining the position coordinates of at least one word included in each sample page among the plurality of sample pages includes: obtaining a plurality of sample documents sent by the RPA robot; for each sample page, obtaining optical character recognition (OCR) recognition information of the sample page; obtaining the text content of at least one text fragment in the sample page based on the OCR recognition information of the sample page; segmenting the text content of each text fragment to obtain at least one word included in each text fragment; obtaining the position coordinates of the area occupied by each text fragment; and obtaining the position coordinates of each word based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
With the training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the position coordinates of at least one word included in each sample page among a plurality of sample pages are obtained, and the category to which each sample page belongs is obtained; the position coordinates of each word in each sample page are input into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; the encoding vector corresponding to each sample page is obtained based on the encoding vectors corresponding to the words in that sample page; and the encoding vector corresponding to each sample page and the category to which it belongs are used as training data to train an initial classification model, so as to obtain a target classification model for document classification. In this way, the training of a classification model for document classification is realized, the training speed of the classification model is improved, and the amount of data required during the training process is reduced.
An embodiment of the second aspect of the present disclosure provides a document classification method that combines RPA and AI to implement IA. The method includes: obtaining a target document sent by an RPA robot, where the target document includes at least one target page; obtaining the position coordinates of at least one word included in each target page; inputting the position coordinates of each word in each target page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; obtaining the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that target page; inputting the encoding vector corresponding to each target page into a target classification model to obtain the confidence that each target page belongs to a plurality of preset categories, where the target classification model is trained using the method described in the embodiments of the first aspect; determining, for each preset category, the average value of the confidence that the target pages belong to that preset category; and determining, based on the average value corresponding to each preset category, the category to which the target document belongs from among the preset categories.
With the document classification method that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the target document sent by the RPA robot is obtained, where the target document includes at least one target page; the position coordinates of at least one word included in each target page are obtained; the position coordinates of each word in each target page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; the encoding vector corresponding to each target page is obtained based on the encoding vectors corresponding to the words in that target page; the encoding vector corresponding to each target page is input into the target classification model to obtain the confidence that each target page belongs to a plurality of preset categories; for each preset category, the average value of the confidence that the target pages belong to that preset category is determined; and, based on the average value corresponding to each preset category, the category to which the target document belongs is determined from among the preset categories. In this way, a target classification model quickly trained with a small amount of training data is combined with the pre-trained document understanding model to classify the target document accurately. Moreover, by using the target classification model to perform IA classification of the target documents sent by the RPA robot, the labor cost required for document classification is reduced and the efficiency of document classification is improved.
An embodiment of the third aspect of the present disclosure provides a training device for a classification model that combines RPA and AI to implement IA, including: a first acquisition module, used to obtain the position coordinates of at least one word included in each sample page among a plurality of sample pages, and to obtain the category to which each sample page belongs; a first processing module, used to input the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; a second acquisition module, used to obtain the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and a training module, used to train an initial classification model using the encoding vector corresponding to each sample page and the category to which it belongs as training data, to obtain a target classification model for document classification.
In some embodiments, the training module includes: a dividing unit, used to divide the training data into a training set and a verification set, where the training set includes encoding vectors corresponding to a plurality of first pages, the verification set includes encoding vectors corresponding to a plurality of second pages, and each first page and each second page is labeled with the category to which it belongs; a training unit, used to perform multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and the category to which it belongs, to obtain a candidate classification model after each round of training; and a selection unit, used to select a target classification model for document classification from the candidate classification models after each round of training, based on the encoding vector corresponding to each second page and the category to which it belongs.
In some embodiments, the selection unit includes: a processing subunit, used to, for the candidate classification model after each round of training, input the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to a plurality of preset categories, and to determine the loss value corresponding to the candidate classification model based on that confidence and the category to which each second page belongs; and a selection subunit, used to select the target classification model for document classification from the candidate classification models after each round of training based on the loss values corresponding to the candidate classification models after each round of training.
In some embodiments, the first acquisition module includes: a first acquisition subunit, used to obtain a plurality of sample documents sent by the RPA robot; a second acquisition subunit, used to obtain, for each sample page, optical character recognition (OCR) recognition information of the sample page; a third acquisition subunit, used to obtain the text content of at least one text fragment in the sample page based on the OCR recognition information of the sample page; a word segmentation unit, used to segment the text content of each text fragment to obtain at least one word included in each text fragment; a fourth acquisition subunit, used to obtain the position coordinates of the area occupied by each text fragment; and a fifth acquisition subunit, used to obtain the position coordinates of each word based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
With the training device for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the position coordinates of at least one word included in each sample page among a plurality of sample pages are obtained, and the category to which each sample page belongs is obtained; the position coordinates of each word in each sample page are input into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; the encoding vector corresponding to each sample page is obtained based on the encoding vectors corresponding to the words in that sample page; and the encoding vector corresponding to each sample page and the category to which it belongs are used as training data to train an initial classification model, so as to obtain a target classification model for document classification. In this way, the training of a classification model for document classification is realized, the training speed of the classification model is improved, and the amount of data required during the training process is reduced.
An embodiment of the fourth aspect of the present disclosure provides a document classification device that combines RPA and AI to implement IA. The device includes: a third acquisition module, used to obtain the target document sent by the RPA robot, where the target document includes at least one target page; a fourth acquisition module, used to obtain the position coordinates of at least one word included in each target page; a second processing module, used to input the position coordinates of each word in each target page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; a fifth acquisition module, used to obtain the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that target page; a third processing module, used to input the encoding vector corresponding to each target page into a target classification model to obtain the confidence that each target page belongs to a plurality of preset categories, where the target classification model is trained using the method described in the embodiments of the first aspect; a first determination module, used to determine, for each preset category, the average value of the confidence that the target pages belong to that preset category; and a second determination module, used to determine, based on the average value corresponding to each preset category, the category to which the target document belongs from among the preset categories.
With the document classification device that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the target document sent by the RPA robot is obtained, where the target document includes at least one target page; the position coordinates of at least one word included in each target page are obtained; the position coordinates of each word in each target page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; the encoding vector corresponding to each target page is obtained based on the encoding vectors corresponding to the words in that target page; the encoding vector corresponding to each target page is input into the target classification model to obtain the confidence that each target page belongs to a plurality of preset categories; for each preset category, the average value of the confidence that the target pages belong to that preset category is determined; and, based on the average value corresponding to each preset category, the category to which the target document belongs is determined from among the preset categories. In this way, a target classification model quickly trained with a small amount of training data is combined with the pre-trained document understanding model to classify the target document accurately. Moreover, by using the target classification model to perform IA classification of the target documents sent by the RPA robot, the labor cost required for document classification is reduced and the efficiency of document classification is improved.
An embodiment of the fifth aspect of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the method described in the embodiments of the first aspect of the present disclosure, or the method described in the embodiments of the second aspect of the present disclosure, is implemented.
An embodiment of the sixth aspect of the present disclosure provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the method described in the embodiments of the first aspect of the present disclosure, or the method described in the embodiments of the second aspect of the present disclosure, is implemented.
An embodiment of the seventh aspect of the present disclosure provides a computer program product, including a computer program. When the computer program is executed by a processor, the method described in the embodiments of the first aspect of the present disclosure, or the method described in the embodiments of the second aspect of the present disclosure, is implemented.
An embodiment of the eighth aspect of the present disclosure provides a computer program, including computer program code. When the computer program code is run on a computer, the computer is caused to execute the method described in the embodiments of the first aspect of the present disclosure, or the method described in the embodiments of the second aspect of the present disclosure.
Additional aspects and advantages of embodiments of the present disclosure will be set forth in part in the description that follows, will in part become apparent from that description, or may be learned by practice of the present disclosure.
Description of drawings
In the drawings, unless otherwise specified, the same reference numerals denote the same or similar parts or elements throughout the several figures. The drawings are not necessarily drawn to scale. It should be understood that these drawings depict only some embodiments according to the present disclosure and should not be regarded as limiting the scope of the present disclosure.
Figure 1 is a schematic flowchart of a training method for a classification model that combines RPA and AI to implement IA according to a first embodiment of the present disclosure;
Figure 2 is a schematic flowchart of a training method for a classification model that combines RPA and AI to implement IA according to a second embodiment of the present disclosure;
Figure 3 is a schematic flowchart of a document classification method that combines RPA and AI to implement IA according to a third embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of a training device for a classification model that combines RPA and AI to implement IA according to a fourth embodiment of the present disclosure;
Figure 5 is a schematic structural diagram of a document classification device that combines RPA and AI to implement IA according to a fifth embodiment of the present disclosure;
Figure 6 is a block diagram of an electronic device used to implement the training method for a classification model that combines RPA and AI to implement IA, or the document classification method that combines RPA and AI to implement IA, according to embodiments of the present disclosure.
Detailed description
Embodiments of the present disclosure are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are only intended to explain the embodiments of the present disclosure; they are not to be construed as limiting the embodiments of the present disclosure.
These and other aspects of the embodiments of the present disclosure will become apparent with reference to the following description and accompanying drawings. The description and drawings specifically disclose some particular implementations of the embodiments of the present disclosure to indicate some of the ways in which the principles of the embodiments of the present disclosure may be carried out, but it should be understood that the scope of the embodiments of the present disclosure is not limited thereby. On the contrary, the embodiments of the present disclosure include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.
It should be noted that the acquisition, storage, and application of the data involved in the technical solutions of the present disclosure comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
In the related art, a pre-trained document understanding model such as the LayoutLM model is usually used to understand a document, and a classification model is then used to classify the document based on the understanding result. To achieve document classification in different business scenarios, the pre-trained document understanding model and the classification model are usually jointly trained based on training data related to the business scenario. That is, in order to achieve document classification in a certain business scenario, not only must the classification model be trained, but the entire pre-trained document understanding model must also be fine-tuned. However, the structure of the pre-trained document understanding model is complex and requires a relatively large amount of training data to achieve this fine-tuning, so the entire training process takes a long time.
Embodiments of the present disclosure provide a training method for a classification model that combines RPA and AI to implement IA, with which a classification model for document classification can be obtained without fine-tuning the pre-trained document understanding model. The method includes: obtaining the position coordinates of at least one word included in each sample page among a plurality of sample pages, and obtaining the category to which each sample page belongs; inputting the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtaining the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and training an initial classification model using the encoding vector corresponding to each sample page and the category to which it belongs as training data, to obtain a target classification model for document classification. In this way, the training of a classification model for document classification is realized, the training speed of the classification model is improved, and the amount of data required during the training process is reduced.
为了清楚说明本发明的各实施例,首先对本发明实施例中涉及到的技术名词进行解释说明。In order to clearly explain each embodiment of the present invention, technical terms involved in the embodiments of the present invention are first explained.
在本公开实施例/公开的描述中,术语“多个”指两个或两个以上。In the description of the embodiments/disclosure of the present disclosure, the term "plurality" means two or more.
In the description of the embodiments of the present disclosure, an "RPA robot" refers to a software robot that can combine AI technology and RPA technology to automatically perform business processing. An RPA robot has two characteristics, "connector" and "non-intrusive": by simulating human operations, it extracts, integrates, and connects data from different systems in a non-intrusive way without modifying the information systems.
In the description of the embodiments of the present disclosure, a "document" is an electronic document, which may be a document in PDF (Portable Document Format) format obtained by scanning a paper document, or a document edited and created on a smart device such as a computer or a mobile phone; the embodiments of the present disclosure do not limit this. A "target document" is a document to be classified. A "page" is a page included in a document; for example, an electronic contract document may have one or more pages. "Sample pages" are the pages included in the sample documents used for model training. A "first page" is a page included in the training set after the training data is divided into a training set and a validation set. A "second page" is a page included in the validation set after the training data is divided into a training set and a validation set. A "target page" is a page included in the target document to be classified.
In the description of the embodiments of the present disclosure, a "text fragment" is a fragment composed of part of the content of a page. A text fragment may be one horizontally arranged line (or less than one line) of text, or one vertically arranged column (or less than one column) of text; the embodiments of the present disclosure do not limit this.
In the description of the embodiments of the present disclosure, the "encoding vector corresponding to a word" is a vector used to represent the feature information of the word, where the feature information of a word includes, for example, the position of the word on the page. The "encoding vector corresponding to a sample page" is a vector used to represent the feature information of the sample page, where the feature information of a sample page includes, for example, the positions on the page of all the words included in the sample page. The "encoding vector corresponding to a target page" is a vector used to represent the feature information of the target page, where the feature information of a target page includes, for example, the positions on the page of all the words included in the target page.
In the description of the embodiments of the present disclosure, a "pre-trained document understanding model" is a pre-trained model for understanding documents, such as the LayoutLM model (a pre-trained model that processes multi-modal information, namely text and layout information) or the LayoutLM 2.0 model. The embodiments of the present disclosure do not limit this, as long as the model can be used to encode the pages of a document and obtain the encoding vector corresponding to each word on a page.
In the description of the embodiments of the present disclosure, a "preset category" is a category to which a document may belong, created in advance as needed; it can be set, for example, to a bill category, a contract category, and so on. The "category of the target document" is the category obtained by using the trained target classification model to predict the category to which the target document to be classified belongs. The "category of a sample page" is the category to which the sample page actually belongs, such as a bill category or a contract category.
In the description of the embodiments of the present disclosure, the "classification model" is an AI neural network model used for document classification, whose structure can be set as needed. The input of the classification model is the encoding vector corresponding to a page of a document, and the output of the classification model is the predicted category of that page, which may specifically be the confidence that the page belongs to one or more preset categories.
In the description of the embodiments of the present disclosure, "confidence" can indicate how likely it is that a certain page belongs to a certain preset category. For example, the confidence that a target page belongs to preset category A indicates how likely it is that the target page belongs to preset category A.
In the description of the embodiments of the present disclosure, "the average of the confidences that the target pages belong to a preset category" is the value obtained by averaging, over the target pages, the confidence that each target page belongs to that preset category.
In the description of the embodiments of the present disclosure, the "document processing platform" is an intelligent automation platform for intelligently processing documents. Intelligent Document Processing (IDP) is one of the core capabilities of the intelligent automation platform. IDP is a new generation of automation technology that, based on AI technologies such as Optical Character Recognition (OCR), Computer Vision (CV), Natural Language Processing (NLP), and Knowledge Graph (KG), performs recognition, classification, element extraction, verification, comparison, error correction, and other processing on various types of documents, helping enterprises make document processing intelligent and automated.
In the description of the embodiments of the present disclosure, "OCR (Optical Character Recognition)" specifically refers to the process in which an electronic device examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text using character recognition methods; that is, for printed characters, the text in a paper document is optically converted into an image file of a black-and-white dot matrix, and the text in the image is converted into a text format by recognition software for further editing and processing by word processing software.
The following describes, with reference to the accompanying drawings, the training method for a classification model that combines RPA and AI to implement IA, the document classification method and apparatus that combine RPA and AI to implement IA, the electronic device, the vehicle, the computer-readable storage medium, the computer program product, and the computer program according to the embodiments of the present disclosure.
First, the training method for a classification model that combines RPA and AI to implement IA in the embodiments of the present disclosure is described with reference to the accompanying drawings.
Figure 1 is a flow chart of a training method for a classification model that combines RPA and AI to implement IA according to the first embodiment of the present disclosure. As shown in Figure 1, the method may include steps 101 to 104.
Step 101: Obtain the position coordinates of at least one word included in each of a plurality of sample pages, and obtain the category to which each sample page belongs.
It should be noted that the training method for a classification model that combines RPA and AI to implement IA in the embodiments of the present disclosure may be executed by a training apparatus for a classification model that combines RPA and AI to implement IA, hereinafter referred to simply as the training apparatus. The training apparatus may be implemented by software and/or hardware; it may be an electronic device, or it may be configured in an electronic device, so as to train the classification model used for document classification. The electronic device may include, but is not limited to, a terminal device, a server, and the like; this embodiment does not specifically limit the electronic device.
The words included in a sample page are the words (i.e., tokens) obtained by segmenting the text fragments on the sample page. The text fragments may be segmented based on a preset vocabulary and rules. For example, Chinese may be segmented character by character: for the text fragment "1 23", the words obtained by segmentation are "1" and "23", and for the text fragment "张三的" (Zhang San's), the words obtained are "张三" (Zhang San) and "的" (a possessive particle). English may be segmented into sub-words consisting of stems and affixes: for the text fragment "working", the words obtained by segmentation are "work" and "ing".
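By way of illustration, the following is a minimal sketch of such rule-based segmentation. The regular expression and the "-ing" suffix rule are simplified assumptions rather than the preset vocabulary and rules of the disclosure, and CJK text is split character by character here, whereas a real vocabulary could keep a multi-character word such as "张三" as a single token.

```python
# Minimal illustrative tokenizer: digit runs, Latin words, and single CJK characters.
import re

def tokenize_fragment(text: str) -> list:
    tokens = []
    for piece in re.findall(r"\d+|[A-Za-z]+|[\u4e00-\u9fff]", text):
        # Crude sub-word split for English words ending in "ing", e.g. "working" -> "work" + "ing".
        if piece.isascii() and piece.isalpha() and piece.lower().endswith("ing") and len(piece) > 4:
            tokens.extend([piece[:-3], piece[-3:]])
        else:
            tokens.append(piece)
    return tokens

print(tokenize_fragment("1 23"))     # ['1', '23']
print(tokenize_fragment("working"))  # ['work', 'ing']
```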
The position coordinates of a word are used to represent the position of the word on the page (in this embodiment, the sample page). For example, the position coordinates of a word may include the x-axis coordinate and the y-axis coordinate of the word in a coordinate system whose origin is the upper-left corner of the sample page.
For each sample page, when the sample page includes multiple words, in order to reduce the amount of computation, the position coordinates of only a limited number of words may be obtained for subsequent model training. For example, if the number of words is preset to 128, then for each sample page, the position coordinates of at most 128 words are obtained.
Step 102: Input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
In some embodiments, for each word in each sample page, the position coordinates of the word can be input into the pre-trained document understanding model, and the pre-trained document understanding model outputs the encoding vector of the word, so that the training apparatus obtains the encoding vector corresponding to the word.
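By way of illustration, the following is a minimal sketch of using a publicly available LayoutLM checkpoint as a frozen encoder via the Hugging Face transformers library; the checkpoint name and the toy words and boxes are illustrative assumptions, not requirements of the disclosure.

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMModel

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")
model.eval()  # the encoder is used as-is; it is not fine-tuned in this disclosure

words = ["1", "23"]
word_boxes = [[2, 2, 4, 4], [4, 2, 8, 4]]  # (x1, y1, x2, y2) per word, toy values in the 0-1000 range

# Each word may split into several sub-tokens; its box is repeated for each sub-token.
token_ids, token_boxes = [], []
for word, box in zip(words, word_boxes):
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    token_ids.extend(ids)
    token_boxes.extend([box] * len(ids))

input_ids = torch.tensor([[tokenizer.cls_token_id] + token_ids + [tokenizer.sep_token_id]])
bbox = torch.tensor([[[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    outputs = model(input_ids=input_ids, bbox=bbox, attention_mask=attention_mask)

word_vectors = outputs.last_hidden_state[0, 1:-1]  # one encoding vector per (sub-)token
```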
Step 103: Obtain the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in each sample page.
In some embodiments, for each sample page, when the sample page includes one word, the encoding vector corresponding to that word may be determined as the encoding vector corresponding to the sample page; when the sample page includes multiple words, the average of the encoding vectors corresponding to the multiple words may be determined, and this average is used as the encoding vector corresponding to the sample page.
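By way of illustration, a minimal sketch of this page-level pooling is shown below, assuming word_vectors is the (num_words, hidden_size) tensor produced by the encoder.

```python
import torch

def page_encoding(word_vectors: torch.Tensor) -> torch.Tensor:
    # One word: use its vector directly; several words: use the element-wise average.
    if word_vectors.shape[0] == 1:
        return word_vectors[0]
    return word_vectors.mean(dim=0)
```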
Step 104: Use the encoding vector corresponding to each sample page and the category to which it belongs as training data, and train the initial classification model to obtain a target classification model for document classification.
In some embodiments, an initial classification model may be constructed in advance. Assuming that the encoding length of the encoding vector corresponding to each sample page is L and the number of preset categories is M, the initial classification model may be an L*M-dimensional matrix, where L is an integer greater than 1 and M is an integer greater than 0. By inputting the 1*L-dimensional encoding vector corresponding to a sample page into the classification model, a 1*M-dimensional vector is obtained, and each element of this 1*M-dimensional vector indicates the confidence that the sample page belongs to one of the M preset categories.
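By way of illustration, such a classification model can be sketched as a single bias-free linear layer (the L*M matrix) followed by a softmax; the values L = 768 and M = 3 below are illustrative assumptions.

```python
import torch
import torch.nn as nn

L, M = 768, 3  # encoding length and number of preset categories (assumed values)

classifier = nn.Sequential(
    nn.Linear(L, M, bias=False),  # the L*M weight matrix
    nn.Softmax(dim=-1),           # confidences over the M preset categories
)

page_vector = torch.randn(1, L)        # 1*L encoding vector of one page
confidences = classifier(page_vector)  # 1*M vector of confidences summing to 1
```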
The encoding vector corresponding to each sample page can then be used as the input of the classification model, and the category to which each sample page belongs as the label, to perform supervised training of the initial classification model and obtain the target classification model.
The target classification model can be used to classify documents. In an actual business scenario, the target classification model can first be used to classify a document, the information extraction model of the corresponding category can then be called to extract information from the document, and further business processing can be performed based on the extracted information.
It can be understood that, in the embodiments of the present disclosure, the pre-trained document understanding model is used on its own as an encoder common to different business scenarios, to obtain the encoding vectors corresponding to the sample pages; the pre-trained document understanding model is not trained during the training process, and only the classification model is trained. Since the classification model does not need to generate the encoding vectors corresponding to the sample pages and has a simple structure, only a small amount of training data is needed to obtain a classification model that can classify documents accurately, and the training process takes little time. Therefore, the training speed of the classification model can be increased and the amount of data required during training reduced, without affecting the accuracy of document classification. In addition, the classification model has a simple structure and occupies little space, which makes it easy to deploy.
In summary, the training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure obtains the position coordinates of at least one word included in each of a plurality of sample pages and the category to which each sample page belongs; inputs the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtains the encoding vector corresponding to each sample page based on the encoding vectors corresponding to its words; and trains the initial classification model, using the encoding vector corresponding to each sample page and its category as training data, to obtain a target classification model for document classification. In this way, training of a classification model for document classification is achieved, the training speed of the classification model is increased, and the amount of data required during training is reduced.
The training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure is further described below with reference to Figure 2.
Figure 2 is a flow chart of a training method for a classification model that combines RPA and AI to implement IA according to the second embodiment of the present disclosure. As shown in Figure 2, the method includes steps 201 to 206.
Step 201: Obtain the position coordinates of at least one word included in each of a plurality of sample pages, and obtain the category to which each sample page belongs.
For each sample page, when the sample page includes multiple words, in order to reduce the amount of computation, the position coordinates of only a limited number of words may be obtained for subsequent model training. For example, if the number of words is preset to 128, then for each sample page, the position coordinates of at most 128 words are obtained.
In some embodiments, the position coordinates of at least one word included in each of the plurality of sample pages can be obtained in the following manner; that is, step 201 may include: for each sample page, obtaining optical character recognition (OCR) recognition information of the sample page; obtaining the text content of at least one text fragment on the sample page based on the OCR recognition information of the sample page; segmenting the text content of each text fragment to obtain at least one word included in each text fragment; obtaining the position coordinates of the region occupied by each text fragment; and obtaining the position coordinates of each word based on the position coordinates of the region occupied by each text fragment and the position of each word within the corresponding text fragment.
The region occupied by each text fragment is usually a rectangle. The position coordinates of the region occupied by a text fragment may include the position coordinates of the upper-left and lower-right vertices of the region, or of its upper-right and lower-left vertices. The position coordinates of a word may likewise include the position coordinates of the upper-left and lower-right vertices of the region occupied by the word, or of its upper-right and lower-left vertices.
In some embodiments, each sample page can be recognized in advance using OCR recognition technology to obtain the OCR recognition information of the sample page; the text content of at least one text fragment is then obtained from the OCR recognition information of the sample page, and the text content of each text fragment is segmented based on the preset vocabulary and rules to obtain at least one word included in each text fragment. Chinese may be segmented character by character; English may be segmented into sub-words consisting of stems and affixes.
In addition, for each sample page, the width and height of the sample page and the position of each text fragment on the sample page can be obtained, and the position coordinates of the region occupied by each text fragment, such as the position coordinates of the upper-left and lower-right vertices (or of the upper-right and lower-left vertices), can then be determined based on the width and height of the sample page and the position of each text fragment on the sample page.
Then, for each sample page, the position coordinates of each word can be obtained based on the position coordinates of the region occupied by each text fragment and the position of each word within the corresponding text fragment.
The following describes the process of obtaining the position coordinates of each word in a text fragment, based on the position coordinates of the region occupied by that text fragment on a sample page and the position of each word within the text fragment. Assume that the upper-left corner of the sample page is the origin of the coordinate system, and that the position coordinates of the region occupied by the text fragment include the x-axis and y-axis coordinates (x1, y1) of its upper-left vertex and the x-axis and y-axis coordinates (x2, y2) of its lower-right vertex. The position coordinates of a word include the x-axis and y-axis coordinates (x3, y3) of the upper-left vertex of the region occupied by the word and the x-axis and y-axis coordinates (x4, y4) of its lower-right vertex.
Since a text fragment may be one horizontally arranged line (or less than one line) of text, or one vertically arranged column (or less than one column) of text, it can first be determined whether the text fragment is arranged horizontally or vertically. Specifically, it can be determined whether y2 - y1 of the text fragment is less than A(x2 - x1), where A can be set as needed, for example to 1.5. When y2 - y1 is less than A(x2 - x1), the text fragment is determined to be arranged horizontally; when y2 - y1 is not less than A(x2 - x1), the text fragment is determined to be arranged vertically.
For a horizontally arranged text fragment, the width of each word can be obtained based on the length proportion of each word within the text fragment and the value of x2 - x1. For the first (leftmost) word of the text fragment, x1 is used as the x-axis coordinate x3 of the word's upper-left vertex, and x3 plus the width of the first word is used as the x-axis coordinate x4 of the first word's lower-right vertex. For each other word of the text fragment, x1 plus the accumulated width of all words to its left is used as the x-axis coordinate x3 of the word's upper-left vertex, and x3 plus the width of the word is used as the x-axis coordinate x4 of the word's lower-right vertex. In addition, for each word of the text fragment, y1 is used as the y-axis coordinate y3 of the word's upper-left vertex, and y2 is used as the y-axis coordinate y4 of the word's lower-right vertex.
For a vertically arranged text fragment, the height of each word can be obtained based on the length proportion of each word within the text fragment and the value of y2 - y1. For the first (topmost) word of the text fragment, y1 is used as the y-axis coordinate y3 of the word's upper-left vertex, and y3 plus the height of the first word is used as the y-axis coordinate y4 of the first word's lower-right vertex. For each other word of the text fragment, y1 plus the accumulated height of all words above it is used as the y-axis coordinate y3 of the word's upper-left vertex, and y3 plus the height of the word is used as the y-axis coordinate y4 of the word's lower-right vertex. In addition, for each word of the text fragment, x1 is used as the x-axis coordinate x3 of the word's upper-left vertex, and x2 is used as the x-axis coordinate x4 of the word's lower-right vertex.
For example, assume that A is 1.5 and the text fragment is "1 23"; segmenting the text content of this fragment yields the words "1" and "23". The position coordinates of the region occupied by the text fragment "1 23" are (x1, y1) = (2, 2) and (x2, y2) = (8, 4). Since y2 - y1 is less than 1.5(x2 - x1), the text fragment "1 23" is determined to be arranged horizontally. Since the length ratio of "1" to "23" is 1:2, the width of "1" is (8 - 2) * 1/3 = 2 and the width of "23" is (8 - 2) * 2/3 = 4. Following the above method, the position coordinates of "1" are (x3, y3) = (2, 2) and (x4, y4) = (4, 4), and the position coordinates of "23" are (x3, y3) = (4, 2) and (x4, y4) = (8, 4).
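By way of illustration, the following minimal sketch implements the horizontal/vertical rule described above and reproduces the coordinates of the "1 23" example; the function name is illustrative.

```python
def word_boxes(words, fragment_box, A=1.5):
    x1, y1, x2, y2 = fragment_box
    lengths = [len(w) for w in words]
    total = sum(lengths)
    boxes = []
    if (y2 - y1) < A * (x2 - x1):          # horizontally arranged fragment
        offset = x1
        for n in lengths:
            width = (x2 - x1) * n / total  # word width from its length proportion
            boxes.append((offset, y1, offset + width, y2))
            offset += width
    else:                                  # vertically arranged fragment
        offset = y1
        for n in lengths:
            height = (y2 - y1) * n / total
            boxes.append((x1, offset, x2, offset + height))
            offset += height
    return boxes

print(word_boxes(["1", "23"], (2, 2, 8, 4)))
# [(2, 2, 4.0, 4), (4.0, 2, 8.0, 4)] -- matches the coordinates in the example above
```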
In some embodiments, the plurality of sample documents obtained by the training apparatus may be sent by an RPA robot. That is, before obtaining the optical character recognition (OCR) recognition information of each sample page, the method may further include: obtaining a plurality of sample documents sent by the RPA robot.
For example, the training apparatus may be configured in the document processing platform, and the document processing platform may provide an upload interface. When a user needs to train and generate a target classification model, the sample documents can be uploaded through the upload interface based on the RPA robot, and the training apparatus in the document processing platform can thus obtain the plurality of sample documents uploaded by the RPA robot. In this way, by using the RPA robot to upload the plurality of sample pages to the document processing platform, the training apparatus can automatically obtain the sample pages in combination with the RPA robot, thereby reducing the labor cost of training the classification model.
Step 202: Input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
Step 203: Obtain the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in each sample page.
For the specific implementation process and principles of steps 202 and 203, reference may be made to the description of the above embodiments, which is not repeated here.
Step 204: Divide the training data into a training set and a validation set, where the training set includes encoding vectors corresponding to a plurality of first pages, the validation set includes encoding vectors corresponding to a plurality of second pages, and each first page and each second page is annotated with the category to which it belongs.
The ratio of the number of first pages included in the training set to the number of second pages included in the validation set can be set arbitrarily as needed, for example to 4:1; the embodiments of the present disclosure do not limit this.
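By way of illustration, a minimal sketch of such a 4:1 split is shown below; the shuffling and the fixed seed are illustrative choices, not requirements of the disclosure.

```python
import random

def split_training_data(samples, train_ratio=0.8, seed=0):
    # samples: list of (page_encoding_vector, category_label) pairs
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]  # (training set of first pages, validation set of second pages)
```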
Step 205: Perform multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and the category to which it belongs, to obtain a candidate classification model after each round of training.
The number of rounds of training of the initial classification model can be set arbitrarily as needed; the embodiments of the present disclosure do not limit this.
In some embodiments, taking the number of training rounds as N, where N is an integer greater than 1, the training set can be divided into N sub-training sets. Based on the first sub-training set, one round of iterative training is performed on the initial classification model to obtain the candidate classification model after that round; based on the next sub-training set, one round of iterative training is performed on the candidate classification model obtained after the first round to obtain the candidate classification model after that round; based on the next sub-training set, one round of iterative training is performed on the candidate classification model obtained after the second round to obtain the candidate classification model after that round; and so on, so that the candidate classification models after N rounds of training are obtained based on the N sub-training sets.
In some embodiments, again taking the number of training rounds as N, where N is an integer greater than 1, one round of iterative training may instead be performed on the initial classification model based on the whole training set to obtain the candidate classification model after that round; one round of iterative training is then performed on the candidate classification model after the first round based on the same training set to obtain the candidate classification model after that round; one round of iterative training is then performed on the candidate classification model after the second round based on the same training set to obtain the candidate classification model after that round; and so on, so that the candidate classification models after N rounds of training are obtained based on the training set.
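By way of illustration, the following minimal sketch follows the second scheme: N rounds of iterative training over the same training set, keeping a snapshot of the classifier after every round as a candidate model. The classifier is assumed to output raw 1*M logits (for example, a linear layer without the softmax), and the optimizer and learning rate are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

def train_candidates(classifier, train_pairs, num_rounds, lr=1e-3):
    # train_pairs: list of (page_vector [1*L tensor], category_index) pairs
    optimizer = torch.optim.Adam(classifier.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # applies softmax to the logits internally
    candidates = []
    for _ in range(num_rounds):
        for page_vector, label in train_pairs:
            logits = classifier(page_vector)              # 1*M logits
            loss = loss_fn(logits, torch.tensor([label]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        candidates.append(copy.deepcopy(classifier))      # candidate model after this round
    return candidates
```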
Step 206: Based on the encoding vector corresponding to each second page and the category to which it belongs, select the target classification model for document classification from the candidate classification models obtained after each round of training.
In some embodiments, step 206 can be implemented as follows: for the candidate classification model after each round of training, the encoding vector corresponding to each second page is input into that candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to each of the plurality of preset categories, and the loss value corresponding to that candidate classification model is determined based on the confidences that each second page belongs to the plurality of preset categories and the category to which each second page belongs; the target classification model for document classification is then selected from the candidate classification models after each round of training based on the loss values corresponding to those candidate classification models.
In some embodiments, the confidences, predicted by a candidate classification model, that each second page belongs to the plurality of preset categories, and the category to which each second page belongs, can be substituted into a cross-entropy loss function to determine the loss value corresponding to that candidate classification model.
The cross-entropy loss function may be as shown in formula (1):

$$L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log\!\left(p_{i,c}\right) \qquad (1)$$

where $L_{ce}$ denotes the loss value, $N$ denotes the number of second pages included in the validation set, and $C$ denotes the number of preset categories (also called the number of classes); $y_{i,c}$ is the label, an indicator function that equals 1 when the category of the i-th second page is c and equals 0 otherwise; and $p_{i,c}$ denotes the confidence (also called the predicted probability) that the i-th second page belongs to preset category c.
In some embodiments, among the candidate classification models obtained after each round of training, the candidate classification model with the lowest corresponding loss value may be determined as the target classification model. In this way, among the candidate classification models after each round of training, the model with the highest prediction accuracy is determined as the target classification model.
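By way of illustration, the following minimal sketch computes the validation loss of formula (1) for each candidate model and keeps the candidate with the lowest loss as the target classification model; predict_fn and the other names are illustrative.

```python
import math

def cross_entropy_loss(confidences, labels, num_categories):
    # confidences[i][c]: predicted confidence that second page i belongs to category c
    # labels[i]: the annotated category index of second page i
    total = 0.0
    for i, label in enumerate(labels):
        for c in range(num_categories):
            y = 1.0 if label == c else 0.0
            total += y * math.log(confidences[i][c])
    return -total / len(labels)

def select_target_model(candidates, predict_fn, val_vectors, val_labels, num_categories):
    # candidates: models after each round; predict_fn(model, vector) -> list of confidences
    losses = []
    for model in candidates:
        confs = [predict_fn(model, vec) for vec in val_vectors]
        losses.append(cross_entropy_loss(confs, val_labels, num_categories))
    return candidates[losses.index(min(losses))]  # lowest validation loss wins
```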
In summary, the training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure obtains the position coordinates of at least one word included in each of a plurality of sample pages and the category to which each sample page belongs; inputs the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtains the encoding vector corresponding to each sample page based on the encoding vectors corresponding to its words; divides the training data into a training set and a validation set, where the training set includes encoding vectors corresponding to a plurality of first pages, the validation set includes encoding vectors corresponding to a plurality of second pages, and each first page and each second page is annotated with its category; performs multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and its category to obtain a candidate classification model after each round of training; and selects the target classification model for document classification from the candidate classification models after each round of training based on the encoding vector corresponding to each second page and its category. In this way, training of a classification model for document classification is achieved, the training speed of the classification model is increased, and the amount of data required during training is reduced.
Based on the above embodiments, the embodiments of the present disclosure further provide a document classification method that combines RPA and AI to implement IA. The document classification method that combines RPA and AI to implement IA provided by the embodiments of the present disclosure is described below with reference to Figure 3.
Figure 3 is a flow chart of a document classification method that combines RPA and AI to implement IA according to the third embodiment of the present disclosure. As shown in Figure 3, the method includes steps 301 to 307.
Step 301: Obtain the target document sent by the RPA robot, where the target document includes at least one target page.
It should be noted that the document classification method that combines RPA and AI to implement IA in the embodiments of the present disclosure may be executed by a document classification apparatus that combines RPA and AI to implement IA. The document classification apparatus that combines RPA and AI to implement IA may be implemented by software and/or hardware; it may be an electronic device, or it may be configured in an electronic device, so as to classify documents. The electronic device may include, but is not limited to, a terminal device, a server, and the like; this embodiment does not specifically limit the electronic device.
In some embodiments, the document classification apparatus that combines RPA and AI to implement IA may be configured in the document processing platform, and the document processing platform may provide an upload interface. When a user needs to classify a target document, the target document can be uploaded through the upload interface based on the RPA robot, and the document classification apparatus that combines RPA and AI to implement IA in the document processing platform can thus obtain the target document uploaded by the RPA robot.
Step 302: Obtain the position coordinates of at least one word included in each target page.
The words included in a target page are the words (i.e., tokens) obtained by segmenting the text fragments on the target page. The text fragments may be segmented based on the preset vocabulary and rules.
The position coordinates of a word are used to represent the position of the word on the page (in this embodiment, the target page). For example, the position coordinates of a word may include the x-axis coordinate and the y-axis coordinate of the word in a coordinate system whose origin is the upper-left corner of the target page.
For each target page, when the target page includes multiple words, in order to reduce the amount of computation, the position coordinates of only a limited number of words may be obtained for subsequent document classification. For example, if the number of words is preset to 128, then for each target page, the position coordinates of at most 128 words are obtained.
For the manner of obtaining the position coordinates of at least one word included in each target page, reference may be made to the manner of obtaining the position coordinates of at least one word included in a sample page in the above embodiments, which is not repeated here.
Step 303: Input the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page.
In some embodiments, for each word in each target page, the position coordinates of the word can be input into the pre-trained document understanding model, and the pre-trained document understanding model outputs the encoding vector of the word, so that the document classification apparatus that combines RPA and AI to implement IA obtains the encoding vector corresponding to the word.
Step 304: Obtain the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in each target page.
In some embodiments, for each target page, when the target page includes one word, the encoding vector corresponding to that word may be determined as the encoding vector corresponding to the target page; when the target page includes multiple words, the average of the encoding vectors corresponding to the multiple words may be determined, and this average is used as the encoding vector corresponding to the target page.
Step 305: Input the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to each of a plurality of preset categories.
The target classification model is obtained by training using the training method for a classification model that combines RPA and AI to implement IA shown in any of the above embodiments.
For any target page, the confidences that the target page belongs to the plurality of preset categories can indicate how likely it is that the target page belongs to each preset category.
In some embodiments, the target classification model obtained by training using the training method for a classification model that combines RPA and AI to implement IA shown in any of the above embodiments can be deployed in the document processing platform, so that the document classification apparatus that combines RPA and AI to implement IA can input the encoding vector corresponding to each target page into the target classification model deployed in the document processing platform to obtain the confidence that each target page belongs to each of the plurality of preset categories.
Step 306: For each preset category, determine the average of the confidences that the target pages belong to that preset category.
In some embodiments, the confidences that the target pages belong to the same preset category can be summed and then averaged, so as to obtain the average of the confidences that the target pages belong to that preset category.
Step 307: Determine the category to which the target document belongs from the preset categories based on the average value corresponding to each preset category.
In some embodiments, the preset category with the largest corresponding average value among the preset categories may be determined as the category to which the target document belongs.
For example, assume the preset categories include category 1 and category 2, and the target document includes 10 target pages. Through step 305 and the preceding steps, the confidence that each of the 10 target pages belongs to category 1 and the confidence that each belongs to category 2 can be obtained. The confidences that the 10 target pages belong to category 1 are then summed and averaged to obtain the average confidence that the 10 target pages belong to category 1, and the confidences that the 10 target pages belong to category 2 are summed and averaged to obtain the average confidence that the 10 target pages belong to category 2. If the average confidence that the 10 target pages belong to category 1 is greater than the average confidence that the 10 target pages belong to category 2, the category to which the target document belongs is determined to be category 1.
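By way of illustration, the following minimal sketch implements steps 306 and 307: it averages the per-page confidences for each preset category and takes the category with the largest average. The two categories and three pages of confidences are toy values, not data from the disclosure.

```python
def classify_document(page_confidences):
    # page_confidences: one list of per-category confidences for every target page
    num_pages = len(page_confidences)
    num_categories = len(page_confidences[0])
    averages = [
        sum(page[c] for page in page_confidences) / num_pages
        for c in range(num_categories)
    ]
    return averages.index(max(averages)), averages  # (category index, per-category averages)

# Two preset categories, three target pages: the document is assigned to category 0.
print(classify_document([[0.9, 0.1], [0.7, 0.3], [0.6, 0.4]]))
```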
The pre-trained document understanding model in the embodiments of the present disclosure is common to all business scenarios and requires no training; it can be used on its own as an encoder common to different business scenarios to obtain the encoding vectors corresponding to the target pages. The target classification model does not need to generate the encoding vectors corresponding to the target pages and has a simple structure, so only a small amount of training data is needed during training and the training process takes little time, while the prediction performance of the trained target classification model is not affected and documents can still be classified accurately.
In summary, the document classification method that combines RPA and AI to implement IA provided by the embodiments of the present disclosure obtains the target document sent by the RPA robot, where the target document includes at least one target page; obtains the position coordinates of at least one word included in each target page; inputs the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; obtains the encoding vector corresponding to each target page based on the encoding vectors corresponding to its words; inputs the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to each of a plurality of preset categories; for each preset category, determines the average of the confidences that the target pages belong to that preset category; and determines the category to which the target document belongs from the preset categories based on the average value corresponding to each preset category. In this way, the target document is classified accurately by combining the target classification model, which is trained quickly with a small amount of training data, with the pre-trained document understanding model. Moreover, by using the target classification model to perform IA classification on the target documents sent by the RPA robot, the labor cost required for document classification is reduced and the efficiency of document classification is improved.
To implement the above embodiments, the embodiments of the present disclosure further propose a training apparatus for a classification model that combines RPA and AI to implement IA. Figure 4 is a schematic structural diagram of a training apparatus for a classification model that combines RPA and AI to implement IA according to the fourth embodiment of the present disclosure.
As shown in Figure 4, the training apparatus 400 for a classification model that combines RPA and AI to implement IA includes: a first acquisition module 401, a first processing module 402, a second acquisition module 403, and a training module 404.
The first acquisition module 401 is configured to obtain the position coordinates of at least one word included in each of a plurality of sample pages, and to obtain the category to which each sample page belongs.
The first processing module 402 is configured to input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
The second acquisition module 403 is configured to obtain the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in each sample page.
The training module 404 is configured to use the encoding vector corresponding to each sample page and the category to which it belongs as training data, and to train the initial classification model to obtain a target classification model for document classification.
It should be noted that the training apparatus 400 for a classification model that combines RPA and AI to implement IA in the embodiments of the present disclosure can execute the training method for a classification model that combines RPA and AI to implement IA provided in the above embodiments. The training apparatus 400 for a classification model that combines RPA and AI to implement IA may be implemented by software and/or hardware; it may be an electronic device, or it may be configured in an electronic device, so as to train the classification model used for document classification. The electronic device may include, but is not limited to, a terminal device, a server, and the like; this embodiment does not specifically limit the electronic device.
In one embodiment of the present disclosure, the training module 404 includes a dividing unit, a training unit, and a selection unit.
The dividing unit is configured to divide the training data into a training set and a validation set, where the training set includes encoding vectors corresponding to a plurality of first pages, the validation set includes encoding vectors corresponding to a plurality of second pages, and each first page and each second page is annotated with the category to which it belongs.
The training unit is configured to perform multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and the category to which it belongs, to obtain a candidate classification model after each round of training.
The selection unit is configured to select the target classification model for document classification from the candidate classification models after each round of training, based on the encoding vector corresponding to each second page and the category to which it belongs.
In one embodiment of the present disclosure, the selection unit includes a processing subunit and a selection subunit.
The processing subunit is configured, for the candidate classification model after each round of training, to input the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to each of the plurality of preset categories, and to determine the loss value corresponding to the candidate classification model based on the confidences that each second page belongs to the plurality of preset categories and the category to which each second page belongs.
The selection subunit is configured to select the target classification model for document classification from the candidate classification models after each round of training, based on the loss values corresponding to the candidate classification models after each round of training.
In one embodiment of the present disclosure, the first acquisition module 401 includes a first acquisition subunit, a second acquisition subunit, a third acquisition subunit, a word segmentation unit, a fourth acquisition subunit, and a fifth acquisition subunit.
The first acquisition subunit is configured to obtain a plurality of sample documents sent by the RPA robot.
The second acquisition subunit is configured, for each sample page, to obtain the optical character recognition (OCR) recognition information of the sample page.
The third acquisition subunit is configured to obtain the text content of at least one text fragment on the sample page based on the OCR recognition information of the sample page.
The word segmentation unit is configured to segment the text content of each text fragment to obtain at least one word included in each text fragment.
The fourth acquisition subunit is configured to obtain the position coordinates of the region occupied by each text fragment.
The fifth acquisition subunit is configured to obtain the position coordinates of each word based on the position coordinates of the region occupied by each text fragment and the position of each word within the corresponding text fragment.
It should be noted that the foregoing explanation of the embodiments of the training method for a classification model that combines RPA and AI to implement IA also applies to the training apparatus for a classification model that combines RPA and AI to implement IA of this embodiment; details not disclosed in the apparatus embodiments of the present disclosure are not repeated here.
In summary, the training apparatus for a classification model combining RPA and AI to implement IA according to the embodiments of the present disclosure acquires the position coordinates of at least one word included in each of multiple sample pages and the category to which each sample page belongs; inputs the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtains the encoding vector corresponding to each sample page based on the encoding vectors of its words; and trains the initial classification model using the encoding vectors and categories of the sample pages as training data, to obtain a target classification model for document classification. This enables the training of a classification model for document classification, increases training speed, and reduces the amount of data required during training.
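The disclosure does not fix how the word-level encoding vectors are aggregated into the page-level encoding vector. One minimal choice, shown here purely as an assumption, is mean pooling over the word vectors produced by the pre-trained document understanding model:

```python
import numpy as np

def page_encoding(word_encodings):
    """Hedged sketch: aggregate the encoding vectors of all words on a page
    into a single page-level encoding vector by mean pooling.  Other
    aggregations (max pooling, a dedicated page-level token) would fit the
    same training flow."""
    return np.asarray(word_encodings, dtype=float).mean(axis=0)
```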
To implement the above embodiments, embodiments of the present disclosure further provide a document classification apparatus that combines RPA and AI to implement IA. FIG. 5 is a schematic structural diagram of a document classification apparatus that combines RPA and AI to implement IA according to the fifth embodiment of the present disclosure.
As shown in FIG. 5, the document classification apparatus 500 that combines RPA and AI to implement IA includes a third acquisition module 501, a fourth acquisition module 502, a second processing module 503, a fifth acquisition module 504, a third processing module 505, a first determination module 506, and a second determination module 507.
The third acquisition module 501 is configured to acquire the target document sent by the RPA robot, the target document including at least one target page.
The fourth acquisition module 502 is configured to acquire the position coordinates of at least one word included in each target page.
The second processing module 503 is configured to input the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page.
The fifth acquisition module 504 is configured to acquire the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that page.
The third processing module 505 is configured to input the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories, where the target classification model is trained by the method described in the embodiments of the first aspect.
The first determination module 506 is configured to determine, for each preset category, the average of the confidences that the respective target pages belong to that preset category.
The second determination module 507 is configured to determine the category to which the target document belongs from among the preset categories based on the averages corresponding to the respective preset categories.
In summary, the document classification apparatus combining RPA and AI to implement IA according to the embodiments of the present disclosure acquires the target document sent by the RPA robot, the target document including at least one target page; acquires the position coordinates of at least one word included in each target page; inputs the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word; obtains the encoding vector corresponding to each target page based on the encoding vectors of its words; inputs the encoding vector of each target page into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories; determines, for each preset category, the average of the confidences that the target pages belong to that category; and determines the category of the target document from among the preset categories based on these averages. In this way, the target document can be classified accurately by combining the target classification model, which can be trained quickly with a small amount of training data, with the pre-trained document understanding model. Moreover, by using the target classification model to classify target documents sent by the RPA robot, the labor cost of document classification is reduced and its efficiency is improved.
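A minimal sketch of the reduction performed by the first and second determination modules, assuming the target classification model returns one confidence vector per target page over the preset categories, could look as follows; the category names used here are placeholders.

```python
import numpy as np

def classify_document(page_confidences, categories):
    """Hedged sketch: page_confidences has shape (num_pages, num_categories),
    one row per target page.  The document is assigned to the preset category
    with the highest average confidence across its pages."""
    averages = np.asarray(page_confidences, dtype=float).mean(axis=0)
    return categories[int(averages.argmax())]

# Example with three target pages and three hypothetical preset categories
confidences = [[0.7, 0.2, 0.1],
               [0.6, 0.3, 0.1],
               [0.5, 0.4, 0.1]]
print(classify_document(confidences, ["invoice", "contract", "receipt"]))  # -> invoice
```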
To implement the above embodiments, embodiments of the present disclosure further provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the training method for a classification model combining RPA and AI to implement IA as described in any of the foregoing method embodiments, or implements the document classification method combining RPA and AI to implement IA as described in any of the foregoing method embodiments.
To implement the above embodiments, embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the training method for a classification model combining RPA and AI to implement IA as described in any of the foregoing method embodiments, or implements the document classification method combining RPA and AI to implement IA as described in any of the foregoing method embodiments.
To implement the above embodiments, embodiments of the present disclosure further provide a computer program product, wherein, when the instructions in the computer program product are executed by a processor, the training method for a classification model combining RPA and AI to implement IA as described in any of the foregoing method embodiments, or the document classification method combining RPA and AI to implement IA as described in any of the foregoing method embodiments, is implemented.
To implement the above embodiments, embodiments of the present disclosure further provide a computer program, including computer program code, wherein, when the computer program code is run on a computer, the computer is caused to perform the training method for a classification model combining RPA and AI to implement IA as described in any of the foregoing method embodiments, or to perform the document classification method combining RPA and AI to implement IA as described in any of the foregoing method embodiments.
FIG. 6 shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure. The electronic device 10 shown in FIG. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device 10 takes the form of a general-purpose computing device. The components of the electronic device 10 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the various system components (including the memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor bus, or a local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
In some embodiments, the electronic device 10 includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the electronic device 10, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The electronic device 10 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a compact disc read-only memory (CD-ROM), a digital versatile disc read-only memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present disclosure.
The electronic device 10 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the electronic device 10, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device 10 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Moreover, the electronic device 10 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in FIG. 6, the network adapter 20 communicates with the other modules of the electronic device 10 via the bus 18. It should be understood that, although not shown in FIG. 6, other hardware and/or software modules may be used in conjunction with the electronic device 10, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the memory 28, for example implementing the methods mentioned in the foregoing embodiments.
It should be noted that the foregoing explanations of the method and apparatus embodiments also apply to the electronic device, vehicle, computer-readable storage medium, computer program product, and computer program of the above embodiments, and are not repeated here.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", or the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, the schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of different embodiments or examples, provided they do not contradict each other.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality of" means at least two, for example two or three, unless otherwise expressly and specifically limited.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing steps of a custom logical function or process, and the scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present disclosure belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered a sequenced list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the various parts of the embodiments of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one of the following technologies known in the art, or a combination thereof: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing relevant hardware through a program, and the program may be stored in a computer-readable storage medium; when executed, the program performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the various embodiments of the present disclosure may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although the embodiments of the present disclosure have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present disclosure, and those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present disclosure.
All embodiments of the present disclosure may be executed alone or in combination with other embodiments, all of which are regarded as within the scope of protection claimed by the present disclosure.

Claims (14)

  1. A training method for a classification model that combines robotic process automation (RPA) and artificial intelligence (AI) to implement intelligent automation (IA), wherein the method comprises:
    acquiring the position coordinates of at least one word included in each of a plurality of sample pages, and acquiring the category to which each of the sample pages belongs;
    inputting the position coordinates of each of the words in each of the sample pages into a pre-trained document understanding model, to acquire the encoding vector corresponding to each of the words in each of the sample pages;
    acquiring the encoding vector corresponding to each of the sample pages based on the encoding vector corresponding to each of the words in each of the sample pages; and
    training an initial classification model using the encoding vector and the category corresponding to each of the sample pages as training data, to obtain a target classification model for document classification.
  2. The method according to claim 1, wherein training the initial classification model using the encoding vector and the category corresponding to each of the sample pages as training data, to obtain the target classification model for document classification, comprises:
    dividing the training data into a training set and a validation set, the training set including the encoding vectors corresponding to a plurality of first pages, the validation set including the encoding vectors corresponding to a plurality of second pages, each of the first pages and each of the second pages being labeled with its category;
    performing multiple rounds of training on the initial classification model based on the encoding vector and the category corresponding to each of the first pages, to obtain a candidate classification model after each round of training; and
    selecting the target classification model for document classification from the candidate classification models after the respective rounds of training, based on the encoding vector and the category corresponding to each of the second pages.
  3. The method according to claim 2, wherein selecting the target classification model for document classification from the candidate classification models after the respective rounds of training, based on the encoding vector and the category corresponding to each of the second pages, comprises:
    for the candidate classification model after each round of training, inputting the encoding vector corresponding to each of the second pages into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each of the second pages belongs to each of a plurality of preset categories, and determining the loss value corresponding to the candidate classification model based on the confidences that each of the second pages belongs to the plurality of preset categories and the category to which each of the second pages belongs; and
    selecting the target classification model for document classification from the candidate classification models after the respective rounds of training, based on the loss values corresponding to the candidate classification models after the respective rounds of training.
  4. The method according to any one of claims 1 to 3, wherein acquiring the position coordinates of at least one word included in each of the plurality of sample pages comprises:
    acquiring the plurality of sample documents sent by the RPA robot;
    for each of the sample pages, acquiring optical character recognition (OCR) information of the sample page;
    acquiring the text content of at least one text fragment in the sample page based on the OCR information of the sample page;
    segmenting the text content of each of the text fragments to obtain at least one word included in each of the text fragments;
    acquiring the position coordinates of the area occupied by each of the text fragments; and
    acquiring the position coordinates of each of the words based on the position coordinates of the area occupied by each of the text fragments and the position of each of the words within the corresponding text fragment.
  5. A document classification method that combines RPA and AI to implement IA, wherein the method comprises:
    acquiring a target document sent by an RPA robot, the target document including at least one target page;
    acquiring the position coordinates of at least one word included in each of the target pages;
    inputting the position coordinates of each of the words in each of the target pages into a pre-trained document understanding model, to acquire the encoding vector corresponding to each of the words in each of the target pages;
    acquiring the encoding vector corresponding to each of the target pages based on the encoding vector corresponding to each of the words in each of the target pages;
    inputting the encoding vector corresponding to each of the target pages into a target classification model to obtain the confidence that each of the target pages belongs to each of a plurality of preset categories, wherein the target classification model is trained by the method according to any one of claims 1 to 4;
    for each of the preset categories, determining the average of the confidences that the respective target pages belong to the preset category; and
    determining the category to which the target document belongs from among the preset categories based on the averages corresponding to the respective preset categories.
  6. A training apparatus for a classification model that combines RPA and AI to implement IA, wherein the apparatus comprises:
    a first acquisition module, configured to acquire the position coordinates of at least one word included in each of a plurality of sample pages, and to acquire the category to which each of the sample pages belongs;
    a first processing module, configured to input the position coordinates of each of the words in each of the sample pages into a pre-trained document understanding model, to acquire the encoding vector corresponding to each of the words in each of the sample pages;
    a second acquisition module, configured to acquire the encoding vector corresponding to each of the sample pages based on the encoding vector corresponding to each of the words in each of the sample pages; and
    a training module, configured to train an initial classification model using the encoding vector and the category corresponding to each of the sample pages as training data, to obtain a target classification model for document classification.
  7. The apparatus according to claim 6, wherein the training module comprises:
    a dividing unit, configured to divide the training data into a training set and a validation set, the training set including the encoding vectors corresponding to a plurality of first pages, the validation set including the encoding vectors corresponding to a plurality of second pages, each of the first pages and each of the second pages being labeled with its category;
    a training unit, configured to perform multiple rounds of training on the initial classification model based on the encoding vector and the category corresponding to each of the first pages, to obtain a candidate classification model after each round of training; and
    a selection unit, configured to select the target classification model for document classification from the candidate classification models after the respective rounds of training, based on the encoding vector and the category corresponding to each of the second pages.
  8. The apparatus according to claim 7, wherein the selection unit comprises:
    a processing subunit, configured to, for the candidate classification model after each round of training, input the encoding vector corresponding to each of the second pages into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each of the second pages belongs to each of a plurality of preset categories, and to determine the loss value corresponding to the candidate classification model based on the confidences that each of the second pages belongs to the plurality of preset categories and the category to which each of the second pages belongs; and
    a selection subunit, configured to select the target classification model for document classification from the candidate classification models after the respective rounds of training, based on the loss values corresponding to the candidate classification models after the respective rounds of training.
  9. The apparatus according to any one of claims 6 to 8, wherein the first acquisition module comprises:
    a first acquisition subunit, configured to acquire the plurality of sample documents sent by the RPA robot;
    a second acquisition subunit, configured to acquire, for each of the sample pages, optical character recognition (OCR) information of the sample page;
    a third acquisition subunit, configured to acquire the text content of at least one text fragment in the sample page based on the OCR information of the sample page;
    a word segmentation unit, configured to segment the text content of each of the text fragments to obtain at least one word included in each of the text fragments;
    a fourth acquisition subunit, configured to acquire the position coordinates of the area occupied by each of the text fragments; and
    a fifth acquisition subunit, configured to acquire the position coordinates of each of the words based on the position coordinates of the area occupied by each of the text fragments and the position of each of the words within the corresponding text fragment.
  10. A document classification apparatus that combines RPA and AI to implement IA, wherein the apparatus comprises:
    a third acquisition module, configured to acquire a target document sent by an RPA robot, the target document including at least one target page;
    a fourth acquisition module, configured to acquire the position coordinates of at least one word included in each of the target pages;
    a second processing module, configured to input the position coordinates of each of the words in each of the target pages into a pre-trained document understanding model, to acquire the encoding vector corresponding to each of the words in each of the target pages;
    a fifth acquisition module, configured to acquire the encoding vector corresponding to each of the target pages based on the encoding vector corresponding to each of the words in each of the target pages;
    a third processing module, configured to input the encoding vector corresponding to each of the target pages into a target classification model to obtain the confidence that each of the target pages belongs to each of a plurality of preset categories, wherein the target classification model is trained by the method according to any one of claims 1 to 4;
    a first determination module, configured to determine, for each of the preset categories, the average of the confidences that the respective target pages belong to the preset category; and
    a second determination module, configured to determine the category to which the target document belongs from among the preset categories based on the averages corresponding to the respective preset categories.
  11. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 4, or implements the method according to claim 5.
  12. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4, or implements the method according to claim 5.
  13. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4, or implements the method according to claim 5.
  14. A computer program, wherein the computer program comprises computer program code, and the computer program code, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 4, or to perform the method according to claim 5.
PCT/CN2023/116770 2022-09-16 2023-09-04 Training method and apparatus for implementing ia classification model using rpa and ai WO2024055864A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211125956.5A CN115578739A (en) 2022-09-16 2022-09-16 Training method and device for realizing IA classification model by combining RPA and AI
CN202211125956.5 2022-09-16

Publications (1)

Publication Number Publication Date
WO2024055864A1 true WO2024055864A1 (en) 2024-03-21

Family

ID=84581848

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/116770 WO2024055864A1 (en) 2022-09-16 2023-09-04 Training method and apparatus for implementing ia classification model using rpa and ai

Country Status (2)

Country Link
CN (1) CN115578739A (en)
WO (1) WO2024055864A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578739A (en) * 2022-09-16 2023-01-06 上海来也伯特网络科技有限公司 Training method and device for realizing IA classification model by combining RPA and AI

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699923A (en) * 2020-12-21 2021-04-23 深圳壹账通智能科技有限公司 Document classification prediction method and device, computer equipment and storage medium
CN114757247A (en) * 2020-12-28 2022-07-15 腾讯科技(深圳)有限公司 Training method of classification prediction model, classification prediction method, device and equipment
CN114817538A (en) * 2022-04-26 2022-07-29 马上消费金融股份有限公司 Training method of text classification model, text classification method and related equipment
CN115578739A (en) * 2022-09-16 2023-01-06 上海来也伯特网络科技有限公司 Training method and device for realizing IA classification model by combining RPA and AI

Also Published As

Publication number Publication date
CN115578739A (en) 2023-01-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23864596

Country of ref document: EP

Kind code of ref document: A1