CN115578739A

CN115578739A - Training method and device for realizing IA classification model by combining RPA and AI

Info

Publication number: CN115578739A
Application number: CN202211125956.5A
Authority: CN
Inventors: 段沛宸
Original assignee: Shanghai Laiyibert Network Technology Co ltd; Laiye Technology Beijing Co Ltd
Current assignee: Shanghai Laiyibert Network Technology Co ltd; Laiye Technology Beijing Co Ltd
Priority date: 2022-09-16
Filing date: 2022-09-16
Publication date: 2023-01-06
Also published as: WO2024055864A1

Abstract

The application relates to a training method and device for a classification model that implements IA by combining RPA and AI. The training method comprises: obtaining the position coordinates of at least one word included in each sample page and the category of each sample page; inputting the position coordinates of each word into a pre-trained document understanding model to obtain a corresponding encoding vector; obtaining an encoding vector for each sample page based on the encoding vectors of the words in that page; and using the encoding vector of each sample page and its category as training data to train an initial classification model, thereby obtaining a target classification model for document classification. This increases the training speed of the classification model and reduces the amount of data required for training. The application also provides a document classification method that implements IA by combining RPA and AI, in which the target classification model is used to classify target documents sent by an RPA robot, thereby reducing the labor cost of document classification and improving its efficiency.

Description

Training method and device for realizing IA classification model by combining RPA and AI

Technical Field

The application relates to the technical fields of robotic process automation and artificial intelligence, and in particular to a training method and device for a classification model that implements IA by combining RPA and AI.

Background

Robotic Process Automation (RPA) uses dedicated software robots to simulate the operations a person performs on a computer and to execute process tasks automatically according to rules.

Artificial Intelligence (AI) is a technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence.

Intelligent Automation (IA) is an umbrella term for a range of technologies spanning from robotic process automation to artificial intelligence. It combines RPA with AI technologies such as Optical Character Recognition (OCR), Intelligent Character Recognition (ICR), Process Mining, Deep Learning (DL), Machine Learning (ML), Natural Language Processing (NLP), Automatic Speech Recognition (ASR), Text-To-Speech (TTS) and Computer Vision (CV) to create end-to-end business processes that can think, learn and adapt. This covers the whole cycle from process discovery and process automation, through automatic and continuous data collection and understanding the meaning of the data, to using the data to manage and optimize the entire business process.

In intelligent document processing scenarios, a complex business process may involve documents of several categories, and documents of different categories require different information extraction models, so that subsequent business processing, such as information entry or invoice reimbursement, can be performed on the extracted key information. For example, when an RPA robot automatically processes a product-order email sent by customer A to supplier B, the email attachments may include documents such as a contract and an invoice. The RPA robot needs to invoke a contract extraction model to extract information from the contract, invoke a general multi-bill extraction model to extract information from the invoice, and then continue processing based on the extracted information. The documents therefore must first be classified by a classification model, after which the information extraction model corresponding to each category is invoked and further business processing is carried out. How to train such a classification model quickly and with little training data, while maintaining its accuracy, is thus a pressing problem.

Disclosure of Invention

The application provides a training method and device for a classification model that implements IA by combining RPA and AI, aiming to solve two technical problems of related-art training methods for document classification models: they require a large amount of training data, and training takes a long time.

An embodiment of the first aspect of the present application provides a training method for a classification model that implements IA by combining RPA and AI. The method includes: obtaining the position coordinates of at least one word included in each of a plurality of sample pages, and obtaining the category of each sample page; inputting the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain an encoding vector for each word in each sample page; obtaining an encoding vector for each sample page based on the encoding vectors of the words in that page; and using the encoding vector and category of each sample page as training data to train an initial classification model, thereby obtaining a target classification model for document classification.
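The data-preparation half of this method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: `encode_words` and `toy_encoder` are hypothetical stand-ins for the frozen pre-trained document understanding model (for example LayoutLM run in inference mode only), and mean pooling is an assumed choice, since the text only says the page vector is obtained "based on" the word vectors.

```python
from typing import Callable, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) of one word on a page
Vector = List[float]
# Hypothetical stand-in for the frozen pre-trained document understanding model:
# it maps the word boxes of one page to one encoding vector per word.
WordEncoder = Callable[[Sequence[BBox]], List[Vector]]


def page_vector(word_boxes: Sequence[BBox], encode_words: WordEncoder) -> Vector:
    """Encode every word of a page, then mean-pool the word vectors (assumed pooling)."""
    word_vectors = encode_words(word_boxes)  # the encoder is only run, never updated
    if not word_vectors:
        raise ValueError("page contains no words")
    dim = len(word_vectors[0])
    n = len(word_vectors)
    return [sum(vec[i] for vec in word_vectors) / n for i in range(dim)]


def build_training_data(
    pages: Sequence[Tuple[Sequence[BBox], str]], encode_words: WordEncoder
) -> List[Tuple[Vector, str]]:
    """Turn (word boxes, category) pairs into (page encoding vector, category) pairs."""
    return [(page_vector(boxes, encode_words), category) for boxes, category in pages]


# Hypothetical toy encoder for illustration only: normalised top-left corner of each box.
def toy_encoder(boxes: Sequence[BBox]) -> List[Vector]:
    return [[x0 / 1000, y0 / 1000] for x0, y0, _, _ in boxes]


data = build_training_data(
    [([(100, 200, 150, 220), (300, 200, 380, 220)], "bill")], toy_encoder
)
print(data)  # one (page encoding vector, category) pair
```

Because the encoder is never updated in this sketch, page vectors can be computed once and cached, and only the small classifier is trained on them afterwards; this is one plausible reading of why the approach needs less data and less training time than fine-tuning the whole encoder.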

In some embodiments, using the encoding vector and category of each sample page as training data to train the initial classification model includes: dividing the training data into a training set and a validation set, where the training set includes the encoding vectors of a plurality of first pages, the validation set includes the encoding vectors of a plurality of second pages, and each first page and each second page is labeled with the category to which it belongs; training the initial classification model for multiple rounds based on the encoding vector and category of each first page, to obtain a candidate classification model after each round; and selecting the target classification model for document classification from the per-round candidate classification models based on the encoding vector and category of each second page.
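The split-then-train-for-several-rounds procedure can be sketched as below. The classifier structure is left open by the text, so the multinomial logistic regression trained with plain SGD here is an illustrative assumption, as are the names `split`, `LinearClassifier`, `train_rounds` and the hyperparameters `val_ratio`, `rounds` and `lr`. The key point it shows is that a snapshot (candidate classification model) is kept after every round.

```python
import copy
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], str]  # (page encoding vector, category label)


def split(data: Sequence[Sample], val_ratio: float = 0.2, seed: int = 0):
    """Shuffle labelled page vectors and split them into training and validation sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * val_ratio))
    return items[n_val:], items[:n_val]  # (first pages, second pages)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LinearClassifier:
    """Multinomial logistic regression over page vectors (illustrative choice)."""

    def __init__(self, dim: int, classes: Sequence[str]):
        self.classes = list(classes)
        self.w = [[0.0] * dim for _ in self.classes]
        self.b = [0.0] * len(self.classes)

    def predict_proba(self, x: List[float]) -> List[float]:
        """Confidence that the page belongs to each preset category."""
        scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in zip(self.w, self.b)]
        return softmax(scores)

    def sgd_step(self, x: List[float], label: str, lr: float) -> None:
        probs = self.predict_proba(x)
        for k, cls in enumerate(self.classes):
            grad = probs[k] - (1.0 if cls == label else 0.0)  # d(cross-entropy)/d(score_k)
            self.b[k] -= lr * grad
            for i, xi in enumerate(x):
                self.w[k][i] -= lr * grad * xi


def train_rounds(train_set: Sequence[Sample], classes: Sequence[str],
                 rounds: int = 5, lr: float = 0.5) -> List[LinearClassifier]:
    """Train for several rounds (epochs) and keep a candidate model after each one."""
    model = LinearClassifier(len(train_set[0][0]), classes)
    candidates = []
    for _ in range(rounds):
        for x, label in train_set:
            model.sgd_step(x, label, lr)
        candidates.append(copy.deepcopy(model))  # snapshot = candidate for this round
    return candidates
```

Keeping every per-round snapshot, rather than only the final one, is what allows the validation set to pick the round that generalizes best.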

In some embodiments, selecting the target classification model from the per-round candidate classification models based on the encoding vector and category of each second page includes: for the candidate classification model obtained in each round, inputting the encoding vector of each second page into the candidate model to obtain the predicted confidences that the second page belongs to each of a plurality of preset categories, and determining a loss value for the candidate model based on these confidences and the actual categories of the second pages; and selecting the target classification model for document classification from the candidate models according to their loss values.
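The selection step can be sketched as follows. The text does not name the loss function, so mean cross-entropy (the negative log of the confidence assigned to each second page's true category) is an assumption here, as is choosing the candidate with the lowest loss. `round_1` and `round_2` are hypothetical candidates used only to show the call.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A candidate model maps a page encoding vector to {preset category: confidence}.
Predictor = Callable[[List[float]], Dict[str, float]]


def validation_loss(model: Predictor, val_set: Sequence[Tuple[List[float], str]],
                    eps: float = 1e-12) -> float:
    """Mean cross-entropy over the second pages: -log(confidence of the true category)."""
    total = 0.0
    for vector, category in val_set:
        confidence = model(vector).get(category, 0.0)
        total += -math.log(max(confidence, eps))  # eps avoids log(0)
    return total / len(val_set)


def select_target_model(candidates: Sequence[Predictor],
                        val_set: Sequence[Tuple[List[float], str]]) -> Tuple[int, float]:
    """Return (round index, loss) of the candidate with the lowest validation loss."""
    losses = [validation_loss(model, val_set) for model in candidates]
    best = min(range(len(losses)), key=losses.__getitem__)
    return best, losses[best]


# Two hypothetical per-round candidates: an uninformed one and a better-trained one.
def round_1(vec: List[float]) -> Dict[str, float]:
    return {"bill": 0.5, "contract": 0.5}


def round_2(vec: List[float]) -> Dict[str, float]:
    return {"bill": 0.9, "contract": 0.1} if vec[0] > vec[1] else {"bill": 0.1, "contract": 0.9}


val_set = [([1.0, 0.0], "bill"), ([0.0, 1.0], "contract")]
print(select_target_model([round_1, round_2], val_set))
```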

In some embodiments, obtaining the position coordinates of at least one word included in each of the plurality of sample pages includes: obtaining a plurality of sample documents sent by an RPA robot; for each sample page, obtaining the Optical Character Recognition (OCR) result of the page; obtaining the text content of at least one text segment in the sample page based on the OCR result; performing word segmentation on the text content of each text segment to obtain at least one word included in each text segment; obtaining the position coordinates of the area occupied by each text segment; and obtaining the position coordinates of each word based on the position coordinates of the area occupied by its text segment and the position of the word within that segment.
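The last step, deriving a word's coordinates from its segment's area and its position within the segment, can be sketched as a proportional interpolation. This assumes a horizontal segment in which every character has the same width. Real OCR engines may instead return per-character boxes, and the word segmenter itself is not shown (`words` is taken as given). The function name `word_boxes` is hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


def word_boxes(segment_text: str, segment_box: Box,
               words: Sequence[str]) -> List[Tuple[str, Box]]:
    """Estimate each word's box inside one horizontal OCR text segment.

    Assumptions (illustrative only): the segment is laid out left to right,
    every character has the same width, and `words` is the in-order word
    segmentation of `segment_text` (characters between words, such as spaces,
    are skipped).
    """
    x0, y0, x1, y1 = segment_box
    char_width = (x1 - x0) / len(segment_text)
    boxes = []
    cursor = 0
    for word in words:
        start = segment_text.index(word, cursor)  # character offset of the word in the segment
        end = start + len(word)
        boxes.append((word, (x0 + start * char_width, y0, x0 + end * char_width, y1)))
        cursor = end
    return boxes


# One OCR text segment: its text and the area it occupies on the page.
for word, box in word_boxes("Invoice No 123", (0, 10, 140, 30), ["Invoice", "No", "123"]):
    print(word, box)
```

The same function works for unspaced text such as Chinese, where the segmenter output is contiguous and `index` simply advances character by character.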

The training method provided by the embodiments of the application obtains the position coordinates of at least one word included in each of a plurality of sample pages and the category of each sample page; inputs the position coordinates of each word into a pre-trained document understanding model to obtain an encoding vector for each word; obtains an encoding vector for each sample page from the encoding vectors of its words; and trains an initial classification model using the encoding vector and category of each sample page as training data, obtaining a target classification model for document classification. In this way, a classification model for document classification is trained faster and with less training data.

An embodiment of the second aspect of the present application provides a document classification method that implements IA by combining RPA and AI. The method includes: obtaining a target document sent by an RPA robot, the target document including at least one target page; obtaining the position coordinates of at least one word included in each target page; inputting the position coordinates of each word in each target page into a pre-trained document understanding model to obtain an encoding vector for each word in each target page; obtaining an encoding vector for each target page based on the encoding vectors of its words; inputting the encoding vector of each target page into a target classification model, obtained by training with the method of the first-aspect embodiment, to obtain the confidences that each target page belongs to each of a plurality of preset categories; for each preset category, determining the average of the confidences that the target pages belong to that category; and determining the category of the target document from the preset categories based on the average value of each preset category.
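The page-to-document aggregation can be sketched as below. The text says only that the document category is determined "based on" the per-category averages; picking the category with the highest average is the natural reading and is assumed here. `classify_document` and the confidence values are hypothetical.

```python
from typing import Dict, List, Tuple


def classify_document(page_confidences: List[Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Average each preset category's confidence over all target pages, pick the highest."""
    if not page_confidences:
        raise ValueError("target document has no pages")
    categories = list(page_confidences[0])
    n_pages = len(page_confidences)
    averages = {c: sum(page[c] for page in page_confidences) / n_pages for c in categories}
    best = max(averages, key=averages.get)  # assumed rule: highest average wins
    return best, averages


# Hypothetical confidences a target classification model might output for a 3-page document.
pages = [
    {"bill": 0.7, "contract": 0.3},
    {"bill": 0.4, "contract": 0.6},  # one ambiguous page does not flip the result
    {"bill": 0.8, "contract": 0.2},
]
category, averages = classify_document(pages)
print(category, averages)
```

Averaging over pages smooths out single-page errors: a page that looks atypical on its own, such as a signature page, is outweighed by the rest of the document.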

The document classification method provided by the embodiments of the application obtains a target document, comprising at least one target page, sent by an RPA robot; obtains the position coordinates of at least one word in each target page; inputs them into a pre-trained document understanding model to obtain an encoding vector for each word; derives an encoding vector for each target page from the encoding vectors of its words; inputs each page encoding vector into the target classification model to obtain the confidences that the page belongs to each of a plurality of preset categories; averages, for each preset category, the confidences of all target pages; and determines the category of the target document from the preset categories based on these averages. Target documents are thus classified accurately by combining the pre-trained document understanding model with a target classification model trained quickly on a small amount of training data. Using the target classification model to classify, as part of IA, the target documents sent by the RPA robot reduces the labor cost of document classification and improves its efficiency.

An embodiment of the third aspect of the present application provides a training apparatus for a classification model that implements IA by combining RPA and AI, including: a first obtaining module, configured to obtain the position coordinates of at least one word included in each of a plurality of sample pages and to obtain the category of each sample page; a first processing module, configured to input the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain an encoding vector for each word in each sample page; a second obtaining module, configured to obtain an encoding vector for each sample page based on the encoding vectors of the words in that page; and a training module, configured to train an initial classification model using the encoding vector and category of each sample page as training data, to obtain a target classification model for document classification.

In some embodiments, the training module includes: a dividing unit, configured to divide the training data into a training set and a validation set, where the training set includes the encoding vectors of a plurality of first pages, the validation set includes the encoding vectors of a plurality of second pages, and each first page and second page is labeled with the category to which it belongs; a training unit, configured to train the initial classification model for multiple rounds based on the encoding vector and category of each first page, to obtain a candidate classification model after each round; and a selecting unit, configured to select the target classification model for document classification from the per-round candidate models based on the encoding vector and category of each second page.

In some embodiments, the selecting unit includes: a processing subunit, configured to input, for the candidate classification model of each round, the encoding vector of each second page into the candidate model to obtain the predicted confidences that the second page belongs to each of a plurality of preset categories, and to determine a loss value for the candidate model based on these confidences and the actual categories of the second pages; and a selecting subunit, configured to select the target classification model for document classification from the per-round candidate models according to their loss values.

In some embodiments, the first obtaining module includes: a first obtaining subunit, configured to obtain a plurality of sample documents sent by the RPA robot; a second obtaining subunit, configured to obtain, for each sample page, the Optical Character Recognition (OCR) result of the page; a third obtaining subunit, configured to obtain the text content of at least one text segment in the sample page based on the OCR result; a word segmentation unit, configured to segment the text content of each text segment into at least one word; a fourth obtaining subunit, configured to obtain the position coordinates of the area occupied by each text segment; and a fifth obtaining subunit, configured to obtain the position coordinates of each word based on the position coordinates of the area occupied by its text segment and the position of the word within that segment.

The training apparatus provided by the embodiments of the application obtains the position coordinates of at least one word included in each of a plurality of sample pages and the category of each sample page; inputs the position coordinates of each word into a pre-trained document understanding model to obtain an encoding vector for each word; obtains an encoding vector for each sample page from the encoding vectors of its words; and trains an initial classification model using the encoding vector and category of each sample page as training data, obtaining a target classification model for document classification. In this way, a classification model for document classification is trained faster and with less training data.

An embodiment of the fourth aspect of the present application provides a document classification apparatus that implements IA by combining RPA and AI, including: a third obtaining module, configured to obtain a target document sent by an RPA robot, the target document including at least one target page; a fourth obtaining module, configured to obtain the position coordinates of at least one word included in each target page; a second processing module, configured to input the position coordinates of each word in each target page into the pre-trained document understanding model to obtain an encoding vector for each word in each target page; a fifth obtaining module, configured to obtain an encoding vector for each target page based on the encoding vectors of its words; a third processing module, configured to input the encoding vector of each target page into the target classification model, obtained by training with the method of the first-aspect embodiment, to obtain the confidences that each target page belongs to each of a plurality of preset categories; a first determining module, configured to determine, for each preset category, the average of the confidences that the target pages belong to that category; and a second determining module, configured to determine the category of the target document from the preset categories based on the average value of each preset category.

The document classification apparatus provided by the embodiments of the application obtains a target document, comprising at least one target page, sent by an RPA robot; obtains the position coordinates of at least one word in each target page; inputs them into a pre-trained document understanding model to obtain an encoding vector for each word; derives an encoding vector for each target page from the encoding vectors of its words; inputs each page encoding vector into the target classification model to obtain the confidences that the page belongs to each of a plurality of preset categories; averages, for each preset category, the confidences of all target pages; and determines the category of the target document from the preset categories based on these averages. Target documents are thus classified accurately by combining the pre-trained document understanding model with a target classification model trained quickly on a small amount of training data. Using the target classification model to classify, as part of IA, the target documents sent by the RPA robot reduces the labor cost of document classification and improves its efficiency.

An embodiment of a fifth aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method according to the embodiment of the first aspect of the present application or to implement the method according to the embodiment of the second aspect of the present application.

An embodiment of the sixth aspect of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of the first-aspect embodiment or the method of the second-aspect embodiment of the present application.

An embodiment of a seventh aspect of the present application proposes a computer program product comprising a computer program which, when executed by a processor, implements a method as described in the embodiment of the first aspect of the present application above, or implements a method as described in the embodiment of the second aspect of the present application above.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.

FIG. 1 is a schematic flowchart of a training method for a classification model that implements IA by combining RPA and AI according to a first embodiment of the present application;

FIG. 2 is a schematic flowchart of a training method for a classification model that implements IA by combining RPA and AI according to a second embodiment of the present application;

FIG. 3 is a schematic flowchart of a document classification method that implements IA by combining RPA and AI according to a third embodiment of the present application;

FIG. 4 is a schematic structural diagram of a training apparatus for a classification model that implements IA by combining RPA and AI according to a fourth embodiment of the present application;

FIG. 5 is a schematic structural diagram of a document classification apparatus that implements IA by combining RPA and AI according to a fifth embodiment of the present application;

FIG. 6 is a block diagram of an electronic device for implementing the training method for a classification model, or the document classification method, that implements IA by combining RPA and AI according to an embodiment of the present application.

Detailed Description

Embodiments of the present application/disclosure are described in detail below, and examples of these embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, intended only to explain the present application/disclosure, and should not be construed as limiting it.

These and other aspects of the embodiments of the application/disclosure will become apparent from the following description and the accompanying drawings. The description and drawings disclose particular embodiments in detail to indicate some of the ways in which the principles of the embodiments may be employed, but it should be understood that the scope of the embodiments is not limited thereby. Rather, the embodiments of the application/disclosure include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.

It should be noted that, in the technical solutions of the present disclosure, the acquisition, storage and application of the related data comply with the relevant laws and regulations and do not violate public order and good customs.

In the related art, a pre-trained document understanding model, such as a LayoutLM model, is usually used to understand a document, and a classification model then classifies the document based on the understanding result. To classify documents in different business scenarios, the pre-trained document understanding model and the classification model are trained jointly on training data from the specific business scenario. That is, to classify documents in a given business scenario, not only the classification model but also the entire pre-trained document understanding model must be trained by fine-tuning. Because the pre-trained document understanding model has a complex structure, fine-tuning it requires a relatively large amount of training data, and the whole training process takes a long time.

The application provides a training method for a classification model that implements IA by combining RPA and AI, which obtains a classification model for document classification without fine-tuning the pre-trained document understanding model. The method includes: obtaining the position coordinates of at least one word included in each of a plurality of sample pages, and obtaining the category of each sample page; inputting the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain an encoding vector for each word in each sample page; obtaining an encoding vector for each sample page based on the encoding vectors of its words; and using the encoding vector and category of each sample page as training data to train an initial classification model, thereby obtaining a target classification model for document classification. In this way, a classification model for document classification is trained faster and with less training data.

To explain the embodiments of the present application clearly, the terms used in these embodiments are first defined below.

In the description of this application/disclosure, the term "plurality" refers to two or more.

In the description of the present application, an "RPA robot" refers to a software robot that automatically performs business processing by combining AI technology with RPA technology. An RPA robot has two characteristics: it acts as a "connector" and it is "non-invasive". By simulating the way a human operates, it extracts, integrates and connects data across different systems non-invasively, without changing the existing information systems.

In the description of the present application, a "document" is an electronic document, which may be a PDF (Portable Document Format) document obtained by scanning a paper document, or a document edited and created on a smart device such as a computer or mobile phone; the present application does not limit this. A "target document" is a document to be classified. A "page" is a page included in a document; an electronic contract document, for example, may consist of one or more pages. "Sample pages" are the pages included in the sample documents used for model training. A "first page" is a page included in the training set after the training data is divided into a training set and a validation set. A "second page" is a page included in the validation set after that division. A "target page" is a page included in a target document to be classified.

In the description of the present application, a "text segment" is a segment composed of part of the content of a page. It may be a full or partial line of horizontally arranged text, or a full or partial column of vertically arranged text; the present application does not limit this.

In the description of the present application, the "encoding vector corresponding to a word" is a vector that characterizes feature information of the word, such as the position of the word in the page. The "encoding vector corresponding to a sample page" is a vector that characterizes feature information of the sample page, such as the positions of all words included in the sample page. The "encoding vector corresponding to a target page" is a vector that characterizes feature information of the target page, such as the positions of all words included in the target page.

In the description of the present application, a "pre-trained document understanding model" is a model that has been pre-trained to understand documents, such as the LayoutLM model (a pre-trained model for processing multimodal information, namely text and layout information) or the LayoutLMv2 (LayoutLM 2.0) model.

In the description of the present application, a "preset category" is a category, defined in advance as needed, to which a document may belong, for example a bill category or a contract category. The "category of the target document" is the category predicted for the target document to be classified by the trained target classification model. The "category to which a sample page belongs" is the category to which the sample page actually belongs, such as the bill category or the contract category.

In the description of the present application, a "classification model" is an AI neural network model for classifying documents, whose structure may be set as needed. Its input is the encoding vector corresponding to a page of a document, and its output is the predicted category of that page, specifically the confidence that the page belongs to each of one or more preset categories.

In the description of the present application, "confidence" indicates how likely it is that a page belongs to a given preset category. For example, the confidence that the target page belongs to preset category A indicates how probable it is that the target page belongs to preset category A.

In the description of the present application, the "average value of the confidence degrees that each target page belongs to the preset category" is a value obtained by averaging the confidence degrees that each target page belongs to the preset category.

In the description of the present application, a "document processing platform" is an intelligent automation platform for processing documents intelligently. Intelligent Document Processing (IDP) is one of the core capabilities of an intelligent automation platform. IDP is a new generation of automation technology that identifies, classifies, extracts elements from, checks, compares, and corrects various documents based on AI technologies such as Optical Character Recognition (OCR), Computer Vision (CV), Natural Language Processing (NLP), and Knowledge Graph (KG), helping enterprises make document processing work intelligent and automated.

In the description of the present application, "OCR (Optical Character Recognition)" refers to a process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using a character recognition method. For printed characters, the characters in a paper document are optically converted into an image file with a black-and-white dot matrix, and recognition software then converts the characters in the image into text format for further editing and processing by word processing software.

A training method of a classification model for realizing IA in combination with RPA and AI, a document classification method for realizing IA in combination with RPA and AI, an apparatus, an electronic device, and a storage medium according to embodiments of the present application/disclosure are described below with reference to the accompanying drawings.

First, a method for training a classification model for realizing IA by combining RPA and AI in the embodiment of the present application will be described with reference to the drawings.

Fig. 1 is a flowchart of a training method for implementing the classification model of IA in combination with RPA and AI according to the first embodiment of the present application. As shown in fig. 1, the method may include the steps of:

Step 101, obtaining the position coordinates of at least one word included in each of a plurality of sample pages, and obtaining the category to which each sample page belongs.

It should be noted that, the training method for implementing the IA classification model by combining the RPA and the AI in the embodiment of the present application may be executed by a training device for implementing the IA classification model by combining the RPA and the AI. The training device that implements the classification model of IA in conjunction with RPA and AI will be referred to as the training device below. The training device may be implemented by software and/or hardware, and the training device may be an electronic device, or may also be configured in an electronic device, so as to implement training of a classification model for document classification. The electronic device may include, but is not limited to, a terminal device, a server, and the like, and the embodiment does not specifically limit the electronic device.

The words included in a sample page are the words (i.e., tokens) obtained by segmenting the text segments in the sample page. The text segments can be segmented based on a preset vocabulary and rules. For example, Chinese can be segmented character by character: for the text fragment "1 23", the words obtained by segmentation are "1" and "23", and for the text fragment "three-fold", the words obtained are "three-fold" and "one-fold". English can be segmented into subwords such as stems and affixes: for the text segment "working", the words obtained by segmentation are "work" and "ing".
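The segmentation rule above can be sketched as follows. This is a minimal illustration, not the application's tokenizer: the vocabulary `VOCAB` is hypothetical, non-Chinese runs are split by greedy longest-match-first lookup (the approach used by WordPiece-style tokenizers, here without WordPiece's `##` continuation markers), and only the basic CJK Unified Ideographs block is treated as Chinese.

```python
# Hypothetical preset vocabulary; a real system would load the encoder's word list.
VOCAB = {"work", "ing", "1", "23"}


def is_cjk(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff"


def split_subwords(word: str, vocab: set) -> list:
    """Greedy longest-match-first split of a non-Chinese run into vocabulary entries."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate substring until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            end = start + 1  # unknown character: emit it on its own
        pieces.append(word[start:end])
        start = end
    return pieces


def segment(text: str, vocab: set = VOCAB) -> list:
    """Split a text segment into words: Chinese per character, other runs into subwords."""
    tokens = []
    for chunk in text.split():
        buffer = ""
        for ch in chunk:
            if is_cjk(ch):
                if buffer:
                    tokens.extend(split_subwords(buffer, vocab))
                    buffer = ""
                tokens.append(ch)
            else:
                buffer += ch
        if buffer:
            tokens.extend(split_subwords(buffer, vocab))
    return tokens


print(segment("working"))  # ['work', 'ing']
print(segment("1 23"))     # ['1', '23']
```

Whitespace is treated as a hard boundary here, which reproduces the "1 23" example; how the actual rules handle punctuation or mixed scripts is not specified in the text.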

The position coordinates of the word are used to indicate the position of the word in the page (in this embodiment, the sample page). For example, the position coordinates of the word may include x-axis coordinates and y-axis coordinates of the word in a coordinate system with the top left corner of the sample page as the origin.

For each sample page that includes many words, the position coordinates of only a limited number of words may be acquired for subsequent model training in order to reduce the amount of computation. For example, if the number of words is preset to 128, the position coordinates of at most 128 words are acquired for each sample page.

Step 102, inputting the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the coding vector corresponding to each word in each sample page.

In some embodiments, for each word in each sample page, the position coordinates of the word may be input into the pre-trained document understanding model, which outputs the coding vector of the word, so that the training device obtains the coding vector corresponding to the word.

Step 103, acquiring the coding vector corresponding to each sample page based on the coding vector corresponding to each word in each sample page.

In some embodiments, for each sample page, if the sample page includes only one word, the coding vector corresponding to that word may be taken as the coding vector corresponding to the sample page; if the sample page includes a plurality of words, the average of the coding vectors corresponding to those words may be computed and taken as the coding vector corresponding to the sample page.
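Steps 101 to 103, including the 128-word cap, can be sketched as a small pooling routine. The encoder is passed in as a callable because the application does not say how the LayoutLM-style model is invoked; `encode_words` and `toy_encoder` are hypothetical stand-ins, not a real model API.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x3, y3, x4, y4) of one word
Vector = List[float]

MAX_WORDS = 128  # example cap on words per page, taken from the text


def page_vector(word_boxes: Sequence[Box],
                encode_words: Callable[[Sequence[Box]], List[Vector]]) -> Vector:
    """Mean-pool the word coding vectors of one page into the page coding vector."""
    boxes = list(word_boxes)[:MAX_WORDS]  # keep at most MAX_WORDS words
    if not boxes:
        raise ValueError("page contains no words")
    vectors = encode_words(boxes)  # one coding vector per word
    dim = len(vectors[0])
    # With a single word the mean equals that word's vector, matching the one-word case.
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def toy_encoder(boxes: Sequence[Box]) -> List[Vector]:
    """Hypothetical stand-in for the pre-trained model: echoes each box as a vector."""
    return [[float(c) for c in box] for box in boxes]


print(page_vector([(2, 2, 4, 4), (4, 2, 8, 4)], toy_encoder))  # [3.0, 2.0, 6.0, 4.0]
```

The same routine applies unchanged to target pages at classification time.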

Step 104, taking the coding vector corresponding to each sample page and its category as training data, and training the initial classification model to obtain a target classification model for document classification.

In some embodiments, the initial classification model may be constructed in advance. If the code length of the coding vector corresponding to each sample page is L and the number of preset categories is M, the initial classification model may be an L × M matrix, where L is an integer greater than 1 and M is an integer greater than 0. Inputting the 1 × L coding vector corresponding to a sample page into the classification model yields a 1 × M vector, each element of which represents the confidence that the sample page belongs to the corresponding one of the M preset categories.
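Because the classifier described here is just an L × M matrix, prediction reduces to one vector-matrix product. The sketch below uses plain Python lists; applying a softmax to turn the 1 × M scores into confidences that sum to 1 is an assumption, since the text only says that each output element represents a confidence.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # L rows (code length) x M columns (preset categories)


def scores(x: Sequence[float], W: Matrix) -> List[float]:
    """Multiply a 1 x L page vector by the L x M weight matrix to get 1 x M scores."""
    if len(x) != len(W):
        raise ValueError("vector length must equal the number of matrix rows")
    return [sum(x[l] * W[l][m] for l in range(len(W))) for m in range(len(W[0]))]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_confidences(x: Sequence[float], W: Matrix) -> List[float]:
    """Confidence that the page belongs to each of the M preset categories."""
    return softmax(scores(x, W))


W = [[2.0, 0.0], [0.0, 2.0]]  # toy weights: L = 2, M = 2
conf = predict_confidences([1.0, 0.0], W)
print([round(c, 3) for c in conf])  # [0.881, 0.119]
```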

Then, with the coding vector corresponding to each sample page as the input of the classification model and the category of each sample page as the label, supervised training is performed on the initial classification model to obtain the target classification model.
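Supervised training of that matrix can be sketched with plain full-batch gradient descent on the softmax cross-entropy loss. The learning rate, the number of steps, the zero initialization, and the absence of a bias term are illustrative assumptions; the application does not specify an optimizer.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(z: Sequence[float]) -> List[float]:
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_classifier(X: Sequence[Sequence[float]], labels: Sequence[int],
                     num_classes: int, lr: float = 0.5, steps: int = 200) -> Matrix:
    """Fit an L x M matrix by full-batch gradient descent on softmax cross-entropy."""
    L, n = len(X[0]), len(X)
    W = [[0.0] * num_classes for _ in range(L)]
    for _ in range(steps):
        grad = [[0.0] * num_classes for _ in range(L)]
        for x, y in zip(X, labels):
            p = softmax([sum(x[l] * W[l][m] for l in range(L)) for m in range(num_classes)])
            for m in range(num_classes):
                # d(loss)/d(score_m) = p_m - 1[m == y] for softmax + cross-entropy
                err = p[m] - (1.0 if m == y else 0.0)
                for l in range(L):
                    grad[l][m] += x[l] * err / n
        for l in range(L):
            for m in range(num_classes):
                W[l][m] -= lr * grad[l][m]
    return W


def predict(x: Sequence[float], W: Matrix) -> int:
    """Index of the preset category with the highest score."""
    s = [sum(x[l] * W[l][m] for l in range(len(W))) for m in range(len(W[0]))]
    return max(range(len(s)), key=s.__getitem__)


# Toy page vectors (L = 2) labeled with two preset categories (M = 2).
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [0, 0, 1, 1]
W = train_classifier(X, y, num_classes=2)
print([predict(x, W) for x in X])  # [0, 0, 1, 1]
```

Only this small matrix is updated; the pre-trained encoder that produced the page vectors stays frozen.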

The target classification model can be used to classify documents. In an actual business scenario, documents can therefore first be classified with the target classification model, after which the information extraction model for the corresponding category is called to extract information from the documents, and further business processing is performed based on the extracted information.

It can be understood that, in the embodiment of the present application, the pre-trained document understanding model serves as a universal encoder that is used on its own, across different business scenarios, to obtain the coding vector corresponding to each sample page. The pre-trained document understanding model itself is not trained in this process; only the classification model is trained. Because the classification model does not need to generate the coding vectors corresponding to the sample pages, its structure is simple, a classification model that classifies documents accurately can be obtained with only a small amount of training data, and training takes little time. Therefore, the training speed of the classification model is improved and the amount of data required for training is reduced without affecting the accuracy of document classification. In addition, because the classification model is simple in structure and occupies little space, it is easy to deploy.

In summary, the training method for implementing the IA classification model by combining RPA and AI provided in the embodiment of the present application obtains the position coordinates of at least one word included in each of a plurality of sample pages and the category to which each sample page belongs; inputs the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain the coding vector corresponding to each word; obtains the coding vector corresponding to each sample page based on the coding vectors corresponding to its words; and trains the initial classification model, using the coding vector corresponding to each sample page and its category as training data, to obtain a target classification model for document classification. In this way, a classification model for document classification is trained, the training speed of the classification model is improved, and the amount of data required for training is reduced.

The method for training the classification model to implement IA in conjunction with RPA and AI provided in the embodiment of the present application is further described with reference to fig. 2.

Fig. 2 is a flowchart of a training method for implementing a classification model of IA in combination with RPA and AI according to a second embodiment of the present application, as shown in fig. 2, the method includes:

Step 201, obtaining the position coordinates of at least one word included in each of a plurality of sample pages, and obtaining the category to which each sample page belongs.

In some embodiments, the position coordinates of at least one word included in each of the plurality of sample pages may be obtained as follows: for each sample page, acquiring the optical character recognition (OCR) information of the sample page; acquiring the text content of at least one text segment in the sample page based on the OCR information; segmenting the text content of each text segment into words to obtain at least one word included in each text segment; acquiring the position coordinates of the area occupied by each text segment; and acquiring the position coordinates of each word based on the position coordinates of the area occupied by the corresponding text segment and the position of the word within that text segment.

The area occupied by each text segment is usually a rectangle, and the position coordinates of the area occupied by the text segment may include position coordinates of a vertex at the upper left corner and a vertex at the lower right corner of the area occupied by the text segment, or position coordinates of a vertex at the upper right corner and a vertex at the lower left corner. The position coordinates of the word may also include position coordinates of a top left corner vertex and a bottom right corner vertex of the area occupied by the word, or position coordinates of a top right corner vertex and a bottom left corner vertex.

Specifically, for each sample page, OCR may be performed in advance to obtain the OCR information of the sample page. The text content of at least one text segment is then obtained from this OCR information, and the text content of each text segment is segmented based on a preset vocabulary and rules to obtain at least one word included in each text segment. Chinese can be segmented character by character; English can be segmented into subwords such as stems and affixes.

In addition, for each sample page, the width and the height of the sample page and the position of each text segment in the sample page may be obtained, and then based on the width and the height of the sample page and the position of each text segment in the sample page, position coordinates of an area occupied by each text segment, such as position coordinates of a vertex at an upper left corner and a vertex at a lower right corner (or position coordinates of a vertex at an upper right corner and a vertex at a lower left corner), are determined.

And then for each sample page, the position coordinates of each word can be obtained based on the position coordinates of the area occupied by each text segment and the position of each word in the corresponding text segment.

The following describes how the position coordinates of each word in a text segment are obtained from the position coordinates of the area occupied by the text segment in a sample page and the position of each word in the text segment. Assume that the upper-left corner of the sample page is the origin of the coordinate system, that the position coordinates of the area occupied by the text segment consist of the x-axis and y-axis coordinates (x1, y1) of its top-left vertex and (x2, y2) of its bottom-right vertex, and that the position coordinates of a word consist of the x-axis and y-axis coordinates (x3, y3) of the top-left vertex and (x4, y4) of the bottom-right vertex of the area occupied by the word.

Since a text segment may be a horizontally arranged line (or part of a line) of text, or a vertically arranged column (or part of a column) of text, it is first determined whether the text segment is arranged horizontally or vertically by checking whether y2 - y1 is smaller than A × (x2 - x1), where A can be set as needed, for example to 1.5. If y2 - y1 is smaller than A × (x2 - x1), the text segment is determined to be arranged horizontally; otherwise, it is determined to be arranged vertically.

For a horizontally arranged text segment, the width of each word can be obtained from the length proportion of the word in the text segment and the value of x2 - x1. For the first (leftmost) word in the text segment, x1 is taken as the x-axis coordinate x3 of its top-left vertex, and x3 + the width of the first word is taken as the x-axis coordinate x4 of its bottom-right vertex. For each other word in the text segment, x1 plus the accumulated width of all words to its left is taken as x3, and x3 + the width of the word is taken as x4. In addition, for every word in the text segment, y1 is taken as the y-axis coordinate y3 of the top-left vertex and y2 as the y-axis coordinate y4 of the bottom-right vertex.

For a vertically arranged text segment, the height of each word can be obtained from the length proportion of the word in the text segment and the value of y2 - y1. For the first (topmost) word in the text segment, y1 is taken as the y-axis coordinate y3 of its top-left vertex, and y3 + the height of the first word is taken as the y-axis coordinate y4 of its bottom-right vertex. For each other word in the text segment, y1 plus the accumulated height of all words above it is taken as y3, and y3 + the height of the word is taken as y4. In addition, for every word in the text segment, x1 is taken as the x-axis coordinate x3 of the top-left vertex and x2 as the x-axis coordinate x4 of the bottom-right vertex.

For example, assume that A is 1.5, the text segment is "1 23", and the words obtained by segmenting its text content are "1" and "23". The position coordinates of the area occupied by the text segment "1 23" are (x1, y1) = (2, 2) and (x2, y2) = (8, 4). Since y2 - y1 = 2 is less than 1.5 × (x2 - x1) = 9, the text segment "1 23" is determined to be arranged horizontally. Since the length ratio of "1" to "23" is 1:2 and x2 - x1 = 6, the widths of "1" and "23" are 2 and 4, respectively. In the above manner, the position coordinates of "1" are determined to be (x3, y3) = (2, 2) and (x4, y4) = (4, 4), and the position coordinates of "23" are (x3, y3) = (4, 2) and (x4, y4) = (8, 4).
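The orientation test and the proportional split described above can be checked against the worked example with a short sketch. Measuring a word's "length proportion" by its character count is an assumption (the text does not define the length measure), and the function name `word_boxes` is hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), origin at page top-left


def word_boxes(segment_box: Box, words: Sequence[str], a: float = 1.5) -> List[Box]:
    """Split a text segment's box among its words in proportion to character count."""
    x1, y1, x2, y2 = segment_box
    total = sum(len(w) for w in words)
    if total == 0:
        return []
    horizontal = (y2 - y1) < a * (x2 - x1)  # orientation test with threshold A
    boxes = []
    offset = 0.0  # accumulated width (or height) of the words already placed
    for w in words:
        if horizontal:
            width = (x2 - x1) * len(w) / total
            x3 = x1 + offset
            boxes.append((x3, y1, x3 + width, y2))  # y3 = y1, y4 = y2
            offset += width
        else:
            height = (y2 - y1) * len(w) / total
            y3 = y1 + offset
            boxes.append((x1, y3, x2, y3 + height))  # x3 = x1, x4 = x2
            offset += height
    return boxes


print(word_boxes((2, 2, 8, 4), ["1", "23"]))
# boxes (2, 2)-(4, 4) for "1" and (4, 2)-(8, 4) for "23", as in the example
```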

In some embodiments, the sample documents from which the training device obtains the sample pages may be sent by the RPA robot. That is, for each sample page, before the OCR information of the sample page is acquired, the method may further include: acquiring a plurality of sample documents sent by the RPA robot.

For example, the training device may be configured in the document processing platform, which may provide an upload interface. When a user needs to train a target classification model, the RPA robot can upload each sample document through the upload interface, and the training device in the document processing platform then obtains the sample documents uploaded by the RPA robot. Because the sample pages are uploaded to the document processing platform by the RPA robot, the training device can acquire them automatically in combination with the RPA robot, which reduces the labor cost of training the classification model.

Step 202, inputting the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the coding vector corresponding to each word in each sample page.

Step 203, obtaining the coding vector corresponding to each sample page based on the coding vector corresponding to each word in each sample page.

The specific implementation process and principle of steps 202-203 may refer to the description of the above embodiments, and are not described herein again.

Step 204, dividing the training data into a training set and a verification set, where the training set includes the coding vectors corresponding to a plurality of first pages, the verification set includes the coding vectors corresponding to a plurality of second pages, and each first page and each second page is labeled with the category to which it belongs.

The ratio of the number of first pages in the training set to the number of second pages in the verification set may be set as required, for example, 4:1.
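A reproducible split can be sketched as below. The 4:1 default and the fixed random seed are illustrative choices, not values prescribed by the application; any ratio can be passed in.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_val(samples: Sequence[T], train_parts: int = 4, val_parts: int = 1,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle labeled page vectors and split them train:val = train_parts:val_parts."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    cut = len(items) * train_parts // (train_parts + val_parts)
    return items[:cut], items[cut:]


pages = [([float(i)], i % 2) for i in range(10)]  # toy (coding vector, category) pairs
train, val = split_train_val(pages)
print(len(train), len(val))  # 8 2
```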

Step 205, performing multiple rounds of training on the initial classification model based on the coding vector corresponding to each first page and the category to which the first page belongs, so as to obtain candidate classification models after each round of training.

The number of rounds of training the initial classification model may be arbitrarily set according to needs, which is not limited in this application.

In some embodiments, taking N training rounds as an example, where N is an integer greater than 1, the training set may be divided into N sub-training sets. A round of iterative training is performed on the initial classification model based on the first sub-training set to obtain the candidate classification model after the first round; a round of iterative training is then performed on that candidate model based on the second sub-training set to obtain the candidate classification model after the second round; a round is then performed on that model based on the third sub-training set; and so on, until the candidate classification models after N rounds of training have been obtained from the N sub-training sets.

In some embodiments, taking N training rounds as an example, where N is an integer greater than 1, every round may instead use the whole training set. A round of iterative training is performed on the initial classification model based on the training set to obtain the candidate classification model after the first round; a round of iterative training is then performed on that candidate model based on the training set to obtain the candidate classification model after the second round; and so on, until the candidate classification models after N rounds of training have been obtained.
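Both variants can be sketched with one helper that records a candidate model after every round. What a "round of iterative training" contains is not specified in the text; here it is assumed to be a fixed number of full-batch gradient steps, and the round-robin partition into sub-training sets is likewise an assumption.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Sample = Tuple[Sequence[float], int]  # (page coding vector, category index)


def softmax(z: Sequence[float]) -> List[float]:
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_one_round(W: Matrix, data: Sequence[Sample], lr: float = 0.5,
                    steps: int = 20) -> Matrix:
    """One round = a few full-batch gradient steps; returns a new matrix (W is untouched)."""
    W = [row[:] for row in W]
    L, M = len(W), len(W[0])
    for _ in range(steps):
        grad = [[0.0] * M for _ in range(L)]
        for x, y in data:
            p = softmax([sum(x[l] * W[l][m] for l in range(L)) for m in range(M)])
            for m in range(M):
                err = p[m] - (1.0 if m == y else 0.0)
                for l in range(L):
                    grad[l][m] += x[l] * err / len(data)
        for l in range(L):
            for m in range(M):
                W[l][m] -= lr * grad[l][m]
    return W


def train_rounds(W0: Matrix, train_set: Sequence[Sample], n_rounds: int,
                 use_subsets: bool) -> List[Matrix]:
    """Return the candidate model obtained after each of the n_rounds rounds."""
    if use_subsets:
        if len(train_set) < n_rounds:
            raise ValueError("need at least one sample per sub-training set")
        # First variant: N disjoint sub-training sets (round-robin split, an assumption).
        batches = [list(train_set[i::n_rounds]) for i in range(n_rounds)]
    else:
        # Second variant: every round sees the whole training set.
        batches = [list(train_set)] * n_rounds
    candidates, W = [], W0
    for batch in batches:
        W = train_one_round(W, batch)  # continue from the previous round's model
        candidates.append(W)
    return candidates


data = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
W0 = [[0.0, 0.0], [0.0, 0.0]]
for use_subsets in (True, False):
    candidates = train_rounds(W0, data, n_rounds=2, use_subsets=use_subsets)
    print(use_subsets, len(candidates))  # two candidates in either variant
```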

Step 206, selecting a target classification model for document classification from the candidate classification models obtained after each round of training, based on the coding vector corresponding to each second page and the category to which each second page belongs.

In some embodiments, step 206 may be implemented as follows: for the candidate classification model obtained after each round of training, the coding vector corresponding to each second page is input into the candidate classification model to obtain the confidences, predicted by that model, that each second page belongs to the plurality of preset categories, and the loss value corresponding to the candidate classification model is determined based on these confidences and the categories to which the second pages belong; the target classification model for document classification is then selected from the candidate classification models based on their loss values.

In some embodiments, the confidences, predicted by a candidate classification model, that each second page belongs to the plurality of preset categories, together with the category to which each second page belongs, may be substituted into the cross-entropy loss function to determine the loss value corresponding to that candidate classification model.

The cross-entropy loss function can be expressed as formula (1):

$$L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic}\,\log\left(p_{ic}\right) \tag{1}$$

where $L_{ce}$ denotes the loss value; $N$ denotes the number of second pages included in the verification set; $C$ denotes the number of preset categories (also called the number of categories); $y_{ic}$ is a label given by a sign (indicator) function, equal to 1 when the category of the $i$-th second page is $c$ and 0 otherwise; and $p_{ic}$ denotes the confidence (also called the prediction probability) that the $i$-th second page belongs to preset category $c$.

In some embodiments, the candidate classification model with the lowest loss value in the candidate classification models after each round of training may be determined as the target classification model. Thus, the model with the highest prediction accuracy among the candidate classification models after each round of training can be determined as the target classification model.
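Computing formula (1) on the verification set and keeping the candidate with the lowest loss can be sketched as follows. With one-hot labels the inner sum keeps only the true-category term; the softmax output layer and the `eps` guard against log(0) are assumptions of this sketch, and the toy matrices are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Sample = Tuple[Sequence[float], int]  # (page coding vector, category index)


def softmax(z: Sequence[float]) -> List[float]:
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def validation_loss(W: Matrix, val_set: Sequence[Sample], eps: float = 1e-12) -> float:
    """Formula (1): mean over second pages of -sum_c y_ic * log(p_ic)."""
    total = 0.0
    for x, y in val_set:
        p = softmax([sum(x[l] * W[l][m] for l in range(len(W))) for m in range(len(W[0]))])
        total -= math.log(max(p[y], eps))  # only the true category has y_ic = 1
    return total / len(val_set)


def select_target_model(candidates: Sequence[Matrix],
                        val_set: Sequence[Sample]) -> Tuple[Matrix, List[float]]:
    """Return the candidate with the lowest verification loss, plus all losses."""
    losses = [validation_loss(W, val_set) for W in candidates]
    best = min(range(len(candidates)), key=losses.__getitem__)
    return candidates[best], losses


val = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
good = [[2.0, 0.0], [0.0, 2.0]]
bad = [[0.0, 2.0], [2.0, 0.0]]
best, losses = select_target_model([bad, good], val)
print(best is good)  # True
```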

To sum up, the training method for implementing the IA classification model by combining RPA and AI provided in the embodiment of the present application obtains the position coordinates of at least one word included in each of a plurality of sample pages and the category to which each sample page belongs; inputs the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the coding vector corresponding to each word; obtains the coding vector corresponding to each sample page based on the coding vectors corresponding to its words; divides the training data into a training set containing the coding vectors corresponding to a plurality of first pages and a verification set containing the coding vectors corresponding to a plurality of second pages, where each first page and each second page is labeled with the category to which it belongs; performs multiple rounds of training on the initial classification model based on the coding vector corresponding to each first page and its category to obtain a candidate classification model after each round of training; and selects, based on the coding vector corresponding to each second page and its category, the target classification model for document classification from the candidate classification models. In this way, a classification model for document classification is trained, the training speed of the classification model is improved, and the amount of data required for training is reduced.

Based on the above embodiment, the present application further provides a document classification method for implementing IA by combining RPA and AI. The document classification method for implementing IA by combining RPA and AI provided in the embodiment of the present application is described below with reference to fig. 3.

Fig. 3 is a flowchart of a document classification method for implementing IA in conjunction with RPA and AI according to a third embodiment of the present application, as shown in fig. 3, the method includes:

Step 301, acquiring a target document sent by the RPA robot, where the target document includes at least one target page.

It should be noted that the document classification method for implementing IA by combining RPA and AI according to the embodiment of the present application may be executed by a document classification device for implementing IA by combining RPA and AI. The document classification device for implementing IA in combination with RPA and AI may be implemented by software and/or hardware, and may be an electronic device, or may also be configured in an electronic device, so as to implement classification of documents. The electronic device may include, but is not limited to, a terminal device, a server, and the like, and the embodiment does not specifically limit the electronic device.

In some embodiments, a document classification device that implements IA in conjunction with RPA and AI may be configured in a document processing platform, and the document processing platform may provide an upload interface. Therefore, when a user needs to classify a certain target document, the target document can be uploaded through the uploading interface based on the RPA robot, and the document classification device which combines the RPA and the AI to realize the IA in the document processing platform can acquire the target document uploaded by the RPA robot.

Step 302, obtaining the position coordinates of at least one word included in each target page.

The words included in the target page are words (i.e., tokens) obtained by segmenting the text segments in the target page. The text segments can be segmented based on a preset word list and rules.

The position coordinates of the word are used to indicate the position of the word in the page (in this embodiment, the target page). For example, the position coordinates of the word may include x-axis coordinates and y-axis coordinates of the word in a coordinate system with the upper left corner of the target page as an origin.

For each target page that includes many words, the position coordinates of only a limited number of words may be acquired for subsequent document classification in order to reduce the amount of computation. For example, if the number of words is preset to 128, the position coordinates of at most 128 words are acquired for each target page.

The manner of obtaining the position coordinates of the at least one word included in each target page may refer to the manner of obtaining the position coordinates of the at least one word included in the sample page in the above embodiments, and details are not repeated here.

Step 303, inputting the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the coding vector corresponding to each word in each target page.

In some embodiments, for each word in each target page, the position coordinate of the word may be input into a pre-trained document understanding model, and the pre-trained document understanding model may output the coding vector of the word, so that the document classification device that implements IA in combination with RPA and AI may obtain the coding vector corresponding to the word.

Step 304, acquiring the coding vector corresponding to each target page based on the coding vector corresponding to each word in each target page.

In some embodiments, for each target page, if the target page includes only one word, the coding vector corresponding to that word may be taken as the coding vector corresponding to the target page; if the target page includes a plurality of words, the average of the coding vectors corresponding to those words may be computed and taken as the coding vector corresponding to the target page.

Step 305, inputting the coding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to a plurality of preset categories.

The target classification model is obtained by the training method of the classification model for implementing IA by combining RPA and AI shown in any of the above embodiments.

For any target page, the confidence levels of the target page belonging to a plurality of preset categories can indicate the probability of the target page belonging to each preset category.

In some embodiments, by using the method for training a classification model for implementing IA by combining RPA and AI shown in any of the above embodiments, the obtained target classification model through training may be deployed in a document processing platform, so that a document classification device for implementing IA by combining RPA and AI may input a coding vector corresponding to each target page into the target classification model deployed in the document processing platform, so as to obtain a confidence that each target page belongs to a plurality of preset categories.

Step 306, for each preset category, determining an average value of confidence degrees that each target page belongs to the preset category.

In some embodiments, the confidence levels of the target pages belonging to the same preset category may be summed and then averaged, so as to obtain an average value of the confidence levels of the target pages belonging to the same preset category.

Step 307, determining the category to which the target document belongs from the preset categories based on the average value corresponding to each preset category.

In some embodiments, the preset category with the largest corresponding average value in the preset categories may be determined as the category to which the target document belongs.

For example, assume that the preset categories include category 1 and category 2, and that the target document includes 10 target pages. Through step 305 and the preceding steps, the confidence that each of the 10 target pages belongs to category 1 and the confidence that each of them belongs to category 2 are obtained. The 10 confidences for category 1 are then summed and averaged, and likewise the 10 confidences for category 2. If the average for category 1 is greater than the average for category 2, the target document is determined to belong to category 1.
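Steps 306 and 307 can be sketched as follows. This is an illustrative sketch only, assuming the per-page output of the target classification model is represented as a dictionary mapping each preset category name to a confidence; the function names are hypothetical, and the three-page input below stands in for the 10-page example.

```python
from typing import Dict, List, Tuple


def average_confidences(page_confidences: List[Dict[str, float]]) -> Dict[str, float]:
    """Step 306: mean confidence of each preset category over all target pages."""
    if not page_confidences:
        raise ValueError("target document has no pages")
    n = len(page_confidences)
    categories = list(page_confidences[0])
    # A KeyError is raised if some page lacks a category present on page 0.
    return {c: sum(page[c] for page in page_confidences) / n for c in categories}


def classify_document(page_confidences: List[Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Step 307: pick the preset category with the largest average confidence."""
    averages = average_confidences(page_confidences)
    # max() keeps the first category (dict insertion order) on ties.
    return max(averages, key=averages.get), averages


pages = [
    {"category 1": 0.8, "category 2": 0.2},
    {"category 1": 0.4, "category 2": 0.6},
    {"category 1": 0.7, "category 2": 0.3},
]
label, averages = classify_document(pages)
```

Here the second page on its own favors category 2, but the document-level averages (about 0.633 versus 0.367) assign the document to category 1, which illustrates why the decision is made on averaged confidences rather than on any single page.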

The pre-training document understanding model in the embodiments of the present application is general-purpose across business scenarios and requires no additional training; it can be used on its own as a general encoder in different business scenarios to obtain the coding vectors of target pages. Because the target classification model does not itself have to generate the coding vectors of the target pages, its structure is simple: training needs only a small amount of training data and takes little time, while the prediction performance of the trained target classification model is not compromised and documents can still be classified accurately.

To sum up, the document classification method for realizing IA by combining RPA and AI provided in the embodiments of the present application acquires a target document sent by an RPA robot, the target document including at least one target page; acquires the position coordinates of at least one word included in each target page; inputs the position coordinates of each word in each target page into a pre-training document understanding model to obtain a coding vector for each word; obtains a coding vector for each target page from the coding vectors of its words; inputs the coding vector of each target page into a target classification model to obtain the confidences that each target page belongs to a plurality of preset categories; determines, for each preset category, the average of the confidences of the target pages for that category; and determines, from the preset categories, the category to which the target document belongs based on these averages. The target document is thus classified accurately by combining the pre-training document understanding model with a target classification model that is trained quickly on a small amount of training data. Using the target classification model to perform IA classification on the target documents sent by the RPA robot reduces the labor cost of document classification and improves classification efficiency.

In order to implement the above embodiments, the present application further provides a training apparatus for a classification model for realizing IA by combining RPA and AI. Fig. 4 is a schematic structural diagram of a training apparatus for the classification model for realizing IA by combining RPA and AI according to a fourth embodiment of the present application.

As shown in fig. 4, the training apparatus 400 for the classification model for realizing IA by combining RPA and AI includes: a first obtaining module 401, a first processing module 402, a second obtaining module 403, and a training module 404.

The first obtaining module 401 is configured to obtain the position coordinates of at least one word included in each of the plurality of sample pages, and to obtain the category to which each sample page belongs;

a first processing module 402, configured to input the position coordinates of each word in each sample page into a pre-training document understanding model, so as to obtain a coding vector corresponding to each word in each sample page;

a second obtaining module 403, configured to obtain, based on the coding vector corresponding to each word in each sample page, a coding vector corresponding to each sample page;

the training module 404 is configured to train the initial classification model by using the coding vector and the category corresponding to each sample page as training data to obtain a target classification model for document classification.

It should be noted that the training apparatus 400 for the classification model for realizing IA by combining RPA and AI according to the embodiments of the present application can perform the training method provided in the foregoing embodiments. The training apparatus 400 may be implemented in software and/or hardware, and may be an electronic device or be configured in an electronic device, so as to train the classification model used for document classification. The electronic device may include, but is not limited to, a terminal device, a server, and the like; this embodiment does not specifically limit the electronic device.

In one embodiment of the present application, training module 404 includes:

the dividing unit is used for dividing the training data into a training set and a verification set, where the training set includes the coding vectors corresponding to a plurality of first pages, the verification set includes the coding vectors corresponding to a plurality of second pages, and each first page and each second page is labeled with the category to which it belongs;

the training unit is used for performing multiple rounds of training on the initial classification model based on the coding vector corresponding to each first page and the category to which that page belongs, so as to obtain a candidate classification model after each round of training;

and the selecting unit is used for selecting a target classification model for document classification from the candidate classification models obtained after each round of training, based on the coding vector corresponding to each second page and the category to which that page belongs.
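The dividing and training units can be sketched as follows. The document does not specify the classifier architecture, the split ratio, or the optimizer, so this sketch assumes, purely for illustration, a shuffled hold-out split and a softmax-regression (linear) head trained by stochastic gradient descent on cross-entropy, with one parameter snapshot kept per round. Page coding vectors are assumed to be plain float lists, and all function names are hypothetical.

```python
import copy
import math
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (page coding vector, labeled category)


def split_train_val(data: List[Sample], val_ratio: float = 0.2,
                    seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle the labeled pages and hold out a fraction as the verification set."""
    if not 0.0 < val_ratio < 1.0:
        raise ValueError("val_ratio must be in (0, 1)")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]  # (first pages, second pages)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(params: Dict, x: Sequence[float], categories: List[str]) -> Dict[str, float]:
    """Confidence of each category for one page coding vector."""
    scores = [sum(w * xi for w, xi in zip(params["W"][k], x)) + params["b"][k]
              for k in range(len(categories))]
    return dict(zip(categories, softmax(scores)))


def train_rounds(train: List[Sample], categories: List[str], rounds: int = 10,
                 lr: float = 0.1, seed: int = 0) -> List[Dict]:
    """Multi-round SGD on a softmax-regression head; one snapshot per round."""
    dim = len(train[0][0])
    params = {"W": [[0.0] * dim for _ in categories], "b": [0.0] * len(categories)}
    index = {c: k for k, c in enumerate(categories)}
    rng = random.Random(seed)
    order = list(range(len(train)))
    snapshots = []
    for _ in range(rounds):
        rng.shuffle(order)
        for i in order:
            x, label = train[i]
            probs = predict(params, x, categories)
            for c, k in index.items():
                # Gradient of cross-entropy w.r.t. the score of category k.
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(dim):
                    params["W"][k][j] -= lr * grad * x[j]
                params["b"][k] -= lr * grad
        snapshots.append(copy.deepcopy(params))  # candidate model after this round
    return snapshots
```

Each snapshot, combined with `predict`, plays the role of one candidate classification model; keeping every round's parameters is what allows the selecting unit to choose among them afterwards rather than simply taking the last round.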

In one embodiment of the present application, the selecting unit includes:

the processing subunit is used for, for the candidate classification model obtained after each round of training, inputting the coding vector corresponding to each second page into that candidate classification model to obtain the confidences, predicted by the candidate classification model, that each second page belongs to a plurality of preset categories, and determining a loss value corresponding to the candidate classification model based on these confidences and the category to which each second page belongs;

and the selecting subunit is used for selecting a target classification model for document classification from the candidate classification models obtained after each round of training, based on the loss value corresponding to each of those candidate classification models.
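The processing and selecting subunits can be sketched as follows. The document states only that a loss value is computed from the predicted confidences and the labeled categories of the second pages, and that the target model is selected based on that loss; this sketch assumes mean cross-entropy as the loss and picks the candidate with the lowest loss. The `Model` callable interface and the function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a candidate model maps a page coding vector to
# {category: confidence}.
Model = Callable[[Sequence[float]], Dict[str, float]]
Labeled = Tuple[Sequence[float], str]  # (second-page coding vector, labeled category)


def validation_loss(model: Model, val_set: List[Labeled], eps: float = 1e-12) -> float:
    """Mean cross-entropy of the labeled category over the verification pages."""
    if not val_set:
        raise ValueError("verification set is empty")
    total = 0.0
    for vector, label in val_set:
        confidence = model(vector)[label]
        total += -math.log(max(confidence, eps))  # clamp to avoid log(0)
    return total / len(val_set)


def select_target_model(candidates: List[Model],
                        val_set: List[Labeled]) -> Tuple[int, List[float]]:
    """Return the index of the lowest-loss candidate and all candidate losses."""
    losses = [validation_loss(m, val_set) for m in candidates]
    best = min(range(len(losses)), key=losses.__getitem__)
    return best, losses


val = [([0.0], "A"), ([1.0], "B")]
uniform = lambda v: {"A": 0.5, "B": 0.5}
good = lambda v: {"A": 0.9, "B": 0.1} if v[0] < 0.5 else {"A": 0.2, "B": 0.8}
bad = lambda v: {"A": 0.1, "B": 0.9} if v[0] < 0.5 else {"A": 0.8, "B": 0.2}
best, losses = select_target_model([uniform, good, bad], val)
```

In this toy run the second candidate, which assigns high confidence to the correct category of both verification pages, has the lowest loss and is selected. Selecting on held-out loss rather than on training loss is a common way to avoid keeping a round that has started to overfit the first pages.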

In an embodiment of the present application, the first obtaining module 401 includes:

the first acquiring subunit is used for acquiring a plurality of sample documents sent by the RPA robot;

the second acquiring subunit is used for acquiring, for each sample page, the optical character recognition (OCR) information of the sample page;

the third acquiring subunit is used for acquiring the text content of at least one text segment in the sample page based on the OCR information of the sample page;

the word segmentation subunit is used for performing word segmentation on the text content of each text segment to obtain at least one word included in each text segment;

the fourth acquiring subunit is used for acquiring the position coordinates of the area occupied by each text segment;

and the fifth acquiring subunit is used for acquiring the position coordinates of each word based on the position coordinates of the area occupied by each text segment and the position of each word in the corresponding text segment.
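The document does not say how a word's coordinates are derived from its text segment's area and its position within the segment. One simple possibility, sketched below purely as an assumption, is to treat the segment as a single line of left-to-right text with roughly uniform character width and to interpolate each word's horizontal extent from its character offsets. The `(x0, y0, x1, y1)` box convention and the function name `word_boxes` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box convention: (x0, y0, x1, y1), axis-aligned, in page units.
Box = Tuple[float, float, float, float]


def word_boxes(segment_text: str, segment_box: Box, words: List[str]) -> List[Box]:
    """Estimate each word's box from its text segment's box.

    Assumes single-line, left-to-right text with roughly uniform character
    width, so a word's horizontal extent is proportional to its character
    offsets inside the segment text.
    """
    x0, y0, x1, y1 = segment_box
    if not segment_text:
        return []
    char_width = (x1 - x0) / len(segment_text)
    boxes: List[Box] = []
    cursor = 0
    for word in words:
        start = segment_text.find(word, cursor)  # locate the word after the previous one
        if start < 0:
            raise ValueError(f"word {word!r} not found in segment text")
        end = start + len(word)
        boxes.append((x0 + start * char_width, y0, x0 + end * char_width, y1))
        cursor = end
    return boxes
```

For Chinese text the segmented words concatenate back to the segment text, and for space-separated text the search skips over the spaces, so in both cases each word is located by its offset in the OCR string. Real OCR engines may return per-character boxes, in which case those would be more precise than this interpolation.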

It should be noted that the foregoing explanation of the embodiments of the training method of the classification model for realizing IA by combining RPA and AI also applies to the training apparatus of this embodiment; details not disclosed in the embodiments of the training apparatus are not repeated here.

To sum up, the training apparatus for the classification model for realizing IA by combining RPA and AI according to the embodiments of the present application obtains the position coordinates of at least one word included in each of a plurality of sample pages and the category to which each sample page belongs; inputs the position coordinates of each word in each sample page into a pre-training document understanding model to obtain a coding vector for each word; obtains a coding vector for each sample page from the coding vectors of its words; and trains the initial classification model, with the coding vector and category of each sample page as training data, to obtain a target classification model for document classification. This realizes training of a classification model for document classification, speeds up training, and reduces the amount of data required for training.

In order to implement the above embodiments, the present application further provides a document classification apparatus for implementing IA by combining RPA and AI. Fig. 5 is a schematic structural diagram of a document classification device for implementing IA by combining RPA and AI according to a fifth embodiment of the present application.

As shown in fig. 5, the apparatus 500 for classifying documents that implements IA by combining RPA and AI includes: a third obtaining module 501, a fourth obtaining module 502, a second processing module 503, a fifth obtaining module 504, a third processing module 505, a first determining module 506, and a second determining module 507.

The third obtaining module 501 is configured to obtain a target document sent by the RPA robot, where the target document includes at least one target page;

a fourth obtaining module 502, configured to obtain the position coordinates of at least one word included in each target page;

the second processing module 503 is configured to input the position coordinates of each word in each target page into the pre-training document understanding model, so as to obtain a coding vector corresponding to each word in each target page;

a fifth obtaining module 504, configured to obtain, based on the coding vector corresponding to each word in each target page, a coding vector corresponding to each target page;

a third processing module 505, configured to input the coding vector corresponding to each target page into the target classification model, so as to obtain confidence levels that each target page belongs to multiple preset categories; the target classification model is obtained by training through the method in the embodiment of the first aspect;

a first determining module 506, configured to determine, for each preset category, an average value of confidence levels that each target page belongs to the preset category;

the second determining module 507 is configured to determine a category to which the target document belongs from each preset category based on an average value corresponding to each preset category.

To sum up, the document classification device for realizing IA by combining RPA and AI according to the embodiments of the present application acquires a target document sent by an RPA robot, the target document including at least one target page; acquires the position coordinates of at least one word included in each target page; inputs the position coordinates of each word in each target page into a pre-training document understanding model to obtain a coding vector for each word; obtains a coding vector for each target page from the coding vectors of its words; inputs the coding vector of each target page into a target classification model to obtain the confidences that each target page belongs to a plurality of preset categories; determines, for each preset category, the average of the confidences of the target pages for that category; and determines, from the preset categories, the category to which the target document belongs based on these averages. The target document is thus classified accurately by combining the pre-training document understanding model with a target classification model that is trained quickly on a small amount of training data. Using the target classification model to perform IA classification on the target documents sent by the RPA robot reduces the labor cost of document classification and improves classification efficiency.

In order to implement the foregoing embodiments, an electronic device is further provided in an embodiment of the present application, and includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements a training method of a classification model for implementing IA in combination with RPA and AI according to any one of the foregoing method embodiments, or implements a document classification method for implementing IA in combination with RPA and AI according to any one of the foregoing method embodiments.

In order to implement the foregoing embodiments, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for training a classification model for implementing IA in combination with RPA and AI according to any one of the foregoing method embodiments, or implements a method for classifying a document for implementing IA in combination with RPA and AI according to any one of the foregoing method embodiments.

In order to implement the foregoing embodiments, the present application further provides a computer program product which, when instructions in the computer program product are executed by a processor, implements the method for training a classification model for implementing IA in combination with RPA and AI according to any one of the foregoing method embodiments, or the method for classifying documents for implementing IA in combination with RPA and AI according to any one of the foregoing method embodiments.

FIG. 6 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present application. The electronic device 10 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 6, the electronic device 10 is embodied in the form of a general purpose computing device. The components of the electronic device 10 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the memory 28 and the processing unit 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Electronic device 10 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 10 and includes both volatile and nonvolatile media, removable and non-removable media.

Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The electronic device 10 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a compact disc read-only memory (CD-ROM), a digital versatile disc read-only memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of the embodiments of the present application.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including but not limited to an operating system, one or more application programs, other program modules, and program data, each of which or some combination of which may comprise an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described herein.

The electronic device 10 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the electronic device 10, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 10 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Moreover, the electronic device 10 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 20. As shown in FIG. 6, the network adapter 20 communicates with the other modules of the electronic device 10 via the bus 18. It should be understood that although not shown in FIG. 6, other hardware and/or software modules may be used in conjunction with the electronic device 10, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processing unit 16 executes various functional applications and data processing by executing programs stored in the memory 28, for example, implementing the methods mentioned in the foregoing embodiments.

In the description of the present specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of different embodiments or examples, provided that they do not contradict each other.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims

1. A training method for a classification model for realizing intelligent automation (IA) by combining Robotic Process Automation (RPA) and Artificial Intelligence (AI), characterized by comprising the following steps:

obtaining the position coordinates of at least one word included in each sample page in a plurality of sample pages, and obtaining the category of each sample page;

inputting the position coordinates of the words in each sample page into a pre-training document understanding model to obtain a coding vector corresponding to each word in each sample page;

obtaining a coding vector corresponding to each sample page based on a coding vector corresponding to each word in each sample page;

and taking the coding vector corresponding to each sample page and the category thereof as training data, and training an initial classification model to obtain a target classification model for document classification.

2. The method of claim 1, wherein taking the coding vector and the category corresponding to each sample page as training data and training an initial classification model to obtain a target classification model for document classification comprises:

dividing the training data into a training set and a verification set, wherein the training set comprises coding vectors corresponding to a plurality of first pages, the verification set comprises coding vectors corresponding to a plurality of second pages, and each first page and each second page is labeled with the category to which it belongs;

performing multiple rounds of training on the initial classification model based on the coding vector corresponding to each first page and the category to which the first page belongs, to obtain a candidate classification model after each round of training;

and selecting the target classification model for document classification from the candidate classification models after each round of training based on the coding vector corresponding to each second page and the category of the second page.

3. The method according to claim 2, wherein selecting the target classification model for document classification from the candidate classification models after each round of training based on the coding vector and the category corresponding to each second page comprises:

for the candidate classification model after each round of training, inputting the coding vector corresponding to each second page into the candidate classification model to obtain the confidences, predicted by the candidate classification model, that each second page belongs to a plurality of preset categories, and determining the loss value corresponding to the candidate classification model based on the confidences that each second page belongs to the plurality of preset categories and the category to which each second page belongs;

and selecting the target classification model for document classification from the candidate classification models after each round of training based on the loss value corresponding to the candidate classification models after each round of training.

4. The method of claim 1, wherein obtaining position coordinates of at least one word included in each of the plurality of sample pages comprises:

obtaining a plurality of sample documents sent by an RPA robot, the plurality of sample pages being included in the sample documents;

for each sample page, acquiring Optical Character Recognition (OCR) information of the sample page;

acquiring the text content of at least one text segment in the sample page based on the OCR information of the sample page;

performing word segmentation on the text content of each text segment to obtain at least one word included in each text segment;

acquiring the position coordinates of the area occupied by each text segment;

and acquiring the position coordinates of each word based on the position coordinates of the area occupied by each text segment and the position of each word in the corresponding text segment.

5. A document classification method for realizing IA by combining RPA and AI, which is characterized in that the method comprises the following steps:

acquiring a target document sent by an RPA robot, wherein the target document comprises at least one target page;

acquiring the position coordinates of at least one word included in each target page;

inputting the position coordinates of the words in each target page into a pre-training document understanding model to obtain a coding vector corresponding to each word in each target page;

acquiring a coding vector corresponding to each target page based on the coding vector corresponding to each word in each target page;

inputting the coding vector corresponding to each target page into a target classification model to obtain confidences that each target page belongs to a plurality of preset categories; wherein the target classification model is trained by the method of any one of claims 1-4;

for each preset category, determining the average value of confidence degrees of the target pages belonging to the preset category;

and determining the category of the target document from each preset category based on the average value corresponding to each preset category.

6. An apparatus for training a classification model for implementing IA in conjunction with RPA and AI, the apparatus comprising:

a first acquisition module, configured to acquire the position coordinates of at least one word included in each sample page in a plurality of sample pages and to acquire the category of each sample page;

the first processing module is used for inputting the position coordinates of the words in the sample pages into a pre-training document understanding model so as to obtain coding vectors corresponding to the words in the sample pages;

a second obtaining module, configured to obtain, based on a coding vector corresponding to each word in each sample page, a coding vector corresponding to each sample page;

and a training module, configured to take the coding vector corresponding to each sample page and the category thereof as training data and to train an initial classification model to obtain a target classification model for document classification.
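The application does not state the architecture of the initial classification model. As a stand-in only, the sketch below trains a linear softmax (multinomial logistic regression) classifier on (page coding vector, category index) pairs with full-batch gradient descent on the cross-entropy loss. The function names, learning rate and epoch count are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights: List[List[float]], x: Sequence[float]) -> List[float]:
    """Confidence for each category; the last weight in each row is the bias."""
    xb = list(x) + [1.0]
    return softmax([sum(w * xi for w, xi in zip(row, xb)) for row in weights])


def train_softmax_classifier(
    data: Sequence[Tuple[Sequence[float], int]],
    num_classes: int,
    epochs: int = 200,
    lr: float = 0.5,
) -> List[List[float]]:
    """Return a weight matrix of shape (num_classes, dim + 1)."""
    dim = len(data[0][0])
    weights = [[0.0] * (dim + 1) for _ in range(num_classes)]
    for _ in range(epochs):
        grads = [[0.0] * (dim + 1) for _ in range(num_classes)]
        for x, y in data:
            xb = list(x) + [1.0]  # append the bias input
            probs = predict(weights, x)
            for k in range(num_classes):
                # Gradient of cross-entropy w.r.t. the score of class k.
                err = probs[k] - (1.0 if k == y else 0.0)
                for j in range(dim + 1):
                    grads[k][j] += err * xb[j]
        for k in range(num_classes):
            for j in range(dim + 1):
                weights[k][j] -= lr * grads[k][j] / len(data)
    return weights
```

Because the page coding vectors already come from a pre-trained document understanding model, a small classifier of this kind may be enough, which is consistent with the abstract's claim of faster training on less data.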

7. The apparatus of claim 6, wherein the training module comprises:

a dividing unit, configured to divide the training data into a training set and a validation set, wherein the training set comprises coding vectors corresponding to a plurality of first pages, the validation set comprises coding vectors corresponding to a plurality of second pages, and each first page and each second page is labeled with the category to which it belongs;

and a selecting unit, configured to select, from the candidate classification models obtained after each round of training, the target classification model for document classification based on the coding vector corresponding to each second page and the category to which that second page belongs.
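The dividing step of claim 7 can be illustrated with a seeded random split. The 20% validation ratio and the name `split_train_validation` are assumptions; the claim does not specify how the data is divided.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    samples: Sequence[T], val_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle labeled (coding vector, category) samples and split them into a
    training set (first pages) and a validation set (second pages)."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_val = max(1, int(len(items) * val_ratio))
    return items[n_val:], items[:n_val]
```

A fixed seed keeps the split reproducible, so validation losses of different training runs are measured on the same second pages.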

8. The apparatus of claim 7, wherein the selecting unit comprises:

a processing subunit, configured to, for the candidate classification model obtained after each round of training, input the coding vector corresponding to each second page into the candidate classification model to obtain the confidence levels, predicted by the candidate classification model, that each second page belongs to each of a plurality of preset categories, and to determine a loss value for the candidate classification model based on these confidence levels and the categories to which the second pages belong;

and a selecting subunit, configured to select the target classification model for document classification from the candidate classification models obtained after each round of training, based on the loss value corresponding to each of those candidate classification models.
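Claim 8 can be sketched as follows, under two assumptions the claim does not fix: the loss value is the mean cross-entropy of the predicted confidences against the labeled categories, and the candidate with the lowest validation loss is selected. Each candidate is modeled as a callable returning per-category confidences; `validation_loss` and `select_target_model` are hypothetical names.

```python
import math
from typing import Callable, Sequence, Tuple

# A candidate model maps a page coding vector to per-category confidences.
Model = Callable[[Sequence[float]], Sequence[float]]
LabeledPage = Tuple[Sequence[float], int]  # (coding vector, category index)


def validation_loss(model: Model, val_set: Sequence[LabeledPage]) -> float:
    """Mean cross-entropy between predicted confidences and labeled categories."""
    eps = 1e-12  # guard against log(0)
    return sum(-math.log(max(model(x)[y], eps)) for x, y in val_set) / len(val_set)


def select_target_model(
    candidates: Sequence[Model], val_set: Sequence[LabeledPage]
) -> Tuple[int, Model]:
    """Return (training round index, model) of the lowest-loss candidate."""
    losses = [validation_loss(m, val_set) for m in candidates]
    best = min(range(len(candidates)), key=losses.__getitem__)
    return best, candidates[best]
```

Selecting by validation loss rather than keeping the last round guards against overfitting in later rounds.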

9. The apparatus of claim 6, wherein the first obtaining module comprises:

a first acquiring subunit, configured to acquire the plurality of sample documents sent by the RPA robot;

a third obtaining subunit, configured to obtain the text content of at least one text segment in each sample page based on the OCR recognition information of that sample page;

a word segmentation unit, configured to segment the text content of each text segment into words to obtain at least one word included in each text segment;

and a fifth obtaining subunit, configured to obtain the position coordinates of each word based on the position coordinates of the area occupied by each text segment and the position of each word in the corresponding text segment.

10. A document classification apparatus for implementing IA in conjunction with RPA and AI, the apparatus comprising:

a third acquisition module, configured to acquire a target document sent by the RPA robot, wherein the target document comprises at least one target page;

a fourth acquisition module, configured to acquire the position coordinates of at least one word included in each target page;

a second processing module, configured to input the position coordinates of the words in each target page into a pre-trained document understanding model to obtain a coding vector corresponding to each word in each target page;

a fifth obtaining module, configured to obtain, based on the coding vector corresponding to each word in each target page, a coding vector corresponding to each target page;

a third processing module, configured to input the coding vector corresponding to each target page into a target classification model to obtain the confidence levels with which each target page belongs to each of a plurality of preset categories, wherein the target classification model is trained by the method of any one of claims 1-4;

a first determining module, configured to determine, for each preset category, the average of the confidence levels with which the target pages belong to that preset category;

and a second determining module, configured to determine the category of the target document from among the preset categories based on the average value corresponding to each preset category.

11. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of any one of claims 1-4 or the method of claim 5.

12. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1-4 or the method of claim 5.

13. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-4 or the method of claim 5.