WO2024055864A1 - Training method and apparatus for implementing IA classification model using RPA and AI - Google Patents

Training method and apparatus for implementing IA classification model using RPA and AI

Info

Publication number
WO2024055864A1
Authority
WO
WIPO (PCT)
Prior art keywords
page
target
training
classification model
word
Application number
PCT/CN2023/116770
Other languages
French (fr)
Chinese (zh)
Inventor
段沛宸
Original Assignee
北京来也网络科技有限公司
来也科技(北京)有限公司
Application filed by 北京来也网络科技有限公司 and 来也科技(北京)有限公司
Publication of WO2024055864A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques

Definitions

  • The present disclosure relates to the technical fields of robotic process automation (RPA) and artificial intelligence (AI), and in particular to a training method for a classification model that combines RPA and AI to realize intelligent automation (IA), a document classification method and device, electronic equipment, a vehicle, a computer-readable storage medium, a computer program product, and a computer program.
  • Robotic process automation (RPA) uses specific "robot software" to simulate human operations on a computer and automatically execute process tasks according to rules.
  • AI is the abbreviation of artificial intelligence.
  • Intelligent automation (IA) is a general term for a series of technologies ranging from robotic process automation to artificial intelligence. It combines RPA with AI technologies such as optical character recognition (OCR), intelligent character recognition (ICR), process mining, deep learning (DL), machine learning (ML), natural language processing (NLP), automatic speech recognition (ASR), text-to-speech (TTS), and computer vision (CV) to create end-to-end business processes that can think, learn, and adapt, covering the entire process from process discovery and process automation to automatic and continuous data collection, understanding the meaning of data, and using data to manage and optimize business processes.
  • A complex business process may involve processing several types of documents, and different types of documents need different information extraction models to extract information; follow-up business processing, such as information entry and bill reimbursement, is then performed based on the extracted key information.
  • For example, an email attachment may contain documents such as contracts and invoices.
  • In this case, the RPA robot needs to call a contract extraction model to extract information from the contract documents and call a general multi-bill model to extract information from the invoice documents, and then perform subsequent processing based on the extracted information.
  • Embodiments of the present disclosure provide a training method and a document classification method that combine RPA and AI to implement IA, as well as corresponding devices, electronic equipment, vehicles, computer-readable storage media, computer program products, and computer programs, to solve the technical problems in the related art that model training methods for document classification need a large amount of training data and that the training time of the classification model is long.
  • An embodiment of the first aspect of the present disclosure provides a training method for a classification model that combines RPA and AI to implement IA.
  • The method includes: obtaining the position coordinates of at least one word included in each sample page among a plurality of sample pages, and obtaining the category to which each sample page belongs; inputting the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtaining the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and using the encoding vector corresponding to each sample page and its category as training data to train an initial classification model to obtain a target classification model for document classification.
  • In some embodiments, using the encoding vector corresponding to each sample page and its category as training data to train the initial classification model to obtain a target classification model for document classification includes: dividing the training data into a training set and a verification set, where the training set includes the encoding vectors corresponding to a plurality of first pages, the verification set includes the encoding vectors corresponding to a plurality of second pages, and each first page and each second page is labeled with the category to which it belongs; performing multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and its category to obtain a candidate classification model after each round of training; and selecting the target classification model for document classification from the candidate classification models after each round of training based on the encoding vector corresponding to each second page and its category.
  • In some embodiments, selecting the target classification model for document classification from the candidate classification models after each round of training includes: for the candidate classification model after each round of training, inputting the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to each of a plurality of preset categories, and determining the loss value corresponding to the candidate classification model based on the predicted confidences and the category to which each second page belongs; and determining, from the candidate classification models after each round of training, the candidate classification model with the smallest corresponding loss value as the target classification model.
  • In some embodiments, obtaining the position coordinates of at least one word included in each of the multiple sample pages includes: obtaining multiple sample documents sent by the RPA robot; for each sample page, obtaining optical character recognition (OCR) recognition information of the sample page; obtaining the text content of at least one text fragment in the sample page based on the OCR recognition information; segmenting the text content of each text fragment to obtain at least one word included in each text fragment; obtaining the position coordinates of the area occupied by each text fragment; and obtaining the position coordinates of each word based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
  • According to the training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the position coordinates of at least one word included in each of multiple sample pages are obtained, and the category to which each sample page belongs is obtained; the position coordinates of each word in each sample page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; the encoding vector corresponding to each sample page is obtained based on the encoding vectors corresponding to the words in that page; and the encoding vector corresponding to each sample page and its category are used as training data to train the initial classification model to obtain the target classification model for document classification.
  • In this way, training of the classification model for document classification is achieved, the training speed of the classification model is improved, and the amount of data required during training is reduced.
  • An embodiment of the second aspect of the present disclosure provides a document classification method that combines RPA and AI to implement IA.
  • The method includes: obtaining a target document sent by an RPA robot, where the target document includes at least one target page; obtaining the position coordinates of at least one word included in each target page; inputting the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; obtaining the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that target page; inputting the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories, where the target classification model is trained by the method described in the embodiment of the first aspect; for each preset category, determining the average value of the confidences that the target pages belong to that preset category; and determining the category to which the target document belongs from the preset categories based on the average value corresponding to each preset category.
  • According to the document classification method that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the target document sent by the RPA robot is obtained, where the target document includes at least one target page; the position coordinates of at least one word included in each target page are obtained; the position coordinates of each word in each target page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; the encoding vector corresponding to each target page is obtained based on the encoding vectors corresponding to the words in that page; the encoding vector corresponding to each target page is input into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories; for each preset category, the average value of the confidences that the target pages belong to that preset category is determined; and the category to which the target document belongs is determined from the preset categories based on the average value corresponding to each preset category.
  • An embodiment of the third aspect of the present disclosure provides a training device for a classification model that combines RPA and AI to implement IA, including: a first acquisition module, used to acquire the position coordinates of at least one word included in each sample page among multiple sample pages and to acquire the category to which each sample page belongs; a first processing module, used to input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; a second acquisition module, used to obtain the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and a training module, used to use the encoding vector corresponding to each sample page and its category as training data to train the initial classification model to obtain the target classification model for document classification.
  • In some embodiments, the training module includes: a dividing unit, used to divide the training data into a training set and a verification set, where the training set includes the encoding vectors corresponding to a plurality of first pages, the verification set includes the encoding vectors corresponding to a plurality of second pages, and each first page and each second page is annotated with the category to which it belongs; a training unit, used to perform multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and its category, so as to obtain a candidate classification model after each round of training; and a selection unit, used to select the target classification model for document classification from the candidate classification models after each round of training based on the encoding vector corresponding to each second page and its category.
  • In some embodiments, the selection unit includes: a processing subunit, configured to input, for the candidate classification model after each round of training, the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to each of multiple preset categories.
  • In some embodiments, the first acquisition module includes: a first acquisition subunit, used to acquire multiple sample documents sent by the RPA robot; a second acquisition subunit, used to acquire, for each sample page, the optical character recognition (OCR) recognition information of the sample page; a third acquisition subunit, used to obtain the text content of at least one text fragment in the sample page based on the OCR recognition information of the sample page; a word segmentation unit, used to segment the text content of each text fragment to obtain at least one word included in each text fragment; a fourth acquisition subunit, used to obtain the position coordinates of the area occupied by each text fragment; and a fifth acquisition subunit, used to obtain the position coordinates of each word based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
  • According to the training device for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the position coordinates of at least one word included in each of multiple sample pages are obtained, and the category to which each sample page belongs is obtained; the position coordinates of each word in each sample page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; the encoding vector corresponding to each sample page is obtained based on the encoding vectors corresponding to the words in that page; and the encoding vector corresponding to each sample page and its category are used as training data to train the initial classification model to obtain the target classification model for document classification.
  • In this way, training of the classification model for document classification is achieved, the training speed of the classification model is improved, and the amount of data required during training is reduced.
  • An embodiment of the fourth aspect of the present disclosure provides a document classification device that combines RPA and AI to implement IA.
  • The device includes: a third acquisition module, used to acquire the target document sent by the RPA robot, where the target document includes at least one target page; a fourth acquisition module, used to obtain the position coordinates of at least one word included in each target page; a second processing module, used to input the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; a fifth acquisition module, used to obtain the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that target page; a third processing module, used to input the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories, where the target classification model is trained by the method described in the embodiment of the first aspect; a first determination module, used to determine, for each preset category, the average value of the confidences that the target pages belong to that preset category; and a second determination module, used to determine the category to which the target document belongs from the preset categories based on the average value corresponding to each preset category.
  • According to the document classification device that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the target document sent by the RPA robot is obtained, where the target document includes at least one target page; the position coordinates of at least one word included in each target page are obtained; the position coordinates of each word in each target page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; the encoding vector corresponding to each target page is obtained based on the encoding vectors corresponding to the words in that page; the encoding vector corresponding to each target page is input into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories; for each preset category, the average value of the confidences that the target pages belong to that preset category is determined; and the category to which the target document belongs is determined from the preset categories based on the average value corresponding to each preset category.
  • An embodiment of the fifth aspect of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the method described in the embodiment of the first aspect of the present disclosure, or the method described in the embodiment of the second aspect of the present disclosure, is implemented.
  • An embodiment of the sixth aspect of the present disclosure provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the method described in the embodiment of the first aspect of the present disclosure, or the method described in the embodiment of the second aspect of the present disclosure, is implemented.
  • An embodiment of the seventh aspect of the present disclosure provides a computer program product, including a computer program. When the computer program is executed by a processor, the method described in the embodiment of the first aspect of the present disclosure, or the method described in the embodiment of the second aspect of the present disclosure, is implemented.
  • An embodiment of the eighth aspect of the present disclosure provides a computer program, including computer program code. When the computer program code is run on a computer, the computer performs the method described in the embodiment of the first aspect of the present disclosure, or the method described in the embodiment of the second aspect of the present disclosure.
  • Figure 1 is a schematic flowchart of a training method for a classification model that combines RPA and AI to implement IA according to the first embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of a training method for a classification model that combines RPA and AI to implement IA according to the second embodiment of the present disclosure
  • Figure 3 is a schematic flowchart of a document classification method for implementing IA by combining RPA and AI according to the third embodiment of the present disclosure
  • Figure 4 is a schematic structural diagram of a training device that combines RPA and AI to implement an IA classification model according to the fourth embodiment of the present disclosure
  • Figure 5 is a schematic structural diagram of a document classification device that combines RPA and AI to implement IA according to the fifth embodiment of the present disclosure
  • FIG. 6 is a block diagram of an electronic device used to implement a classification model training method that combines RPA and AI to implement IA or a document classification method that combines RPA and AI to implement IA according to an embodiment of the present disclosure.
  • In the related art, pre-trained document understanding models such as the LayoutLM model are usually used to understand documents, and a classification model is then used to classify the documents based on the understanding results.
  • The pre-trained document understanding model and the classification model are usually jointly trained based on training data related to the business scenario.
  • However, the structure of the pre-trained document understanding model is complex, and a relatively large amount of training data is required to fine-tune it, so the entire training process takes a long time.
  • Embodiments of the present disclosure provide a training method for a classification model that combines RPA and AI to implement IA.
  • With this method, a classification model for document classification can be obtained without fine-tuning the pre-trained document understanding model.
  • The method includes: obtaining the position coordinates of at least one word included in each sample page among the plurality of sample pages, and obtaining the category to which each sample page belongs; inputting the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtaining the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and using the encoding vector corresponding to each sample page and its category as training data to train the initial classification model to obtain the target classification model for document classification.
  • In this way, training of the classification model for document classification is achieved, the training speed of the classification model is improved, and the amount of data required during training is reduced.
  • RPA robot refers to a software robot that can combine AI technology and RPA technology to automatically perform business processing.
  • RPA robots have two characteristics: “connector” and “non-intrusion”. By simulating human operation methods, they can extract, integrate and connect data from different systems in a non-intrusive way without changing the information system.
  • "document” is an electronic document, which can be a document in PDF (Portable Document Format, Portable Document Format) format obtained by scanning a paper document, or it can also be a document stored on a computer, mobile phone, etc.
  • the embodiment of the present disclosure does not limit the documents edited and formed in the smart device.
  • "Target document” is the document to be classified.
  • "Page” is a page included in the document.
  • an electronic contract document can have one or more pages.
  • Sample pages are the pages included in the sample documents used for model training. The "first page” refers to each page included in the training set after the training data is divided into a training set and a verification set.
  • “Second page” refers to each page included in the verification set after dividing the training data into a training set and a verification set.
  • “Target page” is a page included in the target document to be classified.
  • a "text fragment” is a fragment composed of part of the content on the page.
  • The text fragment may be one line or less than one line of text arranged horizontally, or one column or less than one column of text arranged vertically; the embodiments of the present disclosure do not limit this.
  • the "encoding vector corresponding to a word” is a vector used to represent the feature information of the word, where the feature information of the word includes, for example, the position of the word on the page.
  • "Encoding vector corresponding to the sample page” is a vector used to characterize the characteristic information of the sample page.
  • the characteristic information of the sample page includes, for example, the positions of all words included in the sample page on the page.
  • "Encoding vector corresponding to the target page” is a vector used to characterize the characteristic information of the target page, where the characteristic information of the target page includes, for example, the positions of all words included in the target page in the page, etc.
  • The pre-trained document understanding model refers to a pre-trained model for understanding documents, such as the LayoutLM model (a pre-trained model that processes multi-modal information, i.e., text and layout information), the LayoutLM 2.0 model, and the like.
  • The embodiments of the present disclosure do not limit the pre-trained document understanding model, as long as it can be used to encode the pages in a document and obtain the encoding vector corresponding to each word in a page.
  • the "preset category” refers to the category to which documents created in advance may belong according to needs. For example, it can be set to a bill category, a contract category, etc.
  • “Category of the target document” refers to the category of the target document to be obtained by predicting the category of the target document to be classified using the trained target classification model.
  • “Category of the sample page” refers to the category to which the sample page actually belongs, such as bill category, contract category, etc.
  • the "classification model” is an AI neural network model used for document classification, and its structure can be set as needed.
  • the input of the classification model is the coding vector corresponding to the page in the document, and the output of the classification model is the predicted category of the corresponding page, which may be the confidence that the page belongs to one or more preset categories.
  • confidence can indicate the likelihood that a certain page belongs to a certain preset category.
  • the confidence level that the target page belongs to the preset category A indicates the probability that the target page belongs to the preset category A.
  • the average value of the confidence that each target page belongs to the preset category is the value obtained by averaging the confidence that each target page belongs to the preset category.
  • the "document processing platform” is an intelligent automation platform for intelligently processing documents.
  • intelligent document processing is one of the core capabilities of the intelligent automation platform.
  • Intelligent document processing is based on AI technologies such as optical character recognition (OCR), computer vision (CV), natural language processing (NLP), and knowledge graphs (KG); it can identify, classify, extract elements from, verify, compare, and correct errors in various types of documents, and is a new generation of automation technology that helps enterprises realize the intelligence and automation of document processing.
  • The following describes the training method for a classification model that combines RPA and AI to implement IA, the document classification method and device that combine RPA and AI to implement IA, the electronic device, the vehicle, the computer-readable storage medium, the computer program product, and the computer program according to embodiments of the present disclosure.
  • Figure 1 is a flow chart of a training method for a classification model that combines RPA and AI to implement IA according to the first embodiment of the present disclosure. As shown in Figure 1, the method may include steps 101 to 104.
  • Step 101 Obtain the position coordinates of at least one word included in each sample page among the plurality of sample pages, and obtain the category to which each sample page belongs.
  • The training method for the classification model that combines RPA and AI to implement IA in the embodiments of the present disclosure can be executed by a training device for a classification model that combines RPA and AI to implement IA.
  • Hereinafter, this training device for the classification model that combines RPA and AI to implement IA is referred to simply as the training device.
  • the training device may be implemented by software and/or hardware, and the training device may be an electronic device, or may be configured in an electronic device to implement training of a classification model for document classification.
  • the electronic device may include but is not limited to a terminal device, a server, etc., and this embodiment does not specifically limit the electronic device.
  • the words included in the sample page are words (that is, tokens) obtained by segmenting the text fragments in the sample page.
  • Text fragments can be segmented based on preset vocabulary lists and rules. For example, Chinese text can be segmented word by word: for the text fragment "1 23", the words obtained by segmentation are "1" and "23"; for the text fragment "Zhang San's", the words obtained by segmentation are "Zhang San" and "'s". English text can be divided into sub-words consisting of stems and affixes: for the text fragment "working", the words obtained by segmentation are "work" and "ing".
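  • As an illustrative sketch only (the patent does not prescribe a specific tokenizer), segmentation against a preset vocabulary could look like the following; the tiny vocabulary and the greedy longest-match strategy are assumptions introduced here for illustration:

```python
# Illustrative sketch only: greedy longest-match segmentation against a preset
# vocabulary. The vocabulary contents and the matching strategy are assumptions,
# not part of the disclosed method.
def segment(text_fragment: str, vocabulary: set) -> list:
    words = []
    i = 0
    while i < len(text_fragment):
        # try the longest vocabulary entry starting at position i
        for j in range(len(text_fragment), i, -1):
            piece = text_fragment[i:j]
            if piece in vocabulary or j == i + 1:  # fall back to a single character
                words.append(piece)
                i = j
                break
    return [w for w in words if not w.isspace()]

# Example: a tiny assumed vocabulary reproducing the examples in the text
vocab = {"23", "Zhang San", "'s", "work", "ing"}
print(segment("1 23", vocab))     # ['1', '23']
print(segment("working", vocab))  # ['work', 'ing']
```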
  • the position coordinates of the word are used to represent the position of the word in the page (referring to the sample page in this embodiment).
  • the position coordinates of a word may include the x-axis coordinate and y-axis coordinate of the word in a coordinate system with the upper left corner of the sample page as the origin.
  • For each sample page, if the sample page includes multiple words, in order to reduce the amount of calculation, only the position coordinates of a limited number of words may be obtained for subsequent model training. For example, assuming the preset number of words is 128, for each sample page the position coordinates of up to 128 words can be obtained.
  • Step 102 Input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
  • the position coordinates of the word can be input into the pre-trained document understanding model, and the pre-trained document understanding model can output the encoding vector of the word, so that the training device can obtain the corresponding word encoding vector.
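  • As a non-authoritative sketch, and assuming the Hugging Face transformers implementation of LayoutLM is used as the pre-trained document understanding model (the patent only names LayoutLM as one example), obtaining per-word encoding vectors from word texts and position coordinates could look like this; the model name, the 0-1000 bbox normalization, and the 128-word truncation are assumptions:

```python
# Illustrative sketch, not the disclosed implementation: encode a page with a
# pre-trained LayoutLM model from Hugging Face transformers.
import torch
from transformers import LayoutLMTokenizer, LayoutLMModel

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

def encode_page(words, boxes, page_w, page_h, max_words=128):
    """words: list of word strings; boxes: list of (x1, y1, x2, y2) in pixels."""
    words, boxes = words[:max_words], boxes[:max_words]
    # LayoutLM expects coordinates normalized to a 0-1000 grid
    norm = [[int(1000 * x1 / page_w), int(1000 * y1 / page_h),
             int(1000 * x2 / page_w), int(1000 * y2 / page_h)]
            for x1, y1, x2, y2 in boxes]
    token_ids, token_boxes = [], []
    for word, box in zip(words, norm):
        ids = tokenizer.encode(word, add_special_tokens=False)
        token_ids.extend(ids)
        token_boxes.extend([box] * len(ids))
    input_ids = torch.tensor([token_ids])
    bbox = torch.tensor([token_boxes])
    with torch.no_grad():
        out = model(input_ids=input_ids, bbox=bbox)
    return out.last_hidden_state[0]  # one encoding vector per token
```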
  • Step 103 Obtain the encoding vector corresponding to each sample page based on the encoding vector corresponding to each word in each sample page.
  • If a sample page includes only one word, the encoding vector corresponding to that word can be determined as the encoding vector corresponding to the sample page; if a sample page includes multiple words, the average of the encoding vectors corresponding to the multiple words can be determined and used as the encoding vector corresponding to the sample page.
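  • A minimal sketch of this page-level pooling, assuming the word encoding vectors are stacked into a matrix (names are illustrative):

```python
# Illustrative sketch: the page encoding vector is the mean of its word
# encoding vectors (or the single word vector if the page has one word).
import numpy as np

def page_encoding(word_vectors: np.ndarray) -> np.ndarray:
    """word_vectors: array of shape (num_words, hidden_size)."""
    if word_vectors.shape[0] == 1:
        return word_vectors[0]
    return word_vectors.mean(axis=0)
```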
  • Step 104 Use the coding vector corresponding to each sample page and its category as training data to train the initial classification model to obtain a target classification model for document classification.
  • an initial classification model may be pre-constructed.
  • the initial classification model can be an L*M-dimensional matrix.
  • L is an integer greater than 1
  • M is an integer greater than 0.
  • the encoding vector corresponding to each sample page can be used as the input of the classification model, and the category of each sample page can be used as the label, and the initial classification model can be supervised and trained to obtain the target classification model.
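  • As an illustrative sketch only, and assuming the L*M matrix is realized as a single linear layer trained with PyTorch (the patent does not prescribe a framework), the supervised training of the initial classification model on page encoding vectors could look like this; the hyperparameters are assumptions:

```python
# Illustrative sketch, not the disclosed implementation: train a simple linear
# classifier (an L x M weight matrix, here hidden_size x num_categories) on
# page encoding vectors with category labels.
import torch
import torch.nn as nn

def train_classifier(page_vectors, labels, num_categories, epochs=20, lr=1e-3):
    """page_vectors: tensor (num_pages, hidden_size); labels: tensor (num_pages,)."""
    model = nn.Linear(page_vectors.shape[1], num_categories)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = model(page_vectors)   # per-category confidences (logits)
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
    return model
```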
  • The target classification model can be used to classify documents. Therefore, in actual business scenarios, the target classification model can first be used to classify a document, the information extraction model of the corresponding category can then be called to extract information from the document, and further business processing can be implemented based on the extracted information.
  • In the embodiments of the present disclosure, the pre-trained document understanding model is used alone as a general encoder in different business scenarios to obtain the encoding vector corresponding to each sample page; the pre-trained document understanding model is not trained during the training process, and only the classification model is trained. Because the classification model does not need to generate the encoding vectors corresponding to the sample pages and has a simple structure, only a small amount of training data is needed to obtain a classification model that can accurately classify documents, and the training process is short. As a result, the training speed of the classification model can be increased and the amount of data required during training can be reduced without affecting the accuracy of document classification. In addition, the classification model has a simple structure and occupies little space, making it easy to deploy.
  • In summary, embodiments of the present disclosure provide a training method for a classification model that combines RPA and AI to implement IA: the position coordinates of at least one word included in each of multiple sample pages are obtained, and the category to which each sample page belongs is obtained; the position coordinates of each word in each sample page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; the encoding vector corresponding to each sample page is obtained based on the encoding vectors corresponding to the words in that page; and the encoding vector corresponding to each sample page and its category are used as training data to train the initial classification model to obtain the target classification model for document classification.
  • In this way, training of the classification model for document classification is achieved, the training speed of the classification model is improved, and the amount of data required during training is reduced.
  • the training method of the classification model that combines RPA and AI to implement IA provided by the embodiment of the present disclosure will be further described below with reference to FIG. 2 .
  • Figure 2 is a flow chart of a training method for a classification model that combines RPA and AI to implement IA according to the second embodiment of the present disclosure. As shown in Figure 2, the method includes steps 201 to 206.
  • Step 201 Obtain the position coordinates of at least one word included in each sample page among the plurality of sample pages, and obtain the category to which each sample page belongs.
  • For each sample page, if the sample page includes multiple words, in order to reduce the amount of calculation, only the position coordinates of a limited number of words may be obtained for subsequent model training. For example, assuming the preset number of words is 128, for each sample page the position coordinates of up to 128 words can be obtained.
  • In some embodiments, the position coordinates of at least one word included in each of the multiple sample pages can be obtained in the following manner; that is, step 201 can include: for each sample page, obtaining the optical character recognition (OCR) recognition information of the sample page; obtaining the text content of at least one text fragment in the sample page based on the OCR recognition information of the sample page; segmenting the text content of each text fragment to obtain at least one word included in each text fragment; obtaining the position coordinates of the area occupied by each text fragment; and obtaining the position coordinates of each word based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
  • the area occupied by each text fragment is usually a rectangle.
  • the position coordinates of the area occupied by the text fragment may include the position coordinates of the upper left corner vertex and the lower right corner vertex of the area occupied by the text fragment, or the position coordinates of the upper right corner vertex and the lower left corner vertex.
  • the position coordinates of a word may also include the position coordinates of the upper left corner vertex and the lower right corner vertex of the area occupied by the word, or the position coordinates of the upper right corner vertex and the lower left corner vertex.
  • For each sample page, OCR recognition technology can be used in advance to recognize the page and obtain the OCR recognition information of the sample page; the text content of at least one text fragment is then obtained from the OCR recognition information of the sample page, and the text content of each text fragment is segmented based on the preset vocabulary list and rules to obtain at least one word included in each text fragment.
  • For Chinese, the text can be segmented word by word; for English, it can be segmented into sub-words consisting of stems and affixes.
  • In some embodiments, the width and height of the sample page, as well as the position of each text fragment in the sample page, can be obtained; then, based on the width and height of the sample page and the position of each text fragment in the sample page, the position coordinates of the area occupied by each text fragment are determined, such as the position coordinates of the upper left corner vertex and the lower right corner vertex (or the position coordinates of the upper right corner vertex and the lower left corner vertex).
  • the position coordinates of each word can be obtained based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
  • the position coordinates of the area occupied by the text fragment include the x-axis coordinates and y-axis coordinates (x1, y1) of the upper left corner vertex, and the x-axis coordinates of the lower right corner vertex and y-axis coordinate (x2,y2).
  • the position coordinates of the word include the x-axis coordinate and y-axis coordinate (x3, y3) of the upper-left corner vertex of the area occupied by the word, and the x-axis coordinate and y-axis coordinate (x4, y4) of the lower-right corner vertex.
  • the text fragment may be one line or less than one line of text arranged horizontally, or one or less than one column of text arranged vertically
  • A can be set as needed, for example, set to 1.5.
  • For a text fragment arranged horizontally, the width of each word can be obtained based on the proportion of the word's length in the text fragment and the value of x2 - x1. Then, for the first word on the left in the text fragment, x1 is used as the x-axis coordinate x3 of the upper-left corner vertex of the word, and x3 plus the width of the first word is used as the x-axis coordinate x4 of the lower-right corner vertex of the first word. For each subsequent word, the cumulative width of the words to its left is added to x1 to obtain the x-axis coordinate x3 of its upper-left corner vertex, and x3 plus the width of the word is used as the x-axis coordinate x4 of its lower-right corner vertex.
  • y1 can be regarded as the y-axis coordinate y3 of the upper-left corner vertex of each word
  • y2 can be regarded as the y-axis coordinate y4 of the lower-right corner vertex of each word.
  • For a text fragment arranged vertically, the height of each word can be obtained based on the proportion of the word's length in the text fragment and the value of y2 - y1. Then, for the first word at the top of the text fragment, y1 is used as the y-axis coordinate y3 of the upper-left corner vertex of the word, and y3 plus the height of the first word is used as the y-axis coordinate y4 of the lower-right corner vertex of the first word. For other words in the text fragment, y1 plus the cumulative height of all words above the word is used as the y-axis coordinate y3 of the upper-left corner vertex of the word, and y3 plus the height of the word is used as the y-axis coordinate y4 of the lower-right corner vertex of the word.
  • x1 can be regarded as the x-axis coordinate x3 of the upper-left corner vertex of each word
  • x2 can be regarded as the x-axis coordinate x4 of the lower-right corner vertex of each word.
  • For example, for the text fragment "1 23", the words obtained by segmenting the text content of the text fragment are "1" and "23".
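  • A minimal sketch of the horizontal-text case described above, deriving per-word boxes from the fragment box (x1, y1, x2, y2) by length proportion; the length-proportion rule follows the description, while the function name and the handling of whitespace are assumptions:

```python
# Illustrative sketch: derive word position coordinates from the bounding box of
# a horizontally arranged text fragment, splitting the fragment width in
# proportion to each word's character length.
def word_boxes_horizontal(words, x1, y1, x2, y2):
    total_len = sum(len(w) for w in words)
    boxes, cursor = [], x1
    for w in words:
        width = (x2 - x1) * len(w) / total_len
        # each word keeps the fragment's y-range; x-range accumulates left to right
        boxes.append((cursor, y1, cursor + width, y2))
        cursor += width
    return boxes

# Example: fragment "1 23" occupying x in [0, 30], y in [0, 10]
print(word_boxes_horizontal(["1", "23"], 0, 0, 30, 10))
# [(0, 0, 10.0, 10), (10.0, 0, 30.0, 10)]
```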
  • The multiple sample documents obtained by the training device may be sent by the RPA robot. That is, for each sample page, before the optical character recognition (OCR) recognition information of the sample page is obtained, the method may further include: obtaining multiple sample documents sent by the RPA robot.
  • For example, the training device can be configured in the document processing platform, and the document processing platform can provide an upload interface, so that when a user needs to train and generate a target classification model, each sample document can be uploaded through the upload interface by the RPA robot, and the training device in the document processing platform can thereby obtain the multiple sample documents uploaded by the RPA robot.
  • the training device can automatically obtain the sample pages in combination with the RPA robot, thereby reducing the labor cost of training the classification model.
  • Step 202 Input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
  • Step 203 Based on the encoding vector corresponding to each word in each sample page, obtain the encoding vector corresponding to each sample page.
  • Step 204 Divide the training data into a training set and a verification set.
  • the training set includes encoding vectors corresponding to multiple first pages
  • the verification set includes encoding vectors corresponding to multiple second pages.
  • Each first page and each second page is labeled with the category to which it belongs.
  • the number ratio of the first page included in the training set to the second page included in the verification set can be set arbitrarily as needed, for example, 4:1, and the embodiment of the present disclosure does not limit this.
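  • A minimal sketch of this division, assuming the 4:1 ratio mentioned above and a random shuffle (both are only examples):

```python
# Illustrative sketch: divide (page_vector, category) training data into a
# training set and a verification set at an assumed 4:1 ratio.
import random

def split_training_data(samples, train_ratio=0.8, seed=0):
    """samples: list of (page_encoding_vector, category) pairs."""
    samples = samples[:]
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_ratio)
    # training set (first pages), verification set (second pages)
    return samples[:cut], samples[cut:]
```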
  • Step 205 Perform multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and its category to obtain a candidate classification model after each round of training.
  • the number of rounds for training the initial classification model can be set arbitrarily as needed, and the embodiments of the present disclosure do not limit this.
  • For example, the training set can be divided into N sub-training sets. Based on the first sub-training set, one round of iterative training is performed on the initial classification model to obtain the candidate classification model after the first round of training; then, based on the next sub-training set, one round of iterative training is performed on the candidate classification model after the first round of training to obtain the candidate classification model after the second round of training; then, based on the next sub-training set, one round of iterative training is performed on the candidate classification model after the second round of training to obtain the candidate classification model after the third round of training; and so on, so that the candidate classification models after N rounds of training are obtained based on the N sub-training sets.
  • Alternatively, one round of iterative training can be performed on the initial classification model based on the entire training set to obtain the candidate classification model after the first round of training; then, based on the same training set, one round of iterative training is performed on the candidate classification model after the first round of training to obtain the candidate classification model after the second round of training; then one round of iterative training is performed on the candidate classification model after the second round of training; and so on, so that the candidate classification models after N rounds of training are obtained based on the training set.
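  • A sketch of the second variant (one round over the full training set per round, keeping a snapshot of the candidate model after each round); the use of PyTorch and the deep-copy snapshotting are assumptions:

```python
# Illustrative sketch: multi-round training that keeps the candidate
# classification model produced after every round.
import copy
import torch
import torch.nn as nn

def train_rounds(model, train_vectors, train_labels, num_rounds, lr=1e-3):
    """Returns the candidate classification model saved after each round."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    candidates = []
    for _ in range(num_rounds):
        optimizer.zero_grad()
        loss = loss_fn(model(train_vectors), train_labels)
        loss.backward()
        optimizer.step()
        candidates.append(copy.deepcopy(model))  # candidate after this round
    return candidates
```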
  • Step 206 Based on the coding vector corresponding to each second page and its category, select a target classification model for document classification from the candidate classification models after each round of training.
  • In some embodiments, step 206 can be implemented in the following manner: for the candidate classification model after each round of training, the encoding vector corresponding to each second page is input into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to each of the multiple preset categories; the loss value corresponding to the candidate classification model is determined based on these confidences and the category to which each second page belongs; and the target classification model for document classification is selected from the candidate classification models according to their corresponding loss values.
  • For example, the confidences, predicted by a certain candidate classification model, that each second page belongs to the multiple preset categories, together with the category to which each second page belongs, can be substituted into a cross-entropy loss function to determine the loss value corresponding to that candidate classification model.
  • the cross-entropy loss function can be as shown in formula (1).
  • L_ce represents the loss value.
  • N represents the number of second pages included in the validation set.
  • C represents the number of preset categories, also called the number of categories.
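  • Formula (1) itself is not reproduced in this text. A standard multi-class cross-entropy consistent with the definitions above, with the assumed notation that y_{i,c} is an indicator that second page i belongs to category c and p_{i,c} is the confidence predicted for that category, would be:

```latex
L_{ce} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \, \log p_{i,c}
```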
  • the candidate classification model with the lowest corresponding loss value among the candidate classification models after each round of training may be determined as the target classification model. Therefore, among the candidate classification models after each round of training, the model with the highest prediction accuracy can be determined as the target classification model.
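  • A sketch of this selection step, continuing the PyTorch assumption from the earlier sketches:

```python
# Illustrative sketch: evaluate each candidate on the verification set and keep
# the candidate with the lowest cross-entropy loss as the target model.
import torch
import torch.nn as nn

def select_target_model(candidates, val_vectors, val_labels):
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    with torch.no_grad():
        for candidate in candidates:
            losses.append(loss_fn(candidate(val_vectors), val_labels).item())
    return candidates[losses.index(min(losses))]
```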
  • In summary, the training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure obtains the position coordinates of at least one word included in each of multiple sample pages and obtains the category to which each sample page belongs; inputs the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtains the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that page; divides the training data into a training set and a verification set, where the training set includes the encoding vectors corresponding to multiple first pages, the verification set includes the encoding vectors corresponding to multiple second pages, and each first page and each second page is labeled with its category; performs multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and its category to obtain a candidate classification model after each round of training; and selects the target classification model for document classification from the candidate classification models after each round of training based on the encoding vector corresponding to each second page and its category.
  • In this way, training of the classification model for document classification is achieved, the training speed of the classification model is improved, and the amount of data required during training is reduced.
  • embodiments of the present disclosure also provide a document classification method that combines RPA and AI to implement IA.
  • the document classification method for implementing IA by combining RPA and AI provided by the embodiment of the present disclosure will be described below with reference to FIG. 3 .
  • Figure 3 is a flow chart of a document classification method that combines RPA and AI to implement IA according to the third embodiment of the present disclosure. As shown in Figure 3, the method includes steps 301 to 307.
  • Step 301 Obtain the target document sent by the RPA robot.
  • the target document includes at least one target page.
  • the document classification method that combines RPA and AI to implement IA in the embodiment of the present disclosure can be executed by a document classification device that combines RPA and AI to implement IA.
  • the document classification device that combines RPA and AI to implement IA can be implemented by software and/or hardware.
  • The document classification device that combines RPA and AI to implement IA can be an electronic device, or can be configured in an electronic device, to classify documents.
  • the electronic device may include but is not limited to a terminal device, a server, etc., and this embodiment does not specifically limit the electronic device.
  • For example, the document classification device that combines RPA and AI to implement IA can be configured in a document processing platform, and the document processing platform can provide an upload interface. Therefore, when a user needs to classify a certain target document, the target document can be uploaded through the upload interface by the RPA robot, so that the document classification device that combines RPA and AI to implement IA in the document processing platform can obtain the target document uploaded by the RPA robot.
  • Step 302 Obtain the position coordinates of at least one word included in each target page.
  • the words included in the target page are words (that is, tokens) obtained by segmenting the text fragments in the target page.
  • text fragments can be segmented based on preset vocabulary lists and rules.
  • the position coordinates of the word are used to represent the position of the word in the page (in this embodiment, the target page).
  • the position coordinates of a word may include the x-axis coordinate and y-axis coordinate of the word in a coordinate system with the upper left corner of the target page as the origin.
  • For each target page, in order to reduce the amount of calculation, only the position coordinates of a limited number of words may be obtained for subsequent document classification. For example, assuming the preset number of words is 128, for each target page the position coordinates of up to 128 words can be obtained.
  • the method of obtaining the position coordinates of at least one word included in each target page may refer to the method of obtaining the position coordinates of at least one word included in the sample page in the above embodiment, which will not be described again here.
  • Step 303 Input the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page.
  • For each word, the position coordinates of the word can be input into the pre-trained document understanding model, and the pre-trained document understanding model can output the encoding vector of the word, so that the document classification device that combines RPA and AI to implement IA can obtain the encoding vector corresponding to the word.
  • Step 304 Obtain the encoding vector corresponding to each target page based on the encoding vector corresponding to each word in each target page.
  • If a target page includes only one word, the encoding vector corresponding to that word can be determined as the encoding vector corresponding to the target page; if a target page includes multiple words, the average of the encoding vectors corresponding to the multiple words can be determined and used as the encoding vector corresponding to the target page.
  • Step 305 Input the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to multiple preset categories.
  • the target classification model is trained through the training method of the classification model that combines RPA and AI to implement IA as shown in any of the above embodiments.
  • the confidence that the target page belongs to multiple preset categories can indicate the probability that the target page belongs to each preset category.
  • the trained target classification model can be deployed in a document processing platform to implement IA by combining RPA and AI.
  • the document classification device can input the encoding vector corresponding to each target page into the target classification model deployed in the document processing platform to obtain the confidence that each target page belongs to multiple preset categories.
  • Step 306 For each preset category, determine the average value of the confidence that each target page belongs to the preset category.
  • the confidence scores of each target page belonging to the same preset category may be summed and then averaged, thereby obtaining an average value of the confidence scores of each target page belonging to the same preset category.
  • Step 307 Determine the category to which the target document belongs from each preset category based on the average value corresponding to each preset category.
  • the preset category with the largest corresponding average value among each preset category may be determined as the category to which the target document belongs.
  • For example, assume the preset categories include category 1 and category 2.
  • the target document includes 10 target pages.
  • the confidence that each of the 10 target pages belongs to category 1 and the confidence that each of the 10 target pages belongs to category 2 can be obtained.
  • the average confidence level of 10 target pages belonging to category 1 is greater than the average confidence level of 10 target pages belonging to category 2
  • it can be determined that the category to which the target document belongs is category 1.
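  • A minimal sketch of steps 305 to 307 (per-page confidences, per-category averaging, and the final document category); the function and variable names and the softmax step are assumptions:

```python
# Illustrative sketch: classify a document from its target-page encoding
# vectors by averaging per-page confidences for each preset category.
import torch

def classify_document(target_model, page_vectors, category_names):
    """page_vectors: tensor (num_target_pages, hidden_size)."""
    with torch.no_grad():
        # confidence that each target page belongs to each preset category
        confidences = torch.softmax(target_model(page_vectors), dim=1)
    avg_per_category = confidences.mean(dim=0)   # step 306
    best = int(torch.argmax(avg_per_category))   # step 307
    return category_names[best], avg_per_category
```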
  • the pre-trained document understanding model in the embodiment of the present disclosure is universal for various business scenarios and does not require training. It can be used alone as a general encoder in different business scenarios to obtain the encoding vector corresponding to the target page.
  • The target classification model does not need to generate the encoding vectors corresponding to the target pages itself. It has a simple structure and only needs a small amount of training data; the training process is short, the prediction performance of the trained target classification model is not adversely affected, and accurate classification of documents can still be achieved.
  • In summary, the document classification method that combines RPA and AI to implement IA obtains the target document sent by the RPA robot, where the target document includes at least one target page; obtains the position coordinates of at least one word included in each target page; inputs the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word; obtains the encoding vector corresponding to each target page based on the encoding vectors of its words; inputs the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to multiple preset categories; determines, for each preset category, the average confidence that the target pages belong to that preset category; and, based on the average value corresponding to each preset category, determines the category to which the target document belongs. This reduces the labor cost required for document classification and improves the efficiency of document classification.
  • embodiments of the present disclosure also propose a training device that combines RPA and AI to implement an IA classification model.
  • Figure 4 is a schematic structural diagram of a training device for implementing an IA classification model by combining RPA and AI according to the fourth embodiment of the present disclosure.
  • the training device 400 that combines RPA and AI to implement an IA classification model includes: a first acquisition module 401, a first processing module 402, a second acquisition module 403, and a training module 404.
  • The first acquisition module 401 is used to obtain the position coordinates of at least one word included in each sample page among the plurality of sample pages, and to obtain the category to which each sample page belongs.
  • the first processing module 402 is used to input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
  • the second acquisition module 403 is used to obtain the encoding vector corresponding to each sample page based on the encoding vector corresponding to each word in each sample page.
  • the training module 404 is used to use the encoding vector corresponding to each sample page and its category as training data to train the initial classification model to obtain a target classification model for document classification.
  • the training device 400 of the classification model that combines RPA and AI to implement IA in the embodiment of the present disclosure can perform the training method of the classification model that combines RPA and AI to implement IA provided in the above embodiments.
  • the training device 400 that combines RPA and AI to implement the classification model of IA can be implemented by software and/or hardware.
  • The training device that combines RPA and AI to implement the IA classification model can itself be an electronic device, or can be configured in an electronic device, so as to implement the training of a classification model for document classification.
  • the electronic device may include but is not limited to a terminal device, a server, etc., and this embodiment does not specifically limit the electronic device.
  • the training module 404 includes a dividing unit, a training unit and a selection unit.
  • a dividing unit used to divide the training data into a training set and a verification set.
  • the training set includes encoding vectors corresponding to multiple first pages
  • the verification set includes encoding vectors corresponding to multiple second pages.
  • Each first page and each second page is labeled with the category to which it belongs.
  • the training unit is used to perform multiple rounds of training on the initial classification model based on the coding vector corresponding to each first page and its category, so as to obtain a candidate classification model after each round of training.
  • the selection unit is used to select a target classification model for document classification from the candidate classification models after each round of training based on the encoding vector corresponding to each second page and its category.
  • the selection unit includes a processing subunit and a selection subunit.
  • The processing subunit is used to, for the candidate classification model after each round of training, input the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to the multiple preset categories, and to determine the loss value corresponding to the candidate classification model based on that confidence and the category to which each second page belongs. The selection subunit is used to select the target classification model for document classification from the candidate classification models after each round of training, based on the loss values corresponding to the candidate classification models.
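  • A minimal sketch of the procedure performed by the training unit and the selection unit is given below, assuming a PyTorch linear classifier head and a cross-entropy loss; both choices are assumptions, since the disclosure only specifies that the candidate with the best verification-set loss is selected.

```python
import copy
import torch
from torch import nn

def train_target_classifier(train_vecs, train_labels, val_vecs, val_labels,
                            num_categories, rounds=50, lr=1e-3):
    """train_vecs/val_vecs: encoding vectors of the first/second pages;
    train_labels/val_labels: their annotated categories (long tensors).
    Returns the candidate classification model whose verification-set loss
    is smallest across all training rounds."""
    model = nn.Linear(train_vecs.shape[1], num_categories)   # simple head
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    best_loss, best_model = float("inf"), None
    for _ in range(rounds):
        model.train()
        optimizer.zero_grad()
        loss_fn(model(train_vecs), train_labels).backward()
        optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(val_vecs), val_labels).item()
        if val_loss < best_loss:                              # selection rule
            best_loss, best_model = val_loss, copy.deepcopy(model)
    return best_model
```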
  • the first acquisition module 401 includes a first acquisition sub-unit, a second acquisition sub-unit, a third acquisition sub-unit, a word segmentation unit, a fourth acquisition sub-unit and a fifth acquisition sub-unit.
  • the first acquisition subunit is used to acquire multiple sample documents sent by the RPA robot.
  • the second acquisition subunit is used to acquire the optical character recognition (OCR) identification information of the sample page for each sample page.
  • the third acquisition subunit is used to acquire the text content of at least one text fragment in the sample page based on the OCR recognition information of the sample page.
  • the word segmentation unit is used to segment the text content of each text segment to obtain at least one word included in each text segment.
  • the fourth acquisition subunit is used to acquire the position coordinates of the area occupied by each text fragment.
  • the fifth acquisition subunit is used to obtain the position coordinates of each word based on the position coordinates of the area occupied by each text segment and the position of each word in the corresponding text segment.
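  • The fifth acquisition subunit can be illustrated with the following sketch, which estimates each word's box by splitting the fragment's box in proportion to the word's character span; this proportional split, and the assumption that fragments are laid out horizontally, are illustrative assumptions.

```python
def word_boxes(fragment_text, fragment_box, words):
    """fragment_box: [x0, y0, x1, y1] of a horizontally laid-out text fragment.
    words: the words obtained by segmenting fragment_text."""
    x0, y0, x1, y1 = fragment_box
    width = x1 - x0
    n = max(len(fragment_text), 1)
    boxes, cursor = [], 0
    for word in words:
        start = fragment_text.index(word, cursor)   # word's character span
        end = start + len(word)
        boxes.append([x0 + width * start / n, y0, x0 + width * end / n, y1])
        cursor = end
    return boxes
```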
  • the training device for the classification model of IA implemented by combining RPA and AI in the embodiment of the present disclosure obtains the position coordinates of at least one word included in each sample page of multiple sample pages, and obtains the category to which each sample page belongs; the position coordinates of each word in each sample page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; based on the encoding vector corresponding to each word in each sample page, the encoding vector corresponding to each sample page is obtained; the encoding vector corresponding to each sample page and the category to which it belongs are used as training data to train the initial classification model to obtain the target classification model for document classification.
  • the training of the classification model for document classification is realized, the training speed of the classification model is improved, and the amount of data required in the training process is reduced.
  • embodiments of the present disclosure also propose a document classification device that combines RPA and AI to implement IA.
  • Figure 5 is a schematic structural diagram of a document classification device that implements IA by combining RPA and AI according to the fifth embodiment of the present disclosure.
  • the document classification device 500 that combines RPA and AI to implement IA includes: a third acquisition module 501, a fourth acquisition module 502, a second processing module 503, a fifth acquisition module 504, and a third processing module 505 , the first determination module 506 and the second determination module 507.
  • the third acquisition module 501 is used to acquire the target document sent by the RPA robot, where the target document includes at least one target page.
  • the fourth obtaining module 502 is used to obtain the position coordinates of at least one word included in each target page.
  • the second processing module 503 is used to input the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page.
  • the fifth acquisition module 504 is used to acquire the encoding vector corresponding to each target page based on the encoding vector corresponding to each word in each target page.
  • The third processing module 505 is used to input the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to multiple preset categories, where the target classification model is trained using the method described in the embodiments of the first aspect.
  • the first determination module 506 is configured to determine, for each preset category, the average value of the confidence that each target page belongs to the preset category.
  • the second determination module 507 is configured to determine the category to which the target document belongs from among the preset categories based on the average values corresponding to the preset categories.
  • The document classification device that combines RPA and AI to implement IA in the embodiments of the present disclosure obtains the target document sent by the RPA robot, where the target document includes at least one target page; obtains the position coordinates of at least one word included in each target page; inputs the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; obtains the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that page; inputs the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to multiple preset categories; determines, for each preset category, the average value of the confidence that the target pages belong to that preset category; and, based on the average value corresponding to each preset category, determines the category to which the target document belongs from among the preset categories.
  • embodiments of the present disclosure also provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the computer program, the training method of a classification model that combines RPA and AI to implement IA as described in any of the foregoing method embodiments, or the document classification method that combines RPA and AI to implement IA as described in any of the foregoing method embodiments, is implemented.
  • embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the training method of a classification model that combines RPA and AI to implement IA, or the document classification method that combines RPA and AI to implement IA, as described in any of the foregoing method embodiments, is implemented.
  • embodiments of the present disclosure also provide a computer program product.
  • When the instructions in the computer program product are executed by a processor, the training method of a classification model that combines RPA and AI to implement IA, or the document classification method that combines RPA and AI to implement IA, as described in any of the foregoing method embodiments, is realized.
  • embodiments of the present disclosure also provide a computer program, including computer program code.
  • When the computer program code is run on a computer, the computer is caused to execute the training method of a classification model that combines RPA and AI to implement IA as described in any of the foregoing method embodiments, or to execute the document classification method that combines RPA and AI to implement IA as described in any of the foregoing method embodiments.
  • FIG. 6 illustrates a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure.
  • the electronic device 10 shown in FIG. 6 is only an example and should not bring any limitations to the functions and scope of use of the embodiments of the present disclosure.
  • electronic device 10 is embodied in the form of a general computing device.
  • the components of electronic device 10 may include, but are not limited to: one or more processors or processing units 16, system memory 28, and a bus 18 connecting various system components, including memory 28 and processing unit 16.
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics accelerated port, a processor, or a local bus using any of a variety of bus structures.
  • By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • electronic device 10 includes a variety of computer system readable media. These media may be any available media that can be accessed by electronic device 10, including volatile and nonvolatile media, removable and non-removable media.
  • the memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (Random Access Memory; hereinafter referred to as: RAM) 30 and/or cache memory 32.
  • Electronic device 10 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 34 may be used to read and write to non-removable, non-volatile magnetic media (not shown in Figure 6, commonly referred to as a "hard drive").
  • A magnetic disk drive may be provided for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive may be provided for reading from and writing to a removable non-volatile optical disk (e.g., a CD-ROM or DVD-ROM).
  • each drive may be connected to bus 18 through one or more data media interfaces.
  • Memory 28 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of embodiments of the present disclosure.
  • A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28; each of these examples, or some combination thereof, may include an implementation of a networking environment.
  • Program modules 42 generally perform functions and/or methods in the embodiments described in this disclosure.
  • Electronic device 10 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with electronic device 10, and/or with any device (e.g., a network card, a modem, etc.) that enables electronic device 10 to communicate with one or more other computing devices. Such communication may occur through the input/output (I/O) interface 22.
  • The electronic device 10 can also communicate, through the network adapter 20, with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet.
  • network adapter 20 communicates with other modules of electronic device 10 via bus 18 .
  • It should be understood that, although not shown in Figure 6, other hardware and/or software modules may be used in conjunction with electronic device 10, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
  • the processing unit 16 executes programs stored in the memory 28 to perform various functional applications and data processing, such as implementing the methods mentioned in the previous embodiments.
  • Reference throughout this specification to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, the schematic expressions of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples, and the features of different embodiments or examples, described in this specification, provided they are not inconsistent with each other.
  • The terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature.
  • “plurality” means at least two, such as two, three, etc., unless otherwise expressly and specifically limited.
  • a "computer-readable medium” may be any device that can contain, store, communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • A non-exhaustive list of computer-readable media includes the following: an electrical connection with one or more wires (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM).
  • The computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the paper or other medium may, for example, be optically scanned and then edited, interpreted, or otherwise processed in a suitable manner if necessary to obtain the program electronically, after which it is stored in computer memory.
  • various parts of the embodiments of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
  • various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, it can be implemented by any one of the following technologies known in the art, or a combination thereof: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
  • the program can be stored in a computer-readable storage medium.
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing module, each unit may exist physically alone, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
  • the storage media mentioned above can be read-only memory, magnetic disks or optical disks, etc.


Abstract

The present application relates to a training method and apparatus for implementing an IA classification model using RPA and AI. The training method comprises: obtaining the position coordinates of at least one word comprised in each sample page and the category to which each sample page belongs; inputting the position coordinates of each word into a pre-trained document understanding model to obtain a corresponding encoding vector; on the basis of the encoding vector corresponding to each word in each sample page, obtaining an encoding vector corresponding to each sample page; and taking the encoding vector corresponding to each sample page and the category to which each sample page belongs as training data, and training an initial classification model to obtain a target classification model for document classification. The speed of training the classification model is increased, and the amount of data required in the training process is reduced. The present application further provides a method for implementing IA document classification using RPA and AI, in which a target document sent by an RPA robot undergoes IA classification using the target classification model. The labor cost required by document classification is reduced, and document classification efficiency is improved.

Description

Training method and apparatus for a classification model that combines RPA and AI to implement IA
Cross-reference to related applications
The present disclosure claims priority to Chinese Patent Application No. 2022111259565, filed in China on September 16, 2022, the entire content of which is incorporated herein by reference.
技术领域Technical field
本公开涉及机器人流程自动化及人工智能技术领域,特别涉及一种结合RPA和AI实现IA的分类模型的训练方法和文档分类方法及装置、电子设备、车辆、计算机可读存储介质、计算机程序产品和计算机程序。The present disclosure relates to the technical fields of robotic process automation and artificial intelligence, and in particular to a training method and document classification method and device for classifying a classification model that combines RPA and AI to realize IA, electronic equipment, vehicles, computer-readable storage media, computer program products, and Computer program.
背景技术Background technique
机器人流程自动化(Robotic Process Automation,简称RPA),是通过特定的“机器人软件”,模拟人在计算机上的操作,按规则自动执行流程任务。Robotic Process Automation (RPA) uses specific "robot software" to simulate human operations on a computer and automatically execute process tasks according to rules.
人工智能(Artificial Intelligence,简称AI)是研究、开发用于模拟、延伸和扩展人的智能的理论、方法、技术及应用系统的一门技术科学。Artificial Intelligence (AI for short) is a technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
智能自动化(Intelligent Automation,简称IA)是一系列从机器人流程自动化到人工智能的技术总称,将RPA与光学字符识别(Optical Character Recognition,OCR)、智能字符识别(Intelligent Character Recognition,ICR)、流程挖掘(Process Mining)、深度学习(Deep Learning,DL)、机器学习(Machine Learning,ML)、自然语言处理(Natural Language Processing,NLP)、语音识别(Automatic Speech Recognition,ASR)、语音合成(Text To Speech,TTS)、计算机视觉(Computer Vision,CV)等多种AI技术相结合,以创建能够思考、学习及自适应的端到端的业务流程,涵盖从流程发现、流程自动化,到通过自动而持续的数据收集、理解数据的含义,使用数据来管理和优化业务流程的整个历程。Intelligent Automation (IA) is a general term for a series of technologies from robotic process automation to artificial intelligence. It combines RPA with Optical Character Recognition (OCR), Intelligent Character Recognition (ICR), and process mining. (Process Mining), Deep Learning (DL), Machine Learning (ML), Natural Language Processing (NLP), Speech Recognition (Automatic Speech Recognition, ASR), Speech Synthesis (Text To Speech) , TTS), Computer Vision (CV) and other AI technologies are combined to create end-to-end business processes that can think, learn and adapt, covering from process discovery, process automation, to automatic and continuous The entire process of data collection, understanding the meaning of data, and using data to manage and optimize business processes.
In the business scenario of intelligent document processing, a complex business process may involve processing several categories of documents, and different categories of documents require different information extraction models to extract information; subsequent business processing, such as information entry or bill reimbursement, is then performed based on the extracted key information. For example, when an RPA robot is used to automatically process an email in which customer A orders products from supplier B, the email attachments may contain documents such as contracts and invoices. The RPA robot needs to call a contract extraction model to extract information from the contract documents and a general multi-bill model to extract information from the invoice documents, and then performs subsequent processing based on the extracted information. This requires first using a classification model to classify the documents, then calling the information extraction model of the corresponding category to extract information from the documents, and then carrying out further business processing. How to quickly train the classification model with less training data, while ensuring the accuracy of the classification model, has therefore become an urgent problem to be solved.
Summary of the invention
Embodiments of the present disclosure provide a training method and a document classification method and apparatuses for a classification model that combines RPA and AI to implement IA, as well as an electronic device, a vehicle, a computer-readable storage medium, a computer program product, and a computer program, to solve the technical problems of model training methods for document classification in the related art, namely that a large amount of training data is required for training and that the training time of the classification model is long.
An embodiment of the first aspect of the present disclosure provides a training method for a classification model that combines RPA and AI to implement IA. The method includes: obtaining the position coordinates of at least one word included in each sample page among a plurality of sample pages, and obtaining the category to which each sample page belongs; inputting the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtaining the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and training an initial classification model using the encoding vector corresponding to each sample page and the category to which it belongs as training data, to obtain a target classification model for document classification.
In some embodiments, training the initial classification model using the encoding vector corresponding to each sample page and the category to which it belongs as training data, to obtain a target classification model for document classification, includes: dividing the training data into a training set and a verification set, where the training set includes encoding vectors corresponding to a plurality of first pages, the verification set includes encoding vectors corresponding to a plurality of second pages, and each first page and each second page is labeled with the category to which it belongs; performing multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and the category to which it belongs, to obtain a candidate classification model after each round of training; and selecting a target classification model for document classification from the candidate classification models after each round of training, based on the encoding vector corresponding to each second page and the category to which it belongs.
In some embodiments, selecting the target classification model for document classification from the candidate classification models after each round of training, based on the encoding vector corresponding to each second page and the category to which it belongs, includes: for the candidate classification model after each round of training, inputting the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to a plurality of preset categories, and determining the loss value corresponding to the candidate classification model based on that confidence and the category to which each second page belongs; and selecting the target classification model for document classification from the candidate classification models after each round of training based on the loss values corresponding to the candidate classification models after each round of training.
In some embodiments, obtaining the position coordinates of at least one word included in each sample page among the plurality of sample pages includes: obtaining a plurality of sample documents sent by the RPA robot; for each sample page, obtaining optical character recognition (OCR) recognition information of the sample page; obtaining the text content of at least one text fragment in the sample page based on the OCR recognition information of the sample page; segmenting the text content of each text fragment to obtain at least one word included in each text fragment; obtaining the position coordinates of the area occupied by each text fragment; and obtaining the position coordinates of each word based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
With the training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the position coordinates of at least one word included in each sample page among a plurality of sample pages are obtained, and the category to which each sample page belongs is obtained; the position coordinates of each word in each sample page are input into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; the encoding vector corresponding to each sample page is obtained based on the encoding vectors corresponding to the words in that sample page; and the encoding vector corresponding to each sample page and the category to which it belongs are used as training data to train an initial classification model, so as to obtain a target classification model for document classification. In this way, the training of a classification model for document classification is realized, the training speed of the classification model is improved, and the amount of data required during the training process is reduced.
An embodiment of the second aspect of the present disclosure provides a document classification method that combines RPA and AI to implement IA. The method includes: obtaining a target document sent by an RPA robot, where the target document includes at least one target page; obtaining the position coordinates of at least one word included in each target page; inputting the position coordinates of each word in each target page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; obtaining the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that target page; inputting the encoding vector corresponding to each target page into a target classification model to obtain the confidence that each target page belongs to a plurality of preset categories, where the target classification model is trained using the method described in the embodiments of the first aspect; determining, for each preset category, the average value of the confidence that the target pages belong to that preset category; and determining, based on the average value corresponding to each preset category, the category to which the target document belongs from among the preset categories.
With the document classification method that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the target document sent by the RPA robot is obtained, where the target document includes at least one target page; the position coordinates of at least one word included in each target page are obtained; the position coordinates of each word in each target page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; the encoding vector corresponding to each target page is obtained based on the encoding vectors corresponding to the words in that target page; the encoding vector corresponding to each target page is input into the target classification model to obtain the confidence that each target page belongs to a plurality of preset categories; for each preset category, the average value of the confidence that the target pages belong to that preset category is determined; and, based on the average value corresponding to each preset category, the category to which the target document belongs is determined from among the preset categories. In this way, a target classification model quickly trained with a small amount of training data is combined with the pre-trained document understanding model to classify the target document accurately. Moreover, by using the target classification model to perform IA classification of the target documents sent by the RPA robot, the labor cost required for document classification is reduced and the efficiency of document classification is improved.
An embodiment of the third aspect of the present disclosure provides a training device for a classification model that combines RPA and AI to implement IA, including: a first acquisition module, used to obtain the position coordinates of at least one word included in each sample page among a plurality of sample pages, and to obtain the category to which each sample page belongs; a first processing module, used to input the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; a second acquisition module, used to obtain the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and a training module, used to train an initial classification model using the encoding vector corresponding to each sample page and the category to which it belongs as training data, to obtain a target classification model for document classification.
In some embodiments, the training module includes: a dividing unit, used to divide the training data into a training set and a verification set, where the training set includes encoding vectors corresponding to a plurality of first pages, the verification set includes encoding vectors corresponding to a plurality of second pages, and each first page and each second page is labeled with the category to which it belongs; a training unit, used to perform multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and the category to which it belongs, to obtain a candidate classification model after each round of training; and a selection unit, used to select a target classification model for document classification from the candidate classification models after each round of training, based on the encoding vector corresponding to each second page and the category to which it belongs.
In some embodiments, the selection unit includes: a processing subunit, used to, for the candidate classification model after each round of training, input the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to a plurality of preset categories, and to determine the loss value corresponding to the candidate classification model based on that confidence and the category to which each second page belongs; and a selection subunit, used to select the target classification model for document classification from the candidate classification models after each round of training based on the loss values corresponding to the candidate classification models after each round of training.
In some embodiments, the first acquisition module includes: a first acquisition subunit, used to obtain a plurality of sample documents sent by the RPA robot; a second acquisition subunit, used to obtain, for each sample page, optical character recognition (OCR) recognition information of the sample page; a third acquisition subunit, used to obtain the text content of at least one text fragment in the sample page based on the OCR recognition information of the sample page; a word segmentation unit, used to segment the text content of each text fragment to obtain at least one word included in each text fragment; a fourth acquisition subunit, used to obtain the position coordinates of the area occupied by each text fragment; and a fifth acquisition subunit, used to obtain the position coordinates of each word based on the position coordinates of the area occupied by each text fragment and the position of each word in the corresponding text fragment.
With the training device for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the position coordinates of at least one word included in each sample page among a plurality of sample pages are obtained, and the category to which each sample page belongs is obtained; the position coordinates of each word in each sample page are input into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; the encoding vector corresponding to each sample page is obtained based on the encoding vectors corresponding to the words in that sample page; and the encoding vector corresponding to each sample page and the category to which it belongs are used as training data to train an initial classification model, so as to obtain a target classification model for document classification. In this way, the training of a classification model for document classification is realized, the training speed of the classification model is improved, and the amount of data required during the training process is reduced.
An embodiment of the fourth aspect of the present disclosure provides a document classification device that combines RPA and AI to implement IA. The device includes: a third acquisition module, used to obtain the target document sent by the RPA robot, where the target document includes at least one target page; a fourth acquisition module, used to obtain the position coordinates of at least one word included in each target page; a second processing module, used to input the position coordinates of each word in each target page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; a fifth acquisition module, used to obtain the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that target page; a third processing module, used to input the encoding vector corresponding to each target page into a target classification model to obtain the confidence that each target page belongs to a plurality of preset categories, where the target classification model is trained using the method described in the embodiments of the first aspect; a first determination module, used to determine, for each preset category, the average value of the confidence that the target pages belong to that preset category; and a second determination module, used to determine, based on the average value corresponding to each preset category, the category to which the target document belongs from among the preset categories.
With the document classification device that combines RPA and AI to implement IA provided by the embodiments of the present disclosure, the target document sent by the RPA robot is obtained, where the target document includes at least one target page; the position coordinates of at least one word included in each target page are obtained; the position coordinates of each word in each target page are input into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; the encoding vector corresponding to each target page is obtained based on the encoding vectors corresponding to the words in that target page; the encoding vector corresponding to each target page is input into the target classification model to obtain the confidence that each target page belongs to a plurality of preset categories; for each preset category, the average value of the confidence that the target pages belong to that preset category is determined; and, based on the average value corresponding to each preset category, the category to which the target document belongs is determined from among the preset categories. In this way, a target classification model quickly trained with a small amount of training data is combined with the pre-trained document understanding model to classify the target document accurately. Moreover, by using the target classification model to perform IA classification of the target documents sent by the RPA robot, the labor cost required for document classification is reduced and the efficiency of document classification is improved.
An embodiment of the fifth aspect of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the method described in the embodiments of the first aspect of the present disclosure, or the method described in the embodiments of the second aspect of the present disclosure, is implemented.
An embodiment of the sixth aspect of the present disclosure provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the method described in the embodiments of the first aspect of the present disclosure, or the method described in the embodiments of the second aspect of the present disclosure, is implemented.
An embodiment of the seventh aspect of the present disclosure provides a computer program product, including a computer program. When the computer program is executed by a processor, the method described in the embodiments of the first aspect of the present disclosure, or the method described in the embodiments of the second aspect of the present disclosure, is implemented.
An embodiment of the eighth aspect of the present disclosure provides a computer program, including computer program code. When the computer program code is run on a computer, the computer is caused to execute the method described in the embodiments of the first aspect of the present disclosure, or the method described in the embodiments of the second aspect of the present disclosure.
Additional aspects and advantages of embodiments of the present disclosure will be set forth in part in the description that follows, will in part become apparent from that description, or may be learned by practice of the present disclosure.
Description of drawings
In the drawings, unless otherwise specified, the same reference numerals denote the same or similar parts or elements throughout the several figures. The drawings are not necessarily drawn to scale. It should be understood that these drawings depict only some embodiments according to the present disclosure and should not be regarded as limiting the scope of the present disclosure.
Figure 1 is a schematic flowchart of a training method for a classification model that combines RPA and AI to implement IA according to a first embodiment of the present disclosure;
Figure 2 is a schematic flowchart of a training method for a classification model that combines RPA and AI to implement IA according to a second embodiment of the present disclosure;
Figure 3 is a schematic flowchart of a document classification method that combines RPA and AI to implement IA according to a third embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of a training device for a classification model that combines RPA and AI to implement IA according to a fourth embodiment of the present disclosure;
Figure 5 is a schematic structural diagram of a document classification device that combines RPA and AI to implement IA according to a fifth embodiment of the present disclosure;
Figure 6 is a block diagram of an electronic device used to implement the training method for a classification model that combines RPA and AI to implement IA, or the document classification method that combines RPA and AI to implement IA, according to embodiments of the present disclosure.
Detailed description
Embodiments of the present disclosure are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are only intended to explain the embodiments of the present disclosure; they are not to be construed as limiting the embodiments of the present disclosure.
These and other aspects of the embodiments of the present disclosure will become apparent with reference to the following description and accompanying drawings. The description and drawings specifically disclose some particular implementations of the embodiments of the present disclosure to indicate some of the ways in which the principles of the embodiments of the present disclosure may be carried out, but it should be understood that the scope of the embodiments of the present disclosure is not limited thereby. On the contrary, the embodiments of the present disclosure include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.
It should be noted that the acquisition, storage, and application of the data involved in the technical solutions of the present disclosure comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
In the related art, a pre-trained document understanding model such as the LayoutLM model is usually used to understand a document, and a classification model is then used to classify the document based on the understanding result. To achieve document classification in different business scenarios, the pre-trained document understanding model and the classification model are usually jointly trained based on training data related to the business scenario. That is, in order to achieve document classification in a certain business scenario, not only must the classification model be trained, but the entire pre-trained document understanding model must also be fine-tuned. However, the structure of the pre-trained document understanding model is complex and requires a relatively large amount of training data to achieve this fine-tuning, so the entire training process takes a long time.
Embodiments of the present disclosure provide a training method for a classification model that combines RPA and AI to implement IA, with which a classification model for document classification can be obtained without fine-tuning the pre-trained document understanding model. The method includes: obtaining the position coordinates of at least one word included in each sample page among a plurality of sample pages, and obtaining the category to which each sample page belongs; inputting the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtaining the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in that sample page; and training an initial classification model using the encoding vector corresponding to each sample page and the category to which it belongs as training data, to obtain a target classification model for document classification. In this way, the training of a classification model for document classification is realized, the training speed of the classification model is improved, and the amount of data required during the training process is reduced.
为了清楚说明本发明的各实施例,首先对本发明实施例中涉及到的技术名词进行解释说明。In order to clearly explain each embodiment of the present invention, technical terms involved in the embodiments of the present invention are first explained.
在本公开实施例/公开的描述中,术语“多个”指两个或两个以上。In the description of the embodiments/disclosure of the present disclosure, the term "plurality" means two or more.
In the description of the embodiments of the present disclosure, an "RPA robot" refers to a software robot that can combine AI technology and RPA technology to automatically perform business processing. An RPA robot has two characteristics, "connector" and "non-intrusive": by simulating human operations, it extracts, integrates, and connects data from different systems in a non-intrusive way without modifying the information systems.
In the description of the embodiments of the present disclosure, a "document" is an electronic document, which may be a document in PDF (Portable Document Format) format obtained by scanning a paper document, or a document edited and created on a smart device such as a computer or a mobile phone; the embodiments of the present disclosure do not limit this. A "target document" is a document to be classified. A "page" is a page included in a document; for example, an electronic contract document may have one or more pages. "Sample pages" are the pages included in the sample documents used for model training. A "first page" is a page included in the training set after the training data is divided into a training set and a validation set. A "second page" is a page included in the validation set after the training data is divided into a training set and a validation set. A "target page" is a page included in the target document to be classified.
In the description of the embodiments of the present disclosure, a "text fragment" is a fragment composed of part of the content of a page. A text fragment may be one horizontally arranged line (or less than one line) of text, or one vertically arranged column (or less than one column) of text; the embodiments of the present disclosure do not limit this.
In the description of the embodiments of the present disclosure, the "encoding vector corresponding to a word" is a vector used to represent the feature information of the word, where the feature information of a word includes, for example, the position of the word on the page. The "encoding vector corresponding to a sample page" is a vector used to represent the feature information of the sample page, where the feature information of a sample page includes, for example, the positions on the page of all the words included in the sample page. The "encoding vector corresponding to a target page" is a vector used to represent the feature information of the target page, where the feature information of a target page includes, for example, the positions on the page of all the words included in the target page.
In the description of the embodiments of the present disclosure, a "pre-trained document understanding model" is a pre-trained model for understanding documents, such as the LayoutLM model (a pre-trained model that processes multi-modal information, namely text and layout information) or the LayoutLM 2.0 model. The embodiments of the present disclosure do not limit this, as long as the model can be used to encode the pages of a document and obtain the encoding vector corresponding to each word on a page.
In the description of the embodiments of the present disclosure, a "preset category" is a category to which a document may belong, created in advance as needed; it can be set, for example, to a bill category, a contract category, and so on. The "category of the target document" is the category obtained by using the trained target classification model to predict the category to which the target document to be classified belongs. The "category of a sample page" is the category to which the sample page actually belongs, such as a bill category or a contract category.
In the description of the embodiments of the present disclosure, the "classification model" is an AI neural network model used for document classification, whose structure can be set as needed. The input of the classification model is the encoding vector corresponding to a page of a document, and the output of the classification model is the predicted category of that page, which may specifically be the confidence that the page belongs to one or more preset categories.
In the description of the embodiments of the present disclosure, "confidence" can indicate how likely it is that a certain page belongs to a certain preset category. For example, the confidence that a target page belongs to preset category A indicates how likely it is that the target page belongs to preset category A.
In the description of the embodiments of the present disclosure, "the average of the confidences that the target pages belong to a preset category" is the value obtained by averaging, over the target pages, the confidence that each target page belongs to that preset category.
In the description of the embodiments of the present disclosure, the "document processing platform" is an intelligent automation platform for intelligently processing documents. Intelligent Document Processing (IDP) is one of the core capabilities of the intelligent automation platform. IDP is a new generation of automation technology that, based on AI technologies such as Optical Character Recognition (OCR), Computer Vision (CV), Natural Language Processing (NLP), and Knowledge Graph (KG), performs recognition, classification, element extraction, verification, comparison, error correction, and other processing on various types of documents, helping enterprises make document processing intelligent and automated.
In the description of the embodiments of the present disclosure, "OCR (Optical Character Recognition)" specifically refers to the process in which an electronic device examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text using character recognition methods; that is, for printed characters, the text in a paper document is optically converted into an image file of a black-and-white dot matrix, and the text in the image is converted into a text format by recognition software for further editing and processing by word processing software.
The following describes, with reference to the accompanying drawings, the training method for a classification model that combines RPA and AI to implement IA, the document classification method and apparatus that combine RPA and AI to implement IA, the electronic device, the vehicle, the computer-readable storage medium, the computer program product, and the computer program according to the embodiments of the present disclosure.
First, the training method for a classification model that combines RPA and AI to implement IA in the embodiments of the present disclosure is described with reference to the accompanying drawings.
Figure 1 is a flow chart of a training method for a classification model that combines RPA and AI to implement IA according to the first embodiment of the present disclosure. As shown in Figure 1, the method may include steps 101 to 104.
Step 101: Obtain the position coordinates of at least one word included in each of a plurality of sample pages, and obtain the category to which each sample page belongs.
It should be noted that the training method for a classification model that combines RPA and AI to implement IA in the embodiments of the present disclosure may be executed by a training apparatus for a classification model that combines RPA and AI to implement IA, hereinafter referred to simply as the training apparatus. The training apparatus may be implemented by software and/or hardware; it may be an electronic device, or it may be configured in an electronic device, so as to train the classification model used for document classification. The electronic device may include, but is not limited to, a terminal device, a server, and the like; this embodiment does not specifically limit the electronic device.
The words included in a sample page are the words (i.e., tokens) obtained by segmenting the text fragments on the sample page. The text fragments may be segmented based on a preset vocabulary and rules. For example, Chinese may be segmented character by character: for the text fragment "1 23", the words obtained by segmentation are "1" and "23", and for the text fragment "张三的" (Zhang San's), the words obtained are "张三" (Zhang San) and "的" (a possessive particle). English may be segmented into sub-words consisting of stems and affixes: for the text fragment "working", the words obtained by segmentation are "work" and "ing".
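By way of illustration, the following is a minimal sketch of such rule-based segmentation. The regular expression and the "-ing" suffix rule are simplified assumptions rather than the preset vocabulary and rules of the disclosure, and CJK text is split character by character here, whereas a real vocabulary could keep a multi-character word such as "张三" as a single token.

```python
# Minimal illustrative tokenizer: digit runs, Latin words, and single CJK characters.
import re

def tokenize_fragment(text: str) -> list:
    tokens = []
    for piece in re.findall(r"\d+|[A-Za-z]+|[\u4e00-\u9fff]", text):
        # Crude sub-word split for English words ending in "ing", e.g. "working" -> "work" + "ing".
        if piece.isascii() and piece.isalpha() and piece.lower().endswith("ing") and len(piece) > 4:
            tokens.extend([piece[:-3], piece[-3:]])
        else:
            tokens.append(piece)
    return tokens

print(tokenize_fragment("1 23"))     # ['1', '23']
print(tokenize_fragment("working"))  # ['work', 'ing']
```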
The position coordinates of a word are used to represent the position of the word on the page (in this embodiment, the sample page). For example, the position coordinates of a word may include the x-axis coordinate and the y-axis coordinate of the word in a coordinate system whose origin is the upper-left corner of the sample page.
For each sample page, when the sample page includes multiple words, in order to reduce the amount of computation, the position coordinates of only a limited number of words may be obtained for subsequent model training. For example, if the number of words is preset to 128, then for each sample page, the position coordinates of at most 128 words are obtained.
Step 102: Input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
In some embodiments, for each word in each sample page, the position coordinates of the word can be input into the pre-trained document understanding model, and the pre-trained document understanding model outputs the encoding vector of the word, so that the training apparatus obtains the encoding vector corresponding to the word.
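By way of illustration, the following is a minimal sketch of using a publicly available LayoutLM checkpoint as a frozen encoder via the Hugging Face transformers library; the checkpoint name and the toy words and boxes are illustrative assumptions, not requirements of the disclosure.

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMModel

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")
model.eval()  # the encoder is used as-is; it is not fine-tuned in this disclosure

words = ["1", "23"]
word_boxes = [[2, 2, 4, 4], [4, 2, 8, 4]]  # (x1, y1, x2, y2) per word, toy values in the 0-1000 range

# Each word may split into several sub-tokens; its box is repeated for each sub-token.
token_ids, token_boxes = [], []
for word, box in zip(words, word_boxes):
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    token_ids.extend(ids)
    token_boxes.extend([box] * len(ids))

input_ids = torch.tensor([[tokenizer.cls_token_id] + token_ids + [tokenizer.sep_token_id]])
bbox = torch.tensor([[[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    outputs = model(input_ids=input_ids, bbox=bbox, attention_mask=attention_mask)

word_vectors = outputs.last_hidden_state[0, 1:-1]  # one encoding vector per (sub-)token
```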
Step 103: Obtain the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in each sample page.
In some embodiments, for each sample page, when the sample page includes one word, the encoding vector corresponding to that word may be determined as the encoding vector corresponding to the sample page; when the sample page includes multiple words, the average of the encoding vectors corresponding to the multiple words may be determined, and this average is used as the encoding vector corresponding to the sample page.
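By way of illustration, a minimal sketch of this page-level pooling is shown below, assuming word_vectors is the (num_words, hidden_size) tensor produced by the encoder.

```python
import torch

def page_encoding(word_vectors: torch.Tensor) -> torch.Tensor:
    # One word: use its vector directly; several words: use the element-wise average.
    if word_vectors.shape[0] == 1:
        return word_vectors[0]
    return word_vectors.mean(dim=0)
```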
Step 104: Use the encoding vector corresponding to each sample page and the category to which it belongs as training data, and train the initial classification model to obtain a target classification model for document classification.
In some embodiments, an initial classification model may be constructed in advance. Assuming that the encoding length of the encoding vector corresponding to each sample page is L and the number of preset categories is M, the initial classification model may be an L*M-dimensional matrix, where L is an integer greater than 1 and M is an integer greater than 0. By inputting the 1*L-dimensional encoding vector corresponding to a sample page into the classification model, a 1*M-dimensional vector is obtained, and each element of this 1*M-dimensional vector indicates the confidence that the sample page belongs to one of the M preset categories.
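By way of illustration, such a classification model can be sketched as a single bias-free linear layer (the L*M matrix) followed by a softmax; the values L = 768 and M = 3 below are illustrative assumptions.

```python
import torch
import torch.nn as nn

L, M = 768, 3  # encoding length and number of preset categories (assumed values)

classifier = nn.Sequential(
    nn.Linear(L, M, bias=False),  # the L*M weight matrix
    nn.Softmax(dim=-1),           # confidences over the M preset categories
)

page_vector = torch.randn(1, L)        # 1*L encoding vector of one page
confidences = classifier(page_vector)  # 1*M vector of confidences summing to 1
```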
The encoding vector corresponding to each sample page can then be used as the input of the classification model, and the category to which each sample page belongs as the label, to perform supervised training of the initial classification model and obtain the target classification model.
The target classification model can be used to classify documents. In an actual business scenario, the target classification model can first be used to classify a document, the information extraction model of the corresponding category can then be called to extract information from the document, and further business processing can be performed based on the extracted information.
It can be understood that, in the embodiments of the present disclosure, the pre-trained document understanding model is used on its own as an encoder common to different business scenarios, to obtain the encoding vectors corresponding to the sample pages; the pre-trained document understanding model is not trained during the training process, and only the classification model is trained. Since the classification model does not need to generate the encoding vectors corresponding to the sample pages and has a simple structure, only a small amount of training data is needed to obtain a classification model that can classify documents accurately, and the training process takes little time. Therefore, the training speed of the classification model can be increased and the amount of data required during training reduced, without affecting the accuracy of document classification. In addition, the classification model has a simple structure and occupies little space, which makes it easy to deploy.
In summary, the training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure obtains the position coordinates of at least one word included in each of a plurality of sample pages and the category to which each sample page belongs; inputs the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtains the encoding vector corresponding to each sample page based on the encoding vectors corresponding to its words; and trains the initial classification model, using the encoding vector corresponding to each sample page and its category as training data, to obtain a target classification model for document classification. In this way, training of a classification model for document classification is achieved, the training speed of the classification model is increased, and the amount of data required during training is reduced.
The training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure is further described below with reference to Figure 2.
Figure 2 is a flow chart of a training method for a classification model that combines RPA and AI to implement IA according to the second embodiment of the present disclosure. As shown in Figure 2, the method includes steps 201 to 206.
Step 201: Obtain the position coordinates of at least one word included in each of a plurality of sample pages, and obtain the category to which each sample page belongs.
For each sample page, when the sample page includes multiple words, in order to reduce the amount of computation, the position coordinates of only a limited number of words may be obtained for subsequent model training. For example, if the number of words is preset to 128, then for each sample page, the position coordinates of at most 128 words are obtained.
In some embodiments, the position coordinates of at least one word included in each of the plurality of sample pages can be obtained in the following manner; that is, step 201 may include: for each sample page, obtaining optical character recognition (OCR) recognition information of the sample page; obtaining the text content of at least one text fragment on the sample page based on the OCR recognition information of the sample page; segmenting the text content of each text fragment to obtain at least one word included in each text fragment; obtaining the position coordinates of the region occupied by each text fragment; and obtaining the position coordinates of each word based on the position coordinates of the region occupied by each text fragment and the position of each word within the corresponding text fragment.
The region occupied by each text fragment is usually a rectangle. The position coordinates of the region occupied by a text fragment may include the position coordinates of the upper-left and lower-right vertices of the region, or of its upper-right and lower-left vertices. The position coordinates of a word may likewise include the position coordinates of the upper-left and lower-right vertices of the region occupied by the word, or of its upper-right and lower-left vertices.
In some embodiments, each sample page can be recognized in advance using OCR recognition technology to obtain the OCR recognition information of the sample page; the text content of at least one text fragment is then obtained from the OCR recognition information of the sample page, and the text content of each text fragment is segmented based on the preset vocabulary and rules to obtain at least one word included in each text fragment. Chinese may be segmented character by character; English may be segmented into sub-words consisting of stems and affixes.
In addition, for each sample page, the width and height of the sample page and the position of each text fragment on the sample page can be obtained, and the position coordinates of the region occupied by each text fragment, such as the position coordinates of the upper-left and lower-right vertices (or of the upper-right and lower-left vertices), can then be determined based on the width and height of the sample page and the position of each text fragment on the sample page.
Then, for each sample page, the position coordinates of each word can be obtained based on the position coordinates of the region occupied by each text fragment and the position of each word within the corresponding text fragment.
The following describes the process of obtaining the position coordinates of each word in a text fragment, based on the position coordinates of the region occupied by that text fragment on a sample page and the position of each word within the text fragment. Assume that the upper-left corner of the sample page is the origin of the coordinate system, and that the position coordinates of the region occupied by the text fragment include the x-axis and y-axis coordinates (x1, y1) of its upper-left vertex and the x-axis and y-axis coordinates (x2, y2) of its lower-right vertex. The position coordinates of a word include the x-axis and y-axis coordinates (x3, y3) of the upper-left vertex of the region occupied by the word and the x-axis and y-axis coordinates (x4, y4) of its lower-right vertex.
Since a text fragment may be one horizontally arranged line (or less than one line) of text, or one vertically arranged column (or less than one column) of text, it can first be determined whether the text fragment is arranged horizontally or vertically. Specifically, it can be determined whether y2 - y1 of the text fragment is less than A(x2 - x1), where A can be set as needed, for example to 1.5. When y2 - y1 is less than A(x2 - x1), the text fragment is determined to be arranged horizontally; when y2 - y1 is not less than A(x2 - x1), the text fragment is determined to be arranged vertically.
For a horizontally arranged text fragment, the width of each word can be obtained based on the length proportion of each word within the text fragment and the value of x2 - x1. For the first (leftmost) word of the text fragment, x1 is used as the x-axis coordinate x3 of the word's upper-left vertex, and x3 plus the width of the first word is used as the x-axis coordinate x4 of the first word's lower-right vertex. For each other word of the text fragment, x1 plus the accumulated width of all words to its left is used as the x-axis coordinate x3 of the word's upper-left vertex, and x3 plus the width of the word is used as the x-axis coordinate x4 of the word's lower-right vertex. In addition, for each word of the text fragment, y1 is used as the y-axis coordinate y3 of the word's upper-left vertex, and y2 is used as the y-axis coordinate y4 of the word's lower-right vertex.
For a vertically arranged text fragment, the height of each word can be obtained based on the length proportion of each word within the text fragment and the value of y2 - y1. For the first (topmost) word of the text fragment, y1 is used as the y-axis coordinate y3 of the word's upper-left vertex, and y3 plus the height of the first word is used as the y-axis coordinate y4 of the first word's lower-right vertex. For each other word of the text fragment, y1 plus the accumulated height of all words above it is used as the y-axis coordinate y3 of the word's upper-left vertex, and y3 plus the height of the word is used as the y-axis coordinate y4 of the word's lower-right vertex. In addition, for each word of the text fragment, x1 is used as the x-axis coordinate x3 of the word's upper-left vertex, and x2 is used as the x-axis coordinate x4 of the word's lower-right vertex.
For example, assume that A is 1.5 and the text fragment is "1 23"; segmenting the text content of this fragment yields the words "1" and "23". The position coordinates of the region occupied by the text fragment "1 23" are (x1, y1) = (2, 2) and (x2, y2) = (8, 4). Since y2 - y1 is less than 1.5(x2 - x1), the text fragment "1 23" is determined to be arranged horizontally. Since the length ratio of "1" to "23" is 1:2, the width of "1" is (8 - 2) * 1/3 = 2 and the width of "23" is (8 - 2) * 2/3 = 4. Following the above method, the position coordinates of "1" are (x3, y3) = (2, 2) and (x4, y4) = (4, 4), and the position coordinates of "23" are (x3, y3) = (4, 2) and (x4, y4) = (8, 4).
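By way of illustration, the following minimal sketch implements the horizontal/vertical rule described above and reproduces the coordinates of the "1 23" example; the function name is illustrative.

```python
def word_boxes(words, fragment_box, A=1.5):
    x1, y1, x2, y2 = fragment_box
    lengths = [len(w) for w in words]
    total = sum(lengths)
    boxes = []
    if (y2 - y1) < A * (x2 - x1):          # horizontally arranged fragment
        offset = x1
        for n in lengths:
            width = (x2 - x1) * n / total  # word width from its length proportion
            boxes.append((offset, y1, offset + width, y2))
            offset += width
    else:                                  # vertically arranged fragment
        offset = y1
        for n in lengths:
            height = (y2 - y1) * n / total
            boxes.append((x1, offset, x2, offset + height))
            offset += height
    return boxes

print(word_boxes(["1", "23"], (2, 2, 8, 4)))
# [(2, 2, 4.0, 4), (4.0, 2, 8.0, 4)] -- matches the coordinates in the example above
```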
In some embodiments, the plurality of sample documents obtained by the training apparatus may be sent by an RPA robot. That is, before obtaining the optical character recognition (OCR) recognition information of each sample page, the method may further include: obtaining a plurality of sample documents sent by the RPA robot.
For example, the training apparatus may be configured in the document processing platform, and the document processing platform may provide an upload interface. When a user needs to train and generate a target classification model, the sample documents can be uploaded through the upload interface based on the RPA robot, and the training apparatus in the document processing platform can thus obtain the plurality of sample documents uploaded by the RPA robot. In this way, by using the RPA robot to upload the plurality of sample pages to the document processing platform, the training apparatus can automatically obtain the sample pages in combination with the RPA robot, thereby reducing the labor cost of training the classification model.
Step 202: Input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
Step 203: Obtain the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in each sample page.
For the specific implementation process and principles of steps 202 and 203, reference may be made to the description of the above embodiments, which is not repeated here.
Step 204: Divide the training data into a training set and a validation set, where the training set includes encoding vectors corresponding to a plurality of first pages, the validation set includes encoding vectors corresponding to a plurality of second pages, and each first page and each second page is annotated with the category to which it belongs.
The ratio of the number of first pages included in the training set to the number of second pages included in the validation set can be set arbitrarily as needed, for example to 4:1; the embodiments of the present disclosure do not limit this.
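By way of illustration, a minimal sketch of such a 4:1 split is shown below; the shuffling and the fixed seed are illustrative choices, not requirements of the disclosure.

```python
import random

def split_training_data(samples, train_ratio=0.8, seed=0):
    # samples: list of (page_encoding_vector, category_label) pairs
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]  # (training set of first pages, validation set of second pages)
```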
Step 205: Perform multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and the category to which it belongs, to obtain a candidate classification model after each round of training.
The number of rounds of training of the initial classification model can be set arbitrarily as needed; the embodiments of the present disclosure do not limit this.
In some embodiments, taking the number of training rounds as N, where N is an integer greater than 1, the training set can be divided into N sub-training sets. Based on the first sub-training set, one round of iterative training is performed on the initial classification model to obtain the candidate classification model after that round; based on the next sub-training set, one round of iterative training is performed on the candidate classification model obtained after the first round to obtain the candidate classification model after that round; based on the next sub-training set, one round of iterative training is performed on the candidate classification model obtained after the second round to obtain the candidate classification model after that round; and so on, so that the candidate classification models after N rounds of training are obtained based on the N sub-training sets.
In some embodiments, again taking the number of training rounds as N, where N is an integer greater than 1, one round of iterative training may instead be performed on the initial classification model based on the whole training set to obtain the candidate classification model after that round; one round of iterative training is then performed on the candidate classification model after the first round based on the same training set to obtain the candidate classification model after that round; one round of iterative training is then performed on the candidate classification model after the second round based on the same training set to obtain the candidate classification model after that round; and so on, so that the candidate classification models after N rounds of training are obtained based on the training set.
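By way of illustration, the following minimal sketch follows the second scheme: N rounds of iterative training over the same training set, keeping a snapshot of the classifier after every round as a candidate model. The classifier is assumed to output raw 1*M logits (for example, a linear layer without the softmax), and the optimizer and learning rate are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

def train_candidates(classifier, train_pairs, num_rounds, lr=1e-3):
    # train_pairs: list of (page_vector [1*L tensor], category_index) pairs
    optimizer = torch.optim.Adam(classifier.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # applies softmax to the logits internally
    candidates = []
    for _ in range(num_rounds):
        for page_vector, label in train_pairs:
            logits = classifier(page_vector)              # 1*M logits
            loss = loss_fn(logits, torch.tensor([label]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        candidates.append(copy.deepcopy(classifier))      # candidate model after this round
    return candidates
```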
Step 206: Based on the encoding vector corresponding to each second page and the category to which it belongs, select the target classification model for document classification from the candidate classification models obtained after each round of training.
In some embodiments, step 206 can be implemented as follows: for the candidate classification model after each round of training, the encoding vector corresponding to each second page is input into that candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to each of the plurality of preset categories, and the loss value corresponding to that candidate classification model is determined based on the confidences that each second page belongs to the plurality of preset categories and the category to which each second page belongs; the target classification model for document classification is then selected from the candidate classification models after each round of training based on the loss values corresponding to those candidate classification models.
In some embodiments, the confidences, predicted by a candidate classification model, that each second page belongs to the plurality of preset categories, and the category to which each second page belongs, can be substituted into a cross-entropy loss function to determine the loss value corresponding to that candidate classification model.
The cross-entropy loss function may be as shown in formula (1):

$$L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log\!\left(p_{i,c}\right) \qquad (1)$$

where $L_{ce}$ denotes the loss value, $N$ denotes the number of second pages included in the validation set, and $C$ denotes the number of preset categories (also called the number of classes); $y_{i,c}$ is the label, an indicator function that equals 1 when the category of the i-th second page is c and equals 0 otherwise; and $p_{i,c}$ denotes the confidence (also called the predicted probability) that the i-th second page belongs to preset category c.
In some embodiments, among the candidate classification models obtained after each round of training, the candidate classification model with the lowest corresponding loss value may be determined as the target classification model. In this way, among the candidate classification models after each round of training, the model with the highest prediction accuracy is determined as the target classification model.
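By way of illustration, the following minimal sketch computes the validation loss of formula (1) for each candidate model and keeps the candidate with the lowest loss as the target classification model; predict_fn and the other names are illustrative.

```python
import math

def cross_entropy_loss(confidences, labels, num_categories):
    # confidences[i][c]: predicted confidence that second page i belongs to category c
    # labels[i]: the annotated category index of second page i
    total = 0.0
    for i, label in enumerate(labels):
        for c in range(num_categories):
            y = 1.0 if label == c else 0.0
            total += y * math.log(confidences[i][c])
    return -total / len(labels)

def select_target_model(candidates, predict_fn, val_vectors, val_labels, num_categories):
    # candidates: models after each round; predict_fn(model, vector) -> list of confidences
    losses = []
    for model in candidates:
        confs = [predict_fn(model, vec) for vec in val_vectors]
        losses.append(cross_entropy_loss(confs, val_labels, num_categories))
    return candidates[losses.index(min(losses))]  # lowest validation loss wins
```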
In summary, the training method for a classification model that combines RPA and AI to implement IA provided by the embodiments of the present disclosure obtains the position coordinates of at least one word included in each of a plurality of sample pages and the category to which each sample page belongs; inputs the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtains the encoding vector corresponding to each sample page based on the encoding vectors corresponding to its words; divides the training data into a training set and a validation set, where the training set includes encoding vectors corresponding to a plurality of first pages, the validation set includes encoding vectors corresponding to a plurality of second pages, and each first page and each second page is annotated with its category; performs multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and its category to obtain a candidate classification model after each round of training; and selects the target classification model for document classification from the candidate classification models after each round of training based on the encoding vector corresponding to each second page and its category. In this way, training of a classification model for document classification is achieved, the training speed of the classification model is increased, and the amount of data required during training is reduced.
Based on the above embodiments, the embodiments of the present disclosure further provide a document classification method that combines RPA and AI to implement IA. The document classification method that combines RPA and AI to implement IA provided by the embodiments of the present disclosure is described below with reference to Figure 3.
Figure 3 is a flow chart of a document classification method that combines RPA and AI to implement IA according to the third embodiment of the present disclosure. As shown in Figure 3, the method includes steps 301 to 307.
Step 301: Obtain the target document sent by the RPA robot, where the target document includes at least one target page.
It should be noted that the document classification method that combines RPA and AI to implement IA in the embodiments of the present disclosure may be executed by a document classification apparatus that combines RPA and AI to implement IA. The document classification apparatus that combines RPA and AI to implement IA may be implemented by software and/or hardware; it may be an electronic device, or it may be configured in an electronic device, so as to classify documents. The electronic device may include, but is not limited to, a terminal device, a server, and the like; this embodiment does not specifically limit the electronic device.
In some embodiments, the document classification apparatus that combines RPA and AI to implement IA may be configured in the document processing platform, and the document processing platform may provide an upload interface. When a user needs to classify a target document, the target document can be uploaded through the upload interface based on the RPA robot, and the document classification apparatus that combines RPA and AI to implement IA in the document processing platform can thus obtain the target document uploaded by the RPA robot.
Step 302: Obtain the position coordinates of at least one word included in each target page.
The words included in a target page are the words (i.e., tokens) obtained by segmenting the text fragments on the target page. The text fragments may be segmented based on the preset vocabulary and rules.
The position coordinates of a word are used to represent the position of the word on the page (in this embodiment, the target page). For example, the position coordinates of a word may include the x-axis coordinate and the y-axis coordinate of the word in a coordinate system whose origin is the upper-left corner of the target page.
For each target page, when the target page includes multiple words, in order to reduce the amount of computation, the position coordinates of only a limited number of words may be obtained for subsequent document classification. For example, if the number of words is preset to 128, then for each target page, the position coordinates of at most 128 words are obtained.
For the manner of obtaining the position coordinates of at least one word included in each target page, reference may be made to the manner of obtaining the position coordinates of at least one word included in a sample page in the above embodiments, which is not repeated here.
Step 303: Input the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page.
In some embodiments, for each word in each target page, the position coordinates of the word can be input into the pre-trained document understanding model, and the pre-trained document understanding model outputs the encoding vector of the word, so that the document classification apparatus that combines RPA and AI to implement IA obtains the encoding vector corresponding to the word.
Step 304: Obtain the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in each target page.
In some embodiments, for each target page, when the target page includes one word, the encoding vector corresponding to that word may be determined as the encoding vector corresponding to the target page; when the target page includes multiple words, the average of the encoding vectors corresponding to the multiple words may be determined, and this average is used as the encoding vector corresponding to the target page.
Step 305: Input the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to each of a plurality of preset categories.
The target classification model is obtained by training using the training method for a classification model that combines RPA and AI to implement IA shown in any of the above embodiments.
For any target page, the confidences that the target page belongs to the plurality of preset categories can indicate how likely it is that the target page belongs to each preset category.
In some embodiments, the target classification model obtained by training using the training method for a classification model that combines RPA and AI to implement IA shown in any of the above embodiments can be deployed in the document processing platform, so that the document classification apparatus that combines RPA and AI to implement IA can input the encoding vector corresponding to each target page into the target classification model deployed in the document processing platform to obtain the confidence that each target page belongs to each of the plurality of preset categories.
Step 306: For each preset category, determine the average of the confidences that the target pages belong to that preset category.
In some embodiments, the confidences that the target pages belong to the same preset category can be summed and then averaged, so as to obtain the average of the confidences that the target pages belong to that preset category.
Step 307: Determine the category to which the target document belongs from the preset categories based on the average value corresponding to each preset category.
In some embodiments, the preset category with the largest corresponding average value among the preset categories may be determined as the category to which the target document belongs.
For example, assume the preset categories include category 1 and category 2, and the target document includes 10 target pages. Through step 305 and the preceding steps, the confidence that each of the 10 target pages belongs to category 1 and the confidence that each belongs to category 2 can be obtained. The confidences that the 10 target pages belong to category 1 are then summed and averaged to obtain the average confidence that the 10 target pages belong to category 1, and the confidences that the 10 target pages belong to category 2 are summed and averaged to obtain the average confidence that the 10 target pages belong to category 2. If the average confidence that the 10 target pages belong to category 1 is greater than the average confidence that the 10 target pages belong to category 2, the category to which the target document belongs is determined to be category 1.
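By way of illustration, the following minimal sketch implements steps 306 and 307: it averages the per-page confidences for each preset category and takes the category with the largest average. The two categories and three pages of confidences are toy values, not data from the disclosure.

```python
def classify_document(page_confidences):
    # page_confidences: one list of per-category confidences for every target page
    num_pages = len(page_confidences)
    num_categories = len(page_confidences[0])
    averages = [
        sum(page[c] for page in page_confidences) / num_pages
        for c in range(num_categories)
    ]
    return averages.index(max(averages)), averages  # (category index, per-category averages)

# Two preset categories, three target pages: the document is assigned to category 0.
print(classify_document([[0.9, 0.1], [0.7, 0.3], [0.6, 0.4]]))
```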
The pre-trained document understanding model in the embodiments of the present disclosure is common to all business scenarios and requires no training; it can be used on its own as an encoder common to different business scenarios to obtain the encoding vectors corresponding to the target pages. The target classification model does not need to generate the encoding vectors corresponding to the target pages and has a simple structure, so only a small amount of training data is needed during training and the training process takes little time, while the prediction performance of the trained target classification model is not affected and documents can still be classified accurately.
In summary, the document classification method that combines RPA and AI to implement IA provided by the embodiments of the present disclosure obtains the target document sent by the RPA robot, where the target document includes at least one target page; obtains the position coordinates of at least one word included in each target page; inputs the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page; obtains the encoding vector corresponding to each target page based on the encoding vectors corresponding to its words; inputs the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to each of a plurality of preset categories; for each preset category, determines the average of the confidences that the target pages belong to that preset category; and determines the category to which the target document belongs from the preset categories based on the average value corresponding to each preset category. In this way, the target document is classified accurately by combining the target classification model, which is trained quickly with a small amount of training data, with the pre-trained document understanding model. Moreover, by using the target classification model to perform IA classification on the target documents sent by the RPA robot, the labor cost required for document classification is reduced and the efficiency of document classification is improved.
To implement the above embodiments, the embodiments of the present disclosure further propose a training apparatus for a classification model that combines RPA and AI to implement IA. Figure 4 is a schematic structural diagram of a training apparatus for a classification model that combines RPA and AI to implement IA according to the fourth embodiment of the present disclosure.
As shown in Figure 4, the training apparatus 400 for a classification model that combines RPA and AI to implement IA includes: a first acquisition module 401, a first processing module 402, a second acquisition module 403, and a training module 404.
The first acquisition module 401 is configured to obtain the position coordinates of at least one word included in each of a plurality of sample pages, and to obtain the category to which each sample page belongs.
The first processing module 402 is configured to input the position coordinates of each word in each sample page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page.
The second acquisition module 403 is configured to obtain the encoding vector corresponding to each sample page based on the encoding vectors corresponding to the words in each sample page.
The training module 404 is configured to use the encoding vector corresponding to each sample page and the category to which it belongs as training data, and to train the initial classification model to obtain a target classification model for document classification.
It should be noted that the training apparatus 400 for a classification model that combines RPA and AI to implement IA in the embodiments of the present disclosure can execute the training method for a classification model that combines RPA and AI to implement IA provided in the above embodiments. The training apparatus 400 for a classification model that combines RPA and AI to implement IA may be implemented by software and/or hardware; it may be an electronic device, or it may be configured in an electronic device, so as to train the classification model used for document classification. The electronic device may include, but is not limited to, a terminal device, a server, and the like; this embodiment does not specifically limit the electronic device.
In one embodiment of the present disclosure, the training module 404 includes a dividing unit, a training unit, and a selection unit.
The dividing unit is configured to divide the training data into a training set and a validation set, where the training set includes encoding vectors corresponding to a plurality of first pages, the validation set includes encoding vectors corresponding to a plurality of second pages, and each first page and each second page is annotated with the category to which it belongs.
The training unit is configured to perform multiple rounds of training on the initial classification model based on the encoding vector corresponding to each first page and the category to which it belongs, to obtain a candidate classification model after each round of training.
The selection unit is configured to select the target classification model for document classification from the candidate classification models after each round of training, based on the encoding vector corresponding to each second page and the category to which it belongs.
In one embodiment of the present disclosure, the selection unit includes a processing subunit and a selection subunit.
The processing subunit is configured, for the candidate classification model after each round of training, to input the encoding vector corresponding to each second page into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each second page belongs to each of the plurality of preset categories, and to determine the loss value corresponding to the candidate classification model based on the confidences that each second page belongs to the plurality of preset categories and the category to which each second page belongs.
The selection subunit is configured to select the target classification model for document classification from the candidate classification models after each round of training, based on the loss values corresponding to the candidate classification models after each round of training.
In one embodiment of the present disclosure, the first acquisition module 401 includes a first acquisition subunit, a second acquisition subunit, a third acquisition subunit, a word segmentation unit, a fourth acquisition subunit, and a fifth acquisition subunit.
The first acquisition subunit is configured to obtain a plurality of sample documents sent by the RPA robot.
The second acquisition subunit is configured, for each sample page, to obtain the optical character recognition (OCR) recognition information of the sample page.
The third acquisition subunit is configured to obtain the text content of at least one text fragment on the sample page based on the OCR recognition information of the sample page.
The word segmentation unit is configured to segment the text content of each text fragment to obtain at least one word included in each text fragment.
The fourth acquisition subunit is configured to obtain the position coordinates of the region occupied by each text fragment.
The fifth acquisition subunit is configured to obtain the position coordinates of each word based on the position coordinates of the region occupied by each text fragment and the position of each word within the corresponding text fragment.
It should be noted that the foregoing explanation of the embodiments of the training method for a classification model that combines RPA and AI to implement IA also applies to the training apparatus for a classification model that combines RPA and AI to implement IA of this embodiment; details not disclosed in the apparatus embodiments of the present disclosure are not repeated here.
In summary, the training apparatus for a classification model combining RPA and AI to implement IA according to the embodiments of the present disclosure acquires the position coordinates of at least one word included in each of multiple sample pages and the category to which each sample page belongs; inputs the position coordinates of each word in each sample page into a pre-trained document understanding model to obtain the encoding vector corresponding to each word in each sample page; obtains the encoding vector corresponding to each sample page based on the encoding vectors of its words; and trains the initial classification model using the encoding vectors and categories of the sample pages as training data, to obtain a target classification model for document classification. This enables the training of a classification model for document classification, increases training speed, and reduces the amount of data required during training.
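The disclosure does not fix how the word-level encoding vectors are aggregated into the page-level encoding vector. One minimal choice, shown here purely as an assumption, is mean pooling over the word vectors produced by the pre-trained document understanding model:

```python
import numpy as np

def page_encoding(word_encodings):
    """Hedged sketch: aggregate the encoding vectors of all words on a page
    into a single page-level encoding vector by mean pooling.  Other
    aggregations (max pooling, a dedicated page-level token) would fit the
    same training flow."""
    return np.asarray(word_encodings, dtype=float).mean(axis=0)
```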
To implement the above embodiments, embodiments of the present disclosure further provide a document classification apparatus that combines RPA and AI to implement IA. FIG. 5 is a schematic structural diagram of a document classification apparatus that combines RPA and AI to implement IA according to the fifth embodiment of the present disclosure.
As shown in FIG. 5, the document classification apparatus 500 that combines RPA and AI to implement IA includes a third acquisition module 501, a fourth acquisition module 502, a second processing module 503, a fifth acquisition module 504, a third processing module 505, a first determination module 506, and a second determination module 507.
The third acquisition module 501 is configured to acquire the target document sent by the RPA robot, the target document including at least one target page.
The fourth acquisition module 502 is configured to acquire the position coordinates of at least one word included in each target page.
The second processing module 503 is configured to input the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word in each target page.
The fifth acquisition module 504 is configured to acquire the encoding vector corresponding to each target page based on the encoding vectors corresponding to the words in that page.
The third processing module 505 is configured to input the encoding vector corresponding to each target page into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories, where the target classification model is trained by the method described in the embodiments of the first aspect.
The first determination module 506 is configured to determine, for each preset category, the average of the confidences that the respective target pages belong to that preset category.
The second determination module 507 is configured to determine the category to which the target document belongs from among the preset categories based on the averages corresponding to the respective preset categories.
In summary, the document classification apparatus combining RPA and AI to implement IA according to the embodiments of the present disclosure acquires the target document sent by the RPA robot, the target document including at least one target page; acquires the position coordinates of at least one word included in each target page; inputs the position coordinates of each word in each target page into the pre-trained document understanding model to obtain the encoding vector corresponding to each word; obtains the encoding vector corresponding to each target page based on the encoding vectors of its words; inputs the encoding vector of each target page into the target classification model to obtain the confidence that each target page belongs to each of multiple preset categories; determines, for each preset category, the average of the confidences that the target pages belong to that category; and determines the category of the target document from among the preset categories based on these averages. In this way, the target document can be classified accurately by combining the target classification model, which can be trained quickly with a small amount of training data, with the pre-trained document understanding model. Moreover, by using the target classification model to classify target documents sent by the RPA robot, the labor cost of document classification is reduced and its efficiency is improved.
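A minimal sketch of the reduction performed by the first and second determination modules, assuming the target classification model returns one confidence vector per target page over the preset categories, could look as follows; the category names used here are placeholders.

```python
import numpy as np

def classify_document(page_confidences, categories):
    """Hedged sketch: page_confidences has shape (num_pages, num_categories),
    one row per target page.  The document is assigned to the preset category
    with the highest average confidence across its pages."""
    averages = np.asarray(page_confidences, dtype=float).mean(axis=0)
    return categories[int(averages.argmax())]

# Example with three target pages and three hypothetical preset categories
confidences = [[0.7, 0.2, 0.1],
               [0.6, 0.3, 0.1],
               [0.5, 0.4, 0.1]]
print(classify_document(confidences, ["invoice", "contract", "receipt"]))  # -> invoice
```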
To implement the above embodiments, embodiments of the present disclosure further provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the training method for a classification model combining RPA and AI to implement IA as described in any of the foregoing method embodiments, or implements the document classification method combining RPA and AI to implement IA as described in any of the foregoing method embodiments.
To implement the above embodiments, embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the training method for a classification model combining RPA and AI to implement IA as described in any of the foregoing method embodiments, or implements the document classification method combining RPA and AI to implement IA as described in any of the foregoing method embodiments.
To implement the above embodiments, embodiments of the present disclosure further provide a computer program product, wherein, when the instructions in the computer program product are executed by a processor, the training method for a classification model combining RPA and AI to implement IA as described in any of the foregoing method embodiments, or the document classification method combining RPA and AI to implement IA as described in any of the foregoing method embodiments, is implemented.
To implement the above embodiments, embodiments of the present disclosure further provide a computer program, including computer program code, wherein, when the computer program code is run on a computer, the computer is caused to perform the training method for a classification model combining RPA and AI to implement IA as described in any of the foregoing method embodiments, or to perform the document classification method combining RPA and AI to implement IA as described in any of the foregoing method embodiments.
FIG. 6 shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure. The electronic device 10 shown in FIG. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device 10 takes the form of a general-purpose computing device. The components of the electronic device 10 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the various system components (including the memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor bus, or a local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
In some embodiments, the electronic device 10 includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the electronic device 10, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The electronic device 10 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a compact disc read-only memory (CD-ROM), a digital versatile disc read-only memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present disclosure.
The electronic device 10 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the electronic device 10, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device 10 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Moreover, the electronic device 10 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in FIG. 6, the network adapter 20 communicates with the other modules of the electronic device 10 via the bus 18. It should be understood that, although not shown in FIG. 6, other hardware and/or software modules may be used in conjunction with the electronic device 10, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the memory 28, for example implementing the methods mentioned in the foregoing embodiments.
It should be noted that the foregoing explanations of the method and apparatus embodiments also apply to the electronic device, vehicle, computer-readable storage medium, computer program product, and computer program of the above embodiments, and are not repeated here.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", or the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, the schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of different embodiments or examples, provided they do not contradict each other.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality of" means at least two, for example two or three, unless otherwise expressly and specifically limited.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing steps of a custom logical function or process, and the scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present disclosure belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered a sequenced list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the various parts of the embodiments of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one of the following technologies known in the art, or a combination thereof: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing relevant hardware through a program, and the program may be stored in a computer-readable storage medium; when executed, the program performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the various embodiments of the present disclosure may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although the embodiments of the present disclosure have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present disclosure, and those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present disclosure.
All embodiments of the present disclosure may be executed alone or in combination with other embodiments, all of which are regarded as within the scope of protection claimed by the present disclosure.

Claims (14)

  1. A training method for a classification model that combines robotic process automation (RPA) and artificial intelligence (AI) to implement intelligent automation (IA), wherein the method comprises:
    acquiring the position coordinates of at least one word included in each of a plurality of sample pages, and acquiring the category to which each of the sample pages belongs;
    inputting the position coordinates of each of the words in each of the sample pages into a pre-trained document understanding model, to acquire the encoding vector corresponding to each of the words in each of the sample pages;
    acquiring the encoding vector corresponding to each of the sample pages based on the encoding vector corresponding to each of the words in each of the sample pages; and
    training an initial classification model using the encoding vector and the category corresponding to each of the sample pages as training data, to obtain a target classification model for document classification.
  2. The method according to claim 1, wherein training the initial classification model using the encoding vector and the category corresponding to each of the sample pages as training data, to obtain the target classification model for document classification, comprises:
    dividing the training data into a training set and a validation set, the training set including the encoding vectors corresponding to a plurality of first pages, the validation set including the encoding vectors corresponding to a plurality of second pages, each of the first pages and each of the second pages being labeled with its category;
    performing multiple rounds of training on the initial classification model based on the encoding vector and the category corresponding to each of the first pages, to obtain a candidate classification model after each round of training; and
    selecting the target classification model for document classification from the candidate classification models after the respective rounds of training, based on the encoding vector and the category corresponding to each of the second pages.
  3. The method according to claim 2, wherein selecting the target classification model for document classification from the candidate classification models after the respective rounds of training, based on the encoding vector and the category corresponding to each of the second pages, comprises:
    for the candidate classification model after each round of training, inputting the encoding vector corresponding to each of the second pages into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each of the second pages belongs to each of a plurality of preset categories, and determining the loss value corresponding to the candidate classification model based on the confidences that each of the second pages belongs to the plurality of preset categories and the category to which each of the second pages belongs; and
    selecting the target classification model for document classification from the candidate classification models after the respective rounds of training, based on the loss values corresponding to the candidate classification models after the respective rounds of training.
  4. The method according to any one of claims 1 to 3, wherein acquiring the position coordinates of at least one word included in each of the plurality of sample pages comprises:
    acquiring the plurality of sample documents sent by the RPA robot;
    for each of the sample pages, acquiring optical character recognition (OCR) information of the sample page;
    acquiring the text content of at least one text fragment in the sample page based on the OCR information of the sample page;
    segmenting the text content of each of the text fragments to obtain at least one word included in each of the text fragments;
    acquiring the position coordinates of the area occupied by each of the text fragments; and
    acquiring the position coordinates of each of the words based on the position coordinates of the area occupied by each of the text fragments and the position of each of the words within the corresponding text fragment.
  5. A document classification method that combines RPA and AI to implement IA, wherein the method comprises:
    acquiring a target document sent by an RPA robot, the target document including at least one target page;
    acquiring the position coordinates of at least one word included in each of the target pages;
    inputting the position coordinates of each of the words in each of the target pages into a pre-trained document understanding model, to acquire the encoding vector corresponding to each of the words in each of the target pages;
    acquiring the encoding vector corresponding to each of the target pages based on the encoding vector corresponding to each of the words in each of the target pages;
    inputting the encoding vector corresponding to each of the target pages into a target classification model to obtain the confidence that each of the target pages belongs to each of a plurality of preset categories, wherein the target classification model is trained by the method according to any one of claims 1 to 4;
    for each of the preset categories, determining the average of the confidences that the respective target pages belong to the preset category; and
    determining the category to which the target document belongs from among the preset categories based on the averages corresponding to the respective preset categories.
  6. A training apparatus for a classification model that combines RPA and AI to implement IA, wherein the apparatus comprises:
    a first acquisition module, configured to acquire the position coordinates of at least one word included in each of a plurality of sample pages, and to acquire the category to which each of the sample pages belongs;
    a first processing module, configured to input the position coordinates of each of the words in each of the sample pages into a pre-trained document understanding model, to acquire the encoding vector corresponding to each of the words in each of the sample pages;
    a second acquisition module, configured to acquire the encoding vector corresponding to each of the sample pages based on the encoding vector corresponding to each of the words in each of the sample pages; and
    a training module, configured to train an initial classification model using the encoding vector and the category corresponding to each of the sample pages as training data, to obtain a target classification model for document classification.
  7. The apparatus according to claim 6, wherein the training module comprises:
    a dividing unit, configured to divide the training data into a training set and a validation set, the training set including the encoding vectors corresponding to a plurality of first pages, the validation set including the encoding vectors corresponding to a plurality of second pages, each of the first pages and each of the second pages being labeled with its category;
    a training unit, configured to perform multiple rounds of training on the initial classification model based on the encoding vector and the category corresponding to each of the first pages, to obtain a candidate classification model after each round of training; and
    a selection unit, configured to select the target classification model for document classification from the candidate classification models after the respective rounds of training, based on the encoding vector and the category corresponding to each of the second pages.
  8. The apparatus according to claim 7, wherein the selection unit comprises:
    a processing subunit, configured to, for the candidate classification model after each round of training, input the encoding vector corresponding to each of the second pages into the candidate classification model to obtain the confidence, predicted by the candidate classification model, that each of the second pages belongs to each of a plurality of preset categories, and to determine the loss value corresponding to the candidate classification model based on the confidences that each of the second pages belongs to the plurality of preset categories and the category to which each of the second pages belongs; and
    a selection subunit, configured to select the target classification model for document classification from the candidate classification models after the respective rounds of training, based on the loss values corresponding to the candidate classification models after the respective rounds of training.
  9. The apparatus according to any one of claims 6 to 8, wherein the first acquisition module comprises:
    a first acquisition subunit, configured to acquire the plurality of sample documents sent by the RPA robot;
    a second acquisition subunit, configured to acquire, for each of the sample pages, optical character recognition (OCR) information of the sample page;
    a third acquisition subunit, configured to acquire the text content of at least one text fragment in the sample page based on the OCR information of the sample page;
    a word segmentation unit, configured to segment the text content of each of the text fragments to obtain at least one word included in each of the text fragments;
    a fourth acquisition subunit, configured to acquire the position coordinates of the area occupied by each of the text fragments; and
    a fifth acquisition subunit, configured to acquire the position coordinates of each of the words based on the position coordinates of the area occupied by each of the text fragments and the position of each of the words within the corresponding text fragment.
  10. A document classification apparatus that combines RPA and AI to implement IA, wherein the apparatus comprises:
    a third acquisition module, configured to acquire a target document sent by an RPA robot, the target document including at least one target page;
    a fourth acquisition module, configured to acquire the position coordinates of at least one word included in each of the target pages;
    a second processing module, configured to input the position coordinates of each of the words in each of the target pages into a pre-trained document understanding model, to acquire the encoding vector corresponding to each of the words in each of the target pages;
    a fifth acquisition module, configured to acquire the encoding vector corresponding to each of the target pages based on the encoding vector corresponding to each of the words in each of the target pages;
    a third processing module, configured to input the encoding vector corresponding to each of the target pages into a target classification model to obtain the confidence that each of the target pages belongs to each of a plurality of preset categories, wherein the target classification model is trained by the method according to any one of claims 1 to 4;
    a first determination module, configured to determine, for each of the preset categories, the average of the confidences that the respective target pages belong to the preset category; and
    a second determination module, configured to determine the category to which the target document belongs from among the preset categories based on the averages corresponding to the respective preset categories.
  11. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 4, or implements the method according to claim 5.
  12. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4, or implements the method according to claim 5.
  13. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4, or implements the method according to claim 5.
  14. A computer program, wherein the computer program comprises computer program code, and the computer program code, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 4, or to perform the method according to claim 5.
PCT/CN2023/116770 2022-09-16 2023-09-04 Training method and apparatus for implementing ia classification model using rpa and ai WO2024055864A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211125956.5A CN115578739A (en) 2022-09-16 2022-09-16 Training method and device for realizing IA classification model by combining RPA and AI
CN202211125956.5 2022-09-16

Publications (1)

Publication Number Publication Date
WO2024055864A1 true WO2024055864A1 (en) 2024-03-21

Family

ID=84581848

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/116770 WO2024055864A1 (en) 2022-09-16 2023-09-04 Training method and apparatus for implementing ia classification model using rpa and ai

Country Status (2)

Country Link
CN (1) CN115578739A (en)
WO (1) WO2024055864A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578739A (en) * 2022-09-16 2023-01-06 上海来也伯特网络科技有限公司 Training method and device for realizing IA classification model by combining RPA and AI

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699923A (en) * 2020-12-21 2021-04-23 深圳壹账通智能科技有限公司 Document classification prediction method and device, computer equipment and storage medium
CN114757247A (en) * 2020-12-28 2022-07-15 腾讯科技(深圳)有限公司 Training method of classification prediction model, classification prediction method, device and equipment
CN114817538A (en) * 2022-04-26 2022-07-29 马上消费金融股份有限公司 Training method of text classification model, text classification method and related equipment
CN115578739A (en) * 2022-09-16 2023-01-06 上海来也伯特网络科技有限公司 Training method and device for realizing IA classification model by combining RPA and AI

Also Published As

Publication number Publication date
CN115578739A (en) 2023-01-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23864596

Country of ref document: EP

Kind code of ref document: A1