CN117648093A - RPA flow automatic generation method based on large model and self-customized demand template - Google Patents


Info

Publication number
CN117648093A
Authority
CN
China
Prior art keywords
rpa
large model
code
model
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311830148.3A
Other languages
Chinese (zh)
Inventor
马博文
糜俊
奚阳
陈郑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Clp Hongxin Information Technology Co ltd
Original Assignee
Clp Hongxin Information Technology Co ltd
Application filed by Clp Hongxin Information Technology Co ltd filed Critical Clp Hongxin Information Technology Co ltd
Priority to CN202311830148.3A priority Critical patent/CN117648093A/en
Publication of CN117648093A publication Critical patent/CN117648093A/en
Pending legal-status Critical Current


Landscapes

  • Stored Programmes (AREA)

Abstract

The invention discloses an automatic RPA flow generation method based on a large model and a self-customized demand template, comprising the following steps: first, a document parsing large model and a code generation large model are trained using human-machine collaborative labeling; next, the trained document parsing large model is used to generate a self-customized demand template; RPA code is then generated from the self-customized demand template by the trained code generation large model; finally, the generated RPA code is compiled by an RPA designer, which realizes automatic generation of the component group. By making effective use of large models, the method addresses the problems of existing automatic RPA component generation, namely inaccurate component operation flowcharts, a narrow application range, and difficulty in generating well-matched RPA component execution code, and it reduces manual workload.

Description

RPA flow automatic generation method based on large model and self-customized demand template
Technical Field
The invention relates to the technical field of data processing, in particular to an RPA flow automatic generation method based on a large model and a self-customized demand template.
Background
With the advent of the digital age, RPA (Robotic Process Automation) technology has been widely applied across industries and has greatly improved the speed of business processing in enterprises. However, RPA is limited in that its automation rules must be predefined manually and it cannot handle complex or unstructured data.
In recent years, AIGC (Artificial Intelligence Generated Content) has overturned the industry's understanding of AI (Artificial Intelligence): AIGC can construct a demand flowchart from the demand content of an uploaded document and automatically generate the relevant instruction code from that flowchart. Fusing large models with RPA robots has therefore become a mainstream research direction for research institutions and enterprises, since it can help RPA robots handle more complex tasks.
At present, solutions that address similar problems with AI+RPA robots include:
Scheme 1: a BERT-base model (Bidirectional Encoder Representations from Transformers, a bidirectional encoder representation model based on the Transformer) performs word embedding on the demand document after it has been converted into a flow block dictionary; AllenNLP then performs flow relation processing and logical relation merging on the embedded flow block dictionary to generate the required operation flowchart; finally, a third-party tool converts the generated flowchart into code.
Scheme 2: the usage relationships among the functional components in a business scenario are counted to obtain the transition probability from each functional component to the others, and the component for each step of the RPA flow is recommended based on these transition probabilities, thereby realizing flow automation.
However, the above schemes have the following problems:
Problem 1: first, Scheme 1 converts the content of the demand document into dictionary blocks mainly with a BERT model. A dictionary block generation method that relies only on the semantic features of the model easily omits core flow steps and adds irrelevant content to the dictionary blocks, which leads to errors in the subsequent component operation flowchart. Second, the execution code generated for the flowchart by third-party software is not well adapted to the RPA robot and is prone to execution exceptions.
Problem 2: Scheme 2 recommends the next component for each step of the RPA flow by building a component transition probability model. The method is feasible in theory, but it is only applicable to simple, single-step RPA scenarios; applying it to a complex RPA project requires a large amount of flow data and relation data to train the probability model, at high time and labor cost. Second, simple probability-based recommendation breaks down for list components, which are common in RPA. For example, when clicking an employee-type list button with three choices (developer, project manager, and back-office staff), the three choices are equally probable to the probability model and cannot be recommended accurately. Finally, Scheme 2 only recommends components and does not automatically generate RPA code to locate them, so the recommended components still have to be bound to web page elements manually. Scheme 2 therefore has a narrow application range and poor practicability in practice.
Therefore, there is a need for an RPA process automation generation method based on a large model and a custom demand template to solve the above-mentioned problems.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an RPA flow automatic generation method based on a large model and a self-customized demand template, so as to solve the problems of inaccurate generation of component operation flow diagrams, narrow application range and difficult generation of high-matching RPA component execution codes in the existing RPA component automatic generation process.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
An automatic RPA flow generation method based on a large model and a self-customized demand template, characterized by comprising the following steps: training a document parsing large model and a code generation large model using human-machine collaborative labeling; then generating a self-customized demand template using the trained document parsing large model; generating RPA code from the self-customized demand template using the trained code generation large model; and finally compiling the generated RPA code with an RPA designer, thereby realizing automatic generation of the component group.
In order to optimize the technical scheme, the specific measures adopted further comprise:
Further, the training of the document parsing large model comprises the following steps:
Step 1.1: collect demand documents as original documents, and manually add the corresponding web page link to each part of the collected documents that involves a specific flow operation, forming the demand document format;
Step 1.2: randomly select 40% of the demand documents for manual content extraction and labeling, generating a manually annotated dataset $D_N=(a_N,q_N)$, and divide it 9:1 into a training set $D_{n1}$ and a test set $D_{n2}$, where $N$ is the total number of samples in the dataset, $q_N$ is the document content set of $D_N$, $a_N$ is the answer result set of $D_N$, $n1$ and $n2$ are the sizes of the training set and the test set respectively, and $n1+n2=N$;
Step 1.3: feed $D_{n1}$ into the document parsing large model for a first round of fine-tuning;
Step 1.4: apply data augmentation to the remaining 60% of the demand documents by document splicing and random masking, extract the content of the augmented documents with an OCR algorithm, and input the extracted text into the fine-tuned document parsing large model for parsing; with the document content as the question and the model's answer as the result, generate a machine-annotated dataset containing $V$ samples, which consists of a document content set and the corresponding answer result set;
Step 1.5: combine $D_{n1}$ with the machine-annotated dataset to form a new dataset and input it into the document parsing large model for another round of fine-tuning; test the re-fine-tuned model on the test set $D_{n2}$; if the answer accuracy exceeds 80%, model training is complete and the final document parsing large model is obtained; otherwise, take the current model as the latest fine-tuned model and return to step 1.4.
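The loop formed by steps 1.3 to 1.5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `finetune` is a hypothetical callable that returns a model (itself a callable from document text to answer), accuracy is taken as exact-match rate, and `max_rounds` is a guard added here because the source does not bound the number of iterations.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]          # (document content, parsed answer)
Model = Callable[[str], str]      # hypothetical: maps document text to an answer


def collaborative_finetune(
    train: List[Sample],
    test: List[Sample],
    unlabeled_docs: List[str],
    finetune: Callable[[List[Sample]], Model],
    threshold: float = 0.8,
    max_rounds: int = 5,          # assumption: the source gives no iteration limit
) -> Model:
    """Human-machine collaborative fine-tuning, following steps 1.3 to 1.5."""
    model = finetune(train)                       # step 1.3: first fine-tuning
    for _ in range(max_rounds):
        # Step 1.4: let the current model annotate the remaining documents.
        machine = [(doc, model(doc)) for doc in unlabeled_docs]
        # Step 1.5: merge manual and machine annotations and fine-tune again.
        model = finetune(train + machine)
        correct = sum(model(q) == a for q, a in test)
        if test and correct / len(test) >= threshold:
            break                                 # accuracy target reached
    return model
```

The same loop shape applies to the code generation large model in steps 2.3 to 2.5, with the datasets swapped.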
Further, the training of the code generation large model comprises the following steps:
Step 2.1: based on the answer result set $a_N=\{a_1,a_2,\dots,a_i,\dots,a_N\}$ of dataset $D_N$, manually extract the corresponding flow source code with a web page source code extractor to produce the source code set $o_N=\{o_1,o_2,\dots,o_i,\dots,o_N\}$, and produce the corresponding RPA code set $c_N=\{c_1,c_2,\dots,c_i,\dots,c_N\}$ with an RPA designer, where $i\in[1,N]$ is an index and $a_i$, $o_i$, $c_i$ denote the $i$-th answer result, web page source code, and RPA code respectively;
Step 2.2: take the integrated content of $a_N$ and $o_N$ as the question set $q_c$ and $c_N$ as the large-model answer result set $a_c$ to build the manually annotated dataset for the code generation large model, $B_N=(q_c,a_c)_N$; likewise, divide it 9:1 into a training set $B_{n1}$ and a test set $B_{n2}$;
Step 2.3: feed $B_{n1}$ into the code generation large model for a first round of fine-tuning;
Step 2.4: based on $a_N$, $o_N$, and the machine-annotated answer result set from step 1.4, build the machine-annotated dataset for the code generation large model as described below;
Step 2.5: combine $B_{n1}$ with the machine-annotated dataset for the code generation large model to form a new dataset and input it into the code generation large model for another round of fine-tuning; test the re-fine-tuned model on the test set $B_{n2}$; if the answer accuracy exceeds 80%, model training is complete and the final code generation large model is obtained; otherwise, take the current model as the latest fine-tuned model and return to step 2.4.
Further, the machine-annotated dataset for the code generation large model is built from $a_N$, $o_N$, and the machine-annotated answer result set (written $\hat{a}_V$ here) as follows:
Step 2.4.1: create an empty machine-annotated source code set $\hat{o}_V$;
Step 2.4.2: compute the flow similarity between each answer result in $\hat{a}_V$ and all answer results in $a_N$ to obtain the flow similarity matrix $T\in\mathbb{R}^{V\times N}$, whose entries are
$$T_{j,i}=\mathrm{SIM}(\hat{a}_j,\,a_i),\qquad j\in[1,V],\ i\in[1,N],$$
that is, $T=\mathrm{SIM}(\hat{a}_V^{\mathsf T},a_N)$, where $\mathrm{SIM}(\cdot)$ is the flow similarity function, computed with a Word2Vec model combined with cosine similarity; $\hat{a}_V^{\mathsf T}$ is the transpose of $\hat{a}_V$; $j$ is an index in $[1,V]$; and $\hat{a}_j$ is the $j$-th answer result in $\hat{a}_V$;
Step 2.4.3: locate the position $k\in[1,N]$ of the maximum similarity value in each row of $T$, look up the corresponding source code $o_k$ in $o_N$, and store the $o_k$ obtained for each row in $\hat{o}_V$;
Step 2.4.4: take the integrated content of $\hat{a}_V$ and $\hat{o}_V$ as the question set and the answers of the fine-tuned code generation large model as the result set, completing the construction of the machine-annotated dataset for the code generation large model.
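Steps 2.4.2 and 2.4.3 amount to a row-wise argmax over a similarity matrix. The sketch below illustrates that lookup under one loud assumption: bag-of-words counts stand in for the Word2Vec vectors named in the text, so that the example runs without a trained embedding model. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Assumption: bag-of-words counts replace the Word2Vec embedding.
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def similarity_matrix(machine_answers: List[str], manual_answers: List[str]) -> List[List[float]]:
    """Rows index machine answers (size V), columns index manual answers (size N)."""
    return [
        [cosine(embed(m), embed(a)) for a in manual_answers]
        for m in machine_answers
    ]


def match_source_codes(
    machine_answers: List[str], manual_answers: List[str], manual_sources: List[str]
) -> List[str]:
    """For each machine answer, reuse the source code of the most similar manual answer."""
    matrix = similarity_matrix(machine_answers, manual_answers)
    matched = []
    for row in matrix:
        k = max(range(len(row)), key=row.__getitem__)   # step 2.4.3: row-wise argmax
        matched.append(manual_sources[k])
    return matched
```

With real embeddings, only `embed` and `cosine` would change; the argmax lookup stays the same.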
Further, generating the self-customized demand template with the trained document parsing large model and generating RPA code from the template with the trained code generation large model comprises the following steps:
Step 3.1: submit the entire content of the self-customized demand template as the question to the trained code generation large model;
Step 3.2: perform format verification on the RPA code returned by the code generation large model; if the verification requirements are met, proceed to step 3.3, otherwise return to step 3.1; the format verification rules are as follows:
Step 3.2.1: determine whether the generated RPA code satisfies the syntax requirements of the RPA designer;
Step 3.2.2: determine whether the structure of the generated RPA code is complete;
Step 3.3: send the code that passes the verification in step 3.2 to the RPA designer as the final RPA flow code.
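The query-and-verify loop of steps 3.1 to 3.3 might look like the sketch below. The checks are stand-ins chosen for illustration: XML well-formedness approximates the designer's syntax requirement, and the presence of a Sequence element approximates structural completeness. `ask_model` is a hypothetical callable, and `max_attempts` is an added guard, since the source simply returns to step 3.1 on failure.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def passes_format_check(code: str) -> bool:
    """Stand-in for steps 3.2.1 and 3.2.2 (assumed checks, not the designer's real rules)."""
    try:
        root = ET.fromstring(code)            # 3.2.1: must parse as well-formed XML
    except ET.ParseError:
        return False
    # 3.2.2: treat the code as complete if it contains a Sequence element.
    return any(el.tag.split("}")[-1].lower() == "sequence" for el in root.iter())


def generate_rpa_code(
    template: str,
    ask_model: Callable[[str], str],          # hypothetical model interface
    max_attempts: int = 3,                    # assumption: bound the retries
) -> Optional[str]:
    for _ in range(max_attempts):
        code = ask_model(template)            # step 3.1: query with the full template
        if passes_format_check(code):         # step 3.2: format verification
            return code                       # step 3.3: hand over to the designer
    return None
```

Returning `None` after the retry budget lets the caller decide whether to report a failure or ask the user to revise the demand document.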
Further, the self-customized demand template is generated by the following steps:
Step 3.1.1: obtain the demand document uploaded by the user and extract its content with an OCR algorithm;
Step 3.1.2: feed the extracted content into the trained document parsing large model for operation flow parsing and output the parsing result;
Step 3.1.3: perform format verification on the result output in step 3.1.2 according to the following rule: traverse the output from top to bottom and from left to right and determine whether every operation flow step has a corresponding web page link; if the verification passes, take the current result as the RPA operation flowchart and proceed to step 3.1.4; otherwise, inform the user of what failed verification, ask the user to revise the demand document, and return to step 3.1.1;
Step 3.1.4: based on the web page links and operation step descriptions in the RPA operation flowchart, extract the corresponding web page source code with a web page source code parser;
Step 3.1.5: integrate the source code information obtained in step 3.1.4 with the RPA operation flowchart output in step 3.1.3 according to the self-customized demand template format to generate the final self-customized demand template; the template format is: "Based on the following information: the flow operation is [content], the web page link involved in the flow is [url], and the reference source code of each flow is [code]; generate the RPA source code. The generated source code must meet the compilation requirements of the RPA designer; if it cannot be generated, reply 'The source code cannot be generated'." Here content, url, and code correspond to the flowchart operation steps from step 3.1.3, their web page links, and the source code information from step 3.1.4 respectively.
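The link check of step 3.1.3 and the template filling of step 3.1.5 can be sketched together. This is an illustration only: the template string paraphrases the format quoted in the text, the URL pattern is a simple assumption, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

URL_RE = re.compile(r"https?://\S+")          # assumption: a simple URL pattern

# Paraphrase of the template format in step 3.1.5; exact wording is an assumption.
TEMPLATE = (
    "Based on the following information: the flow operation is [{content}], "
    "the web page link involved in the flow is [{url}], "
    "and the reference source code of each flow is [{code}]; "
    "generate the RPA source code. The generated source code must meet the "
    "compilation requirements of the RPA designer; if it cannot be generated, "
    "reply 'The source code cannot be generated'."
)


def steps_missing_links(steps: List[Tuple[str, str]]) -> List[int]:
    """Step 3.1.3: return the 1-based numbers of steps that lack a web page link."""
    return [
        number
        for number, (_, url) in enumerate(steps, start=1)
        if not URL_RE.fullmatch(url.strip())
    ]


def build_template(steps: List[Tuple[str, str]], codes: List[str]) -> str:
    """Step 3.1.5: merge step descriptions, links, and source code into one prompt."""
    missing = steps_missing_links(steps)
    if missing:
        # Mirrors the return to step 3.1.1: the user must revise the document.
        raise ValueError(f"steps without a web page link: {missing}")
    return TEMPLATE.format(
        content="; ".join(description for description, _ in steps),
        url="; ".join(url for _, url in steps),
        code="; ".join(codes),
    )
```

Raising on a missing link keeps the failure explicit, so the caller can show the user which steps need a link before any model query is made.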
Further, extracting the corresponding web page source code with a web page source code parser, based on the web page links and operation step descriptions in the RPA operation flowchart, comprises the following steps:
Step 3.1.4.1: parse the web page link of each operation flow in the RPA operation flowchart in turn with the web page source code parser, and return the parsed DOM tree;
Step 3.1.4.2: traverse the DOM tree level by level according to the depth-first rule, and use a source code discrimination function to determine whether the web page source code contained in each hierarchical tag is the source code required by the current operation flow; the discrimination function is defined as follows:
where $f_s(\cdot)$ is the source code discrimination function; $S_l$ and $S_t$ are the hierarchical tag text and the description text of the current operation flow respectively; $T(S_l,S_t)$ and $V(S_l,S_t)$ are the structural similarity function and the vector cosine similarity function of $S_l$ and $S_t$ respectively; $D(S_t)$ is the geometric distance function from the current hierarchical tag to the top of the DOM tree; $w_k$ denotes the number of identical words that occur in both $S_l$ and $S_t$; $\theta(\cdot)$ denotes text vectorization, with word vectors produced by a Word2Vec model; $\alpha$, $\beta$, $\gamma$, and $\kappa$ are adjustment factors of the discrimination function; and $\eta$ is a set threshold;
Step 3.1.4.3: if no source code block in the entire DOM tree satisfies the discrimination function of step 3.1.4.2, the large model has parsed the demand document incorrectly, and the method returns to step 3.1.2; otherwise, select the source code block contained in the hierarchical tag that satisfies the condition with the highest score as the source code required by the current operation flow.
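A depth-first scan of this kind can be sketched with the standard-library HTML parser. The scoring below is an assumed stand-in, not the discrimination function of the text: it combines shared-word count, bag-of-words cosine similarity, and a depth penalty with hypothetical weights `alpha`, `beta`, `gamma`, and compares the result with a hypothetical threshold `eta`.

```python
import math
from collections import Counter
from html.parser import HTMLParser
from typing import Optional

VOID_TAGS = {"input", "br", "img", "meta", "link", "hr"}


class NodeCollector(HTMLParser):
    """Collects elements in document order, which is a depth-first pre-order."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes = []    # one dict per element: tag, depth, text
        self.stack = []    # indices of currently open elements

    def handle_starttag(self, tag, attrs):
        text = " ".join(value for _, value in attrs if value)
        self.nodes.append({"tag": tag, "depth": len(self.stack), "text": text})
        if tag not in VOID_TAGS:
            self.stack.append(len(self.nodes) - 1)

    def handle_endtag(self, tag):
        if self.stack and self.nodes[self.stack[-1]]["tag"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and data.strip():
            node = self.nodes[self.stack[-1]]
            node["text"] = (node["text"] + " " + data.strip()).strip()


def score(label_text: str, step_text: str, depth: int,
          alpha: float = 1.0, beta: float = 1.0, gamma: float = 0.05) -> float:
    # Assumed combination; the source's exact formula is not reproduced here.
    a, b = Counter(label_text.lower().split()), Counter(step_text.lower().split())
    shared = sum((a & b).values())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(
        sum(x * x for x in b.values())
    )
    cos = dot / norm if norm else 0.0
    return alpha * shared + beta * cos - gamma * depth


def select_node(html: str, step_text: str, eta: float = 0.5) -> Optional[dict]:
    """Return the highest-scoring element whose score reaches eta, else None."""
    collector = NodeCollector()
    collector.feed(html)
    passing = [
        (score(n["text"], step_text, n["depth"]), n) for n in collector.nodes
    ]
    passing = [(s, n) for s, n in passing if s >= eta]
    if not passing:
        return None                     # step 3.1.4.3: nothing qualifies
    return max(passing, key=lambda pair: pair[0])[1]
```

A `None` result corresponds to the branch that returns to step 3.1.2 for re-parsing.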
Further, compiling the generated RPA code with the RPA designer to realize automatic generation of the component group comprises the following steps:
Step 4.1: automatically load the project dependency packages according to the content of the <AssemblyReference> tags in the generated code, so that the subsequent component code can run in order;
Step 4.2: after the dependency packages are loaded successfully, the RPA designer lays out the visual page according to the WorkflowViewState tag attributes;
Step 4.3: execute all content contained in the <Sequence> tag in order, from top to bottom and from left to right, selecting components according to the inner tag elements and configuring them according to the inner tag attributes;
Step 4.4: after execution finishes, lay out the component group on the visual page according to the execution result, save the layout result to a preset local location in .xaml format, and notify the user to check it.
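How a designer might read dependencies and ordered components out of such a file can be sketched as follows. The sample document and its element names (`OpenBrowser`, `TypeInto`, `References`) are hypothetical and chosen only for illustration; the real designer's schema is not given in the text.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical generated code; element names are illustrative only.
SAMPLE = """
<Activity>
  <References>
    <AssemblyReference>System</AssemblyReference>
    <AssemblyReference>System.Data</AssemblyReference>
  </References>
  <Sequence DisplayName="Main">
    <OpenBrowser Url="https://www.baidu.com/" />
    <TypeInto Text="hello" />
  </Sequence>
</Activity>
"""


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{ns}Sequence'."""
    return tag.split("}")[-1]


def plan_from_xaml(xaml: str) -> Tuple[List[str], List[Tuple[str, Dict[str, str]]]]:
    root = ET.fromstring(xaml)
    # Step 4.1: dependency packages named by the AssemblyReference tags.
    dependencies = [
        el.text.strip()
        for el in root.iter()
        if local_name(el.tag).lower() == "assemblyreference" and el.text
    ]
    # Step 4.3: children of the first Sequence, in document order.
    sequence = next(
        (el for el in root.iter() if local_name(el.tag).lower() == "sequence"), None
    )
    steps = (
        [(local_name(child.tag), dict(child.attrib)) for child in sequence]
        if sequence is not None
        else []
    )
    return dependencies, steps
```

Each returned pair gives the component to select (the tag name) and its configuration (the tag attributes), matching the select-then-configure order of step 4.3.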
Further, a computer-readable storage medium storing a computer program, characterized in that: the computer program causes a computer to perform an RPA process automation generation method based on a large model and a self-customized demand template as described above.
Further, an electronic device is characterized by comprising: the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the RPA flow automatic generation method based on the large model and the self-customized demand template when executing the computer program.
The beneficial effects of the invention are as follows:
The invention uses the document parsing large model to parse and integrate the specific operation steps in a document and uses content verification rules to check the key content of the demand document, so that the generated RPA operation flowchart is more accurate and more robust and the method applies to a wider range of scenarios.
The self-customized demand template used by the invention supplies an effective natural-language description of the algorithm together with the corresponding web page source code, which lowers the difficulty of directly outputting program code that satisfies syntax constraints; combined with the self-trained code generation large model and standardized verification rules, this improves the reliability and stability of the output code.
Drawings
FIG. 1 is a schematic flow diagram of the automatic RPA flow generation method based on a large model and a self-customized demand template;
FIG. 2 is a schematic diagram of the RPA code generation flow of the method;
FIG. 3 shows the annotation format of the dataset for the document parsing large model;
FIG. 4 shows the annotation format of the dataset for the code generation large model;
FIG. 5 shows a resulting self-customized demand template.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings of the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
As shown in FIG. 1, an automatic RPA flow generation method based on a large model and a self-customized demand template according to an embodiment of the invention comprises the following steps: training a document parsing large model and a code generation large model using human-machine collaborative labeling; then generating a self-customized demand template using the trained document parsing large model; generating RPA flow code from the self-customized demand template using the trained code generation large model; and finally compiling the generated RPA flow code with an RPA designer, thereby realizing automatic generation of the component group.
In the above embodiment, the labeling structures used to train the document parsing large model and the code generation large model with human-machine collaborative labeling are shown in FIG. 3 and FIG. 4. The information collection and training of the document parsing large model comprises the following steps:
Step 1.1: collect a large number of demand documents as original documents, in formats including but not limited to pdf, word, excel, and md, and manually add the corresponding web page link to each part of the collected documents that involves a specific flow operation, forming the demand document format required by the invention;
Step 1.2: randomly select 40% of the demand documents for manual content extraction and labeling, generating a manually annotated dataset $D_N=(a_N,q_N)$ whose labeling format is shown in FIG. 3, and divide it 9:1 into a training set $D_{n1}$ and a test set $D_{n2}$, where $N$ is the total number of samples in the dataset, $q_N$ is the document content set of $D_N$, $a_N$ is the answer result set of $D_N$, $n1$ and $n2$ are the sizes of the training set and the test set respectively, and $n1+n2=N$;
Step 1.3: feed $D_{n1}$ into the document parsing large model for a first round of fine-tuning;
Step 1.4: apply data augmentation to the remaining 60% of the demand documents by document splicing, random masking, and similar methods, extract the content of the augmented documents with an OCR algorithm, and input the extracted text into the fine-tuned document parsing large model for parsing; with the document content as the question and the model's answer as the result, generate a machine-annotated dataset containing $V$ samples, which consists of a document content set and the corresponding answer result set;
Step 1.5: combine $D_{n1}$ with the machine-annotated dataset to form a new dataset and input it into the document parsing large model for another round of fine-tuning; test the re-fine-tuned model on the test set $D_{n2}$; if the answer accuracy exceeds 80%, model training is complete and the final document parsing large model is obtained; otherwise, take the current model as the latest fine-tuned model and return to step 1.4.
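The document partition used in steps 1.2 and 1.4 (40% for manual labeling, split 9:1 into training and test sets, with the remaining 60% left for machine annotation) can be sketched as below. The function name, the fixed seed, and the use of rounding for fractional counts are assumptions made for the illustration.

```python
import random
from typing import List, Tuple


def split_documents(
    docs: List[str],
    manual_fraction: float = 0.4,   # step 1.2: share labeled by hand
    train_ratio: float = 0.9,       # step 1.2: 9:1 train/test split
    seed: int = 0,                  # assumption: fixed seed for repeatability
) -> Tuple[List[str], List[str], List[str]]:
    """Return (manual training docs, manual test docs, docs left for machine labeling)."""
    rng = random.Random(seed)
    shuffled = docs[:]
    rng.shuffle(shuffled)                              # random selection
    n_manual = round(len(shuffled) * manual_fraction)
    manual, remaining = shuffled[:n_manual], shuffled[n_manual:]
    n_train = round(len(manual) * train_ratio)
    return manual[:n_train], manual[n_train:], remaining
```

For 100 documents this yields 36 training documents, 4 test documents, and 60 documents for augmentation and machine annotation.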
In the above embodiment, the information collection and training of the code generation large model comprises the following steps:
Step 2.1: based on the answer result set $a_N=\{a_1,a_2,\dots,a_i,\dots,a_N\}$ of dataset $D_N$, manually extract the corresponding flow source code with a web page source code extractor to produce the source code set $o_N=\{o_1,o_2,\dots,o_i,\dots,o_N\}$, and produce the corresponding RPA code set $c_N=\{c_1,c_2,\dots,c_i,\dots,c_N\}$ with an RPA designer, where $i\in[1,N]$ is an index and $a_i$, $o_i$, $c_i$ denote the $i$-th answer result, web page source code, and RPA code respectively;
Step 2.2: take the integrated content of $a_N$ and $o_N$ as the question set $q_c$ and $c_N$ as the large-model answer result set $a_c$ to build the manually annotated dataset for the code generation large model, $B_N=(q_c,a_c)_N$, whose labeling format is shown in FIG. 4; likewise, divide it 9:1 into a training set $B_{n1}$ and a test set $B_{n2}$;
Step 2.3: feed $B_{n1}$ into the code generation large model for a first round of fine-tuning;
Step 2.4: based on $a_N$, $o_N$, and the machine-annotated answer result set from step 1.4, build the machine-annotated dataset for the code generation large model as described below;
Step 2.5: combine $B_{n1}$ with the machine-annotated dataset for the code generation large model to form a new dataset and input it into the code generation large model for another round of fine-tuning; test the re-fine-tuned model on the test set $B_{n2}$; if the answer accuracy exceeds 80%, model training is complete and the final code generation large model is obtained; otherwise, take the current model as the latest fine-tuned model and return to step 2.4.
The machine-annotated dataset for the code generation large model is built from $a_N$, $o_N$, and the machine-annotated answer result set (written $\hat{a}_V$ here) as follows:
Step 2.4.1: create an empty machine-annotated source code set $\hat{o}_V$;
Step 2.4.2: compute the flow similarity between each answer result in $\hat{a}_V$ and all answer results in $a_N$ to obtain the flow similarity matrix $T\in\mathbb{R}^{V\times N}$, whose entries are
$$T_{j,i}=\mathrm{SIM}(\hat{a}_j,\,a_i),\qquad j\in[1,V],\ i\in[1,N],$$
that is, $T=\mathrm{SIM}(\hat{a}_V^{\mathsf T},a_N)$, where $\mathrm{SIM}(\cdot)$ is the flow similarity function, which the invention computes with a Word2Vec model combined with cosine similarity; $\hat{a}_V^{\mathsf T}$ is the transpose of $\hat{a}_V$; $j$ is an index in $[1,V]$; and $\hat{a}_j$ is the $j$-th answer result in $\hat{a}_V$;
Step 2.4.3: locate the position $k\in[1,N]$ of the maximum similarity value in each row of $T$, look up the corresponding source code $o_k$ in $o_N$, and store the $o_k$ obtained for each row in $\hat{o}_V$;
Step 2.4.4: take the integrated content of $\hat{a}_V$ and $\hat{o}_V$ as the question set and the answers of the fine-tuned code generation large model as the result set, completing the construction of the machine-annotated dataset for the code generation large model.
In the above embodiment, the following demand content is taken as Example 1: "The following operations are all completed in the Microsoft Edge browser. The specific operations are: open the Microsoft Edge browser and visit Baidu, enter 'Baidu Translate' in the search bar, click Query, click 'multi-language instant online translation', enter 'hello' in the input field, and obtain the translation result. The operations before clicking Query are completed at https://www.baidu.com/, the operation of clicking 'multi-language instant online translation' is completed at https://www.baidu.com/swd=%E7%99%BE%E5%BA%A6%E7%BF%BB%E8%AF%91, and the subsequent operations are completed at https://fanyi.baidu.com/?aldtype=16047#auto/zh."
The self-customized demand template is generated with the trained document parsing large model; the generated template is shown in FIG. 5. RPA flow code is then generated from the self-customized demand template with the trained code generation large model. The specific steps are as follows:
Step 3.1: submit the entire content of the self-customized demand template as the question to the trained code generation large model;
The self-customized demand template is generated by the following steps:
Step 3.1.1: obtain the demand document uploaded by the user and extract its content with an OCR algorithm;
Step 3.1.2: feed the extracted content into the trained document parsing large model for operation flow parsing and output the parsing result;
For the demand content of Example 1 above, the content parsed by the document parsing large model trained in the invention is:
(1) Open the Microsoft Edge browser; operation link: https://www.baidu.com;
(2) Enter 'Baidu Translate' in the search bar; operation link: https://www.baidu.com;
(3) Click Query; operation link: https://www.baidu.com;
(4) Click 'multi-language instant online translation'; operation link: https://www.baidu.com/swd=%E7%99%BE%E5%BA%A6%E7%BF%BB%E8%AF%91;
(5) Enter 'hello' in the input field; operation link: https://fanyi.baidu.com/?aldtype=16047#auto/zh;
(6) Obtain the translation result; operation link: https://fanyi.baidu.com/?aldtype=16047#auto/zh;
step 3.1.3, performing format verification on the result output in step 3.1.2, wherein the specific verification rule is as follows: traversing the current output result from top to bottom and from left to right, and judging whether each operation flow step has a specific webpage link corresponding to the specific webpage link; if the verification is passed, the RPA operation flow chart which is the current result is taken as a step 3.1.4, otherwise, the user is informed of the content of verification failure, the requirement on the modification of the required document is required, and the step 3.1.1 is returned;
step 3.1.4, extracting corresponding webpage source codes by using a webpage source code analyzer based on webpage links and specific operation step descriptions in an RPA operation flow chart;
the method comprises the following steps of carrying out corresponding webpage source code extraction by using a webpage source code analyzer based on webpage links and specific operation step descriptions in an RPA operation flow chart, wherein the specific steps are as follows:
step 3.1.4.1, sequentially analyzing the web page links corresponding to each operation flow in the RPA operation flow chart by using a web page source code analyzer, and returning to the analyzed DOM tree;
3.1.4.2, analyzing the DOM tree layer by layer based on the depth-first rule, and judging whether the web page source code contained in the hierarchical label is the source code required by the current operation flow by using a source code discrimination function, wherein the specific judgment function is defined as follows:
wherein $f_s(\cdot)$ is the source code judgment function; $S_l$ and $S_t$ are the hierarchical label text and the current operation flow description text, respectively; $T(S_l, S_t)$ and $V(S_l, S_t)$ are the structural similarity judging function and the vector cosine similarity judging function based on $S_l$ and $S_t$, respectively; $D(S_t)$ is the geometric distance judging function from the current hierarchical label to the top of the DOM tree; $w_k$ denotes the number of identical words that appear in both $S_l$ and $S_t$; $\theta(\cdot)$ denotes text vectorization, for which a Word2Vec model is likewise used to complete the word vector conversion; $\alpha$, $\beta$, $\gamma$, and $\kappa$ are adjustment factors of the source code judgment function; and $\eta$ is the set threshold;
step 3.1.4.3, if no source code block in the whole DOM tree satisfies the source code judgment function of step 3.1.4.2, this indicates that the large model analyzed the demand document incorrectly, and the method returns to step 3.1.2; otherwise, the source code block contained in the hierarchical label that satisfies the judgment condition and has the highest score is selected as the source code required by the current operation flow.
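The traversal in steps 3.1.4.1 to 3.1.4.3 can be illustrated with a minimal Python sketch. The judgment function itself is not reproduced in the text, so the weighted sum below is an assumption: it combines a shared-word count, a bag-of-words cosine similarity standing in for the Word2Vec vectors, and a depth penalty standing in for the distance term. The names `score_node` and `best_source_block` and the default weights are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta"}


class Node:
    """One DOM element: tag name, attributes, own text, children, depth."""

    def __init__(self, tag, attrs, depth):
        self.tag = tag
        self.attrs = dict(attrs)
        self.depth = depth
        self.text = ""
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple DOM tree from reasonably well-nested HTML (sketch only)."""

    def __init__(self):
        super().__init__()
        self.root = Node("document", [], 0)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, len(self.stack))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].text += data.strip() + " "


def tokens(text):
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity of two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def score_node(node, step_text, alpha=0.5, beta=1.0, gamma=0.05):
    """Assumed score: shared words + cosine similarity - depth penalty."""
    label = tokens(" ".join([node.tag, node.text, *(v or "" for v in node.attrs.values())]))
    step = tokens(step_text)
    shared = len(set(label) & set(step))
    return alpha * shared + beta * cosine(label, step) - gamma * node.depth


def best_source_block(html, step_text, eta=0.5):
    """Depth-first search; returns the best node scoring at least eta, else None."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best, best_score = None, None
    stack = list(reversed(builder.root.children))
    while stack:
        node = stack.pop()
        s = score_node(node, step_text)
        if s >= eta and (best is None or s > best_score):
            best, best_score = node, s
        stack.extend(reversed(node.children))
    return best
```

A `None` result corresponds to the failure branch of step 3.1.4.3, where the method returns to step 3.1.2.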
Step 3.1.5, integrating the source code information acquired in step 3.1.4 with the RPA operation flow chart output in step 3.1.3 according to the self-customized demand template format to generate the final self-customized demand template; the format of the generated self-customized demand template is: "Based on the following information: the flow operations are [content], the web page links involved in the flow are [url], and the reference source code of each flow is [code]; generate RPA source code that meets the compilation requirements of the RPA designer; if it cannot be generated, reply that the source code cannot be generated."; the content, url, and code fields in the template format correspond, respectively, to the specific flow chart operation steps of step 3.1.3, the corresponding web page links, and the source code information of step 3.1.4.
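As an illustration of the link check in step 3.1.3 and the template assembly in step 3.1.5, the following Python sketch validates that every flow step carries a link and then fills the template. The dictionary keys, the function names, the URL pattern, and the English wording of the template string are assumptions made for this example only.

```python
import re

# English rendering of the template described in step 3.1.5 (wording assumed).
TEMPLATE = (
    "Based on the following information: the flow operations are {content}, "
    "the web page links involved in the flow are {url}, and the reference "
    "source code of each flow is {code}; generate RPA source code that meets "
    "the compilation requirements of the RPA designer; if it cannot be "
    "generated, reply that the source code cannot be generated."
)

# Assumed notion of a well-formed link: http(s) scheme and no whitespace.
URL_PATTERN = re.compile(r"^https?://\S+$")


def steps_missing_links(steps):
    """Step 3.1.3: return descriptions of steps that lack a well-formed link."""
    return [s["content"] for s in steps if not URL_PATTERN.match(s.get("url") or "")]


def build_demand_template(steps, source_blocks):
    """Step 3.1.5: merge flow steps, links, and source code into one prompt."""
    missing = steps_missing_links(steps)
    if missing:
        raise ValueError(f"steps without a web page link: {missing}")
    if len(steps) != len(source_blocks):
        raise ValueError("one source code block is required per step")
    return TEMPLATE.format(
        content=[s["content"] for s in steps],
        url=[s["url"] for s in steps],
        code=list(source_blocks),
    )
```

Raising an error on a missing link mirrors the branch in which the user is asked to revise the demand document.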
Step 3.2, performing format verification on the RPA code returned by the code generation large model; if the verification requirements are met, the method proceeds to step 3.3, otherwise it returns to step 3.1; wherein the specific format verification rules are as follows:
step 3.2.1, judging whether the currently generated RPA code meets the grammar requirements of the RPA designer, such as language requirements, indentation requirements, comment format, and whether the numbers of opening and closing brackets are consistent;
step 3.2.2, judging whether the structure of the currently generated RPA code is complete, that is, whether the webpage links, the position information in the source code, and the tag attribute information involved in the self-customized demand template are all reflected in the RPA code;
for example, based on the content of the self-customized demand template in Example 1, step 3.2.2 specifically checks whether the RPA source code contains the <ui:openBrowser> tag, the <ui:setValue> tag, the <ui:click> tag, and the <ui:getValue> tag, and whether the xmlns attribute contains the three links;
Step 3.3, sending the code that meets the verification requirements of step 3.2 to the RPA designer as the final RPA flow code.
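The two checks of step 3.2 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the bracket test is only one of the grammar requirements named in step 3.2.1 and ignores brackets inside quoted strings, and the function names and the shape of the problem list are hypothetical.

```python
def brackets_balanced(code):
    """Step 3.2.1 (partial): every opening bracket has a matching close."""
    pairs = {")": "(", "]": "[", "}": "{", ">": "<"}
    openers = set(pairs.values())
    stack = []
    for ch in code:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


def check_rpa_code(code, required_links, required_tags):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if not brackets_balanced(code):
        problems.append("unbalanced brackets")
    # Step 3.2.2: links and tags from the demand template must appear in the code.
    for link in required_links:
        if link not in code:
            problems.append(f"missing link: {link}")
    for tag in required_tags:
        if f"<{tag}" not in code:
            problems.append(f"missing tag: {tag}")
    return problems
```

An empty result corresponds to proceeding to step 3.3, while any reported problem corresponds to returning to step 3.1.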
In the above embodiment, the generated RPA flow code is compiled by the RPA designer to realize automatic generation of the component group. The specific implementation flow is as follows: the code generation large model is polled once per minute, and when the RPA demand code returned by the code generation large model is received, the RPA designer compiles the code in the following steps:
Step 4.1, automatically loading the project dependency packages according to the content of the <AssemblyReference> tags in the generated code, so as to ensure that the subsequent component code can run in order;
For example, the code generated by the code generation large model in Example 1 contains:
<AssemblyReference>Airpa.Studio.Plugin.Workflow</AssemblyReference>
<AssemblyReference>AiRpa.System.Activities</AssemblyReference>
<AssemblyReference>AiRpa.UiAutomation.Activities</AssemblyReference>
For this code, the RPA designer autonomously loads the three dependency packages Workflow, System.Activities, and UiAutomation.Activities in step 4.1.
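The tag extraction behind step 4.1 can be sketched in Python as follows. The sketch only collects the referenced package names. How the RPA designer actually loads them is not described in the text, and the function name `dependency_packages` is hypothetical.

```python
import re

# Matches the text between <AssemblyReference> and </AssemblyReference>.
ASSEMBLY_REF = re.compile(r"<AssemblyReference>\s*([^<]+?)\s*</AssemblyReference>")


def dependency_packages(rpa_code):
    """Return referenced assembly names in document order, without duplicates."""
    seen = []
    for name in ASSEMBLY_REF.findall(rpa_code):
        if name not in seen:
            seen.append(name)
    return seen
```

Preserving document order keeps the load sequence the same as in the generated code.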
Step 4.2, after the dependency packages are loaded successfully, the RPA designer performs the visual page layout design according to the WorkflowViewState tag attribute;
For example, based on the code content <WorkflowViewState.ViewStateManager:100% TOWHITE> generated by the code generation large model in Example 1, the RPA designer autonomously generates, in step 4.2, a blank page with 100% page fit, a top-down layout, and a white background.
Step 4.3, executing all contents contained in the <sequence> tag in order, from top to bottom and from left to right; the designer selects components according to the inner tag elements and configures the components according to the inner tag attributes;
For example, based on the Example 1 code content <ui:setValue ContinueOnError="{x:Null}" Value="{x:Null}" DisplayName="english input" sap2010:WorkflowViewState.IdRef="setvalue_1">, the RPA designer automatically selects the set-text component in step 4.3, configures the hint text as "english input", and sets the error return information to Null.
Step 4.4, after the execution is finished, laying out the component group on the visual page according to the execution result, saving the layout result to a preset local location in .xaml format, and notifying the user to check it in time.
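Step 4.3 can be illustrated with a short Python sketch that walks the children of a sequence element in document order and records the component name (from the tag) and its configuration (from the attributes). The sample markup, the placeholder namespace URIs, and the function names are assumptions for illustration; they are not taken from a real RPA designer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the namespace URIs are placeholders.
SAMPLE = """
<Sequence xmlns:ui="urn:example:ui" xmlns:x="urn:example:x">
  <ui:openBrowser Url="https://fanyi.baidu.com/" />
  <ui:setValue ContinueOnError="{x:Null}" DisplayName="english input" />
  <ui:getValue DisplayName="read result" />
</Sequence>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def plan_components(xaml_text):
    """Return (component name, configuration) pairs in execution order."""
    root = ET.fromstring(xaml_text.strip())
    if local_name(root.tag) == "Sequence":
        sequence = root
    else:
        sequence = next((e for e in root.iter() if local_name(e.tag) == "Sequence"), None)
    if sequence is None:
        raise ValueError("no Sequence element found")
    return [(local_name(child.tag), dict(child.attrib)) for child in sequence]
```

Calling `plan_components(SAMPLE)` yields one entry per child element, which a designer could then map to concrete components.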
The invention has the following beneficial effects:
Traditional large model training generally relies on either manual labeling or unsupervised training. A manually labeled data set can ensure the training precision of a large model, but a large model requires at least 100,000 training samples, so relying on manual labeling alone demands a great deal of manpower. Unsupervised learning avoids this manpower cost, but the quality of its data cannot be guaranteed, so the large model can easily learn false information, which is unacceptable in the RPA field. The invention therefore provides a man-machine collaborative mode, which greatly reduces the amount of manually labeled data while ensuring the accuracy of the machine-labeled data as far as possible, and thereby the accuracy of large model learning.
Compared with applying a traditional NLP algorithm directly to the demand document for semantic extraction, the invention uses the document analysis large model to analyze and integrate the specific operation steps in the document, and uses content verification rules to verify the content points in the demand document, so that the generated RPA demand operation flow chart is more accurate and more robust and applies to a wider range of cases.
Compared with existing large model code generation technology, the self-customized demand template used by the invention provides an effective natural language description of the algorithm together with the corresponding webpage source code content, which reduces the difficulty of directly outputting program code that conforms to grammar constraints; combined with the self-trained code generation large model and standard verification rules, this effectively improves the reliability and stability of the output code.
In order to solve the problems in the existing automatic generation of RPA components, namely that the generated component operation flow chart is inaccurate, the application range is narrow, and high-matching RPA component execution code is difficult to generate, the invention provides an RPA flow automatic generation method based on a large model and a self-customized demand template. First, unlike a traditional NLP algorithm that performs semantic extraction directly on the demand document, the method uses the document analysis large model to analyze and integrate the specific operation steps in the document, and uses result verification rules to verify the demand points in the demand document, so that the generated RPA demand operation flow chart is more accurate. Second, in order to improve the executability and generation accuracy of the corresponding RPA demand code, the invention combines the generated demand operation flow chart with the specific source code of the web pages involved in the operations to form a specific demand template, and queries the code generation large model with this template to obtain high-matching RPA demand code. Finally, the generated RPA demand code is input into the RPA designer for compiling, thereby realizing automatic generation of the component group.
By making effective use of large models, the invention addresses the inaccurate generation of component operation flow charts, the narrow application range, and the difficulty of generating high-matching RPA component execution code in the existing automatic generation of RPA components, and effectively reduces the manual workload.
In another embodiment, the present invention provides a computer readable storage medium storing a computer program for causing a computer to execute an RPA process automation generation method based on a large model and a customized demand template as described above.
In another embodiment, the present invention provides an electronic device, including: the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the RPA flow automatic generation method based on the large model and the self-customized demand template when executing the computer program.
In the embodiments disclosed herein, a computer storage medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a computer storage medium would include one or more wire-based electrical connections, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.

Claims (10)

1. An RPA flow automatic generation method based on a large model and a self-customized demand template, characterized by comprising the following steps: training a document analysis large model and a code generation large model based on a man-machine collaborative labeling mode; then generating a self-customized demand template by using the trained document analysis large model; generating RPA code based on the self-customized demand template and the trained code generation large model; and finally compiling the generated RPA code by using an RPA designer, so that automatic generation of the component group is realized.
2. The method for automatically generating the RPA process based on the large model and the self-customized demand template according to claim 1, wherein the training of the document analysis large model comprises the following steps:
step 1.1, collecting demand documents as original documents, and manually adding the corresponding webpage link to each part of the collected demand documents that involves a specific flow operation, so as to form the demand document format;
step 1.2, randomly selecting 40% of the demand documents for manual content extraction and labeling, generating a manually annotated data set $D_N = (a_N, q_N)$, and dividing it in a 9:1 ratio into a training set $D_{n1}$ and a test set $D_{n2}$, where $N$ is the total number of samples in the data set, $q_N$ is the document content set of $D_N$, $a_N$ is the answer result set of $D_N$, $n1$ and $n2$ are the sizes of the training set and the test set, respectively, and $n1 + n2 = N$;
step 1.3, feeding $D_{n1}$ into the document analysis large model for initial fine-tuning;
step 1.4, performing data enhancement on the remaining 60% of the demand documents by document splicing and random masking, extracting the content of the enhanced demand documents with an OCR algorithm, and inputting the extracted text into the fine-tuned document analysis large model for analysis; generating a machine-annotated data set with the document content as the questions and the answers of the document analysis large model as the results, wherein $V$ is the total number of samples in the machine-annotated data set, which consists of a document content set and an answer result set;
step 1.5, combining $D_{n1}$ and the machine-annotated data set into a new data set and inputting it into the document analysis large model for further fine-tuning; testing the re-fine-tuned document analysis large model with the test set $D_{n2}$; if the answer accuracy of the model reaches 80% or above, model training is complete and the final document analysis large model is obtained; otherwise, the currently trained model is taken as the latest fine-tuned model and the method returns to step 1.4.
3. The method for automatically generating RPA flow based on a large model and a self-customized demand template according to claim 2, wherein the training of the code generation large model comprises the following steps:
step 2.1, based on the answer result set $a_N = \{a_1, a_2, \ldots, a_i, \ldots, a_N\}$ in data set $D_N$, manually extracting the corresponding flow source code with a webpage source code extractor to generate the corresponding source code set $o_N = \{o_1, o_2, \ldots, o_i, \ldots, o_N\}$, and using an RPA designer to generate the corresponding RPA code set $c_N = \{c_1, c_2, \ldots, c_i, \ldots, c_N\}$; wherein $i$ is an index value in the range $[1, N]$, and $a_i$, $o_i$, and $c_i$ denote the $i$-th answer result, webpage source code, and RPA code, respectively;
step 2.2, using the integrated content of $a_N$ and $o_N$ as the question set $q_c$ and $c_N$ as the answer result set $a_c$ of the large model, forming the manually annotated data set $B_N = (q_c, a_c)_N$ of the code generation large model; similarly, dividing it in a 9:1 ratio into a training set $B_{n1}$ and a test set $B_{n2}$;
step 2.3, feeding $B_{n1}$ into the code generation large model for initial fine-tuning;
step 2.4, constructing the machine-annotated data set of the code generation large model based on $a_N$, $o_N$, and the machine answer result set obtained in step 1.4;
step 2.5, combining $B_{n1}$ and the machine-annotated data set of the code generation large model into a new data set and inputting it into the code generation large model for further fine-tuning; testing the re-fine-tuned large model with the test set $B_{n2}$; if the answer accuracy of the large model reaches 80% or above, model training is complete and the final code generation large model is obtained; otherwise, the currently trained model is taken as the latest fine-tuned model and the method returns to step 2.4.
4. The method for automatically generating RPA flow based on a large model and a self-customized demand template according to claim 3, wherein the machine-annotated data set of the code generation large model is constructed based on $a_N$, $o_N$, and the machine answer result set as follows:
step 2.4.1, establishing an empty machine-annotated source code set;
step 2.4.2, computing the flow similarity between each answer result in the machine answer result set and all answer results in $a_N$ to obtain the flow similarity result matrix $T$, which is calculated as follows:
wherein $\mathrm{SIM}(\cdot)$ is the flow similarity calculation function, which uses a Word2Vec model combined with the cosine similarity algorithm to calculate flow similarity; the machine answer result set enters the calculation in transposed form, so that each row of $T$ corresponds to one machine answer result; $j$ is an index value in the range $[1, V]$ and identifies the $j$-th answer result in the machine answer result set;
step 2.4.3, locating the position $k \in [1, N]$ of the maximum similarity value in each row of the matrix $T$, looking up the corresponding source code $o_k$ in $o_N$ based on this position, and storing the $o_k$ obtained for each row in the machine-annotated source code set;
step 2.4.4, using the integrated content of the machine answer result set and the machine-annotated source code set as the question set, and the answer results of the fine-tuned code generation large model as the result set, thereby completing the construction of the machine-annotated data set.
5. The automatic generation method of the RPA flow based on the large model and the self-customized demand template according to claim 1, wherein the self-customized demand template is generated by using the trained document analysis large model, and the RPA code is generated based on the self-customized demand template and the trained code generation large model, with the following specific steps:
step 3.1, taking the whole content of the self-customized demand template as the question and querying the trained code generation large model;
step 3.2, performing format verification on the RPA code returned by the code generation large model; if the verification requirements are met, proceeding to step 3.3, otherwise returning to step 3.1; wherein the format verification rules are as follows:
step 3.2.1, judging whether the currently generated RPA code meets the grammar requirement of an RPA designer;
step 3.2.2, judging whether the structure of the currently generated RPA code is complete;
step 3.3, sending the code that meets the verification requirements of step 3.2 to the RPA designer as the final RPA flow code.
6. The method for automatically generating the RPA flow based on the large model and the self-customized demand template according to claim 5, wherein generating the self-customized demand template comprises the following specific steps:
step 3.1.1, acquiring a required document uploaded by a user, and extracting document contents by using an OCR algorithm;
step 3.1.2, sending the extracted document content into a trained document analysis large model for operation flow analysis, and outputting a document analysis large model analysis result;
step 3.1.3, performing format verification on the result output in step 3.1.2, wherein the verification rule is as follows: traversing the current output result from top to bottom and from left to right, and judging whether each operation flow step has a corresponding webpage link; if the verification passes, the current result is taken as the RPA operation flow chart and the method proceeds to step 3.1.4; otherwise, the user is informed of the content that failed verification and asked to modify the demand document, and the method returns to step 3.1.1;
Step 3.1.4, extracting corresponding webpage source codes by using a webpage source code analyzer based on webpage links and operation step descriptions in the RPA operation flow chart;
step 3.1.5, integrating the source code information acquired in step 3.1.4 with the RPA operation flow chart output in step 3.1.3 according to the self-customized demand template format to generate the final self-customized demand template; the format of the generated self-customized demand template is: the flow operations are [content], the web page links involved in the flow are [url], and the reference source code of each flow is [code]; generate RPA source code that meets the compilation requirements of the RPA designer; if it cannot be generated, reply that the source code cannot be generated; the content, url, and code fields in the template format correspond, respectively, to the flow chart operation steps of step 3.1.3, the corresponding web page links, and the source code information of step 3.1.4.
7. The method for automatically generating the RPA flow based on the large model and the self-customized demand template according to claim 6, wherein the specific steps of extracting the corresponding web source code by using a web source code analyzer based on the web link and specific operation step description in the RPA operation flow chart are as follows:
step 3.1.4.1, sequentially parsing the webpage link corresponding to each operation flow in the RPA operation flow chart with the webpage source code analyzer, and returning the parsed DOM tree;
step 3.1.4.2, traversing the DOM tree layer by layer based on a depth-first rule, and using a source code judgment function to judge whether the webpage source code contained in each hierarchical label is the source code required by the current operation flow, wherein the judgment function is defined as follows:
wherein $f_s(\cdot)$ is the source code judgment function; $S_l$ and $S_t$ are the hierarchical label text and the current operation flow description text, respectively; $T(S_l, S_t)$ and $V(S_l, S_t)$ are the structural similarity judging function and the vector cosine similarity judging function based on $S_l$ and $S_t$, respectively; $D(S_t)$ is the geometric distance judging function from the current hierarchical label to the top of the DOM tree; $w_k$ denotes the number of identical words that appear in both $S_l$ and $S_t$; $\theta(\cdot)$ denotes text vectorization, for which a Word2Vec model is used to complete the word vector conversion; $\alpha$, $\beta$, $\gamma$, and $\kappa$ are adjustment factors of the source code judgment function; and $\eta$ is the set threshold;
step 3.1.4.3, if no source code block in the whole DOM tree satisfies the source code judgment function of step 3.1.4.2, this indicates that the large model analyzed the demand document incorrectly, and the method returns to step 3.1.2; otherwise, the source code block contained in the hierarchical label that satisfies the judgment condition and has the highest score is selected as the source code required by the current operation flow.
8. The automatic generation method of the RPA flow based on the large model and the self-customized demand template according to claim 1, wherein the generated RPA code is compiled by an RPA designer, so that the automatic generation of the component group is realized, and the specific steps are as follows:
step 4.1, automatically loading the project dependency packages according to the content of the <AssemblyReference> tags in the generated code, so as to ensure that the subsequent component code can run in order;
step 4.2, after the dependency packages are loaded successfully, the RPA designer performs the visual page layout design according to the WorkflowViewState tag attribute;
step 4.3, executing all contents contained in the <sequence> tag in order, from top to bottom and from left to right, selecting components according to the inner tag elements, and configuring the components according to the inner tag attributes;
step 4.4, after the execution is finished, laying out the component group on the visual page according to the execution result, saving the layout result to a preset local location in .xaml format, and notifying the user to check it in time.
9. A computer readable storage medium storing a computer program, wherein the computer program causes a computer to execute an RPA process automation generation method based on a large model and a self-customized demand template according to any one of claims 1 to 8.
10. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method for automated generation of RPA procedures based on a large model and a custom demand template as claimed in any one of claims 1 to 8 when the computer program is executed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311830148.3A CN117648093A (en) 2023-12-28 2023-12-28 RPA flow automatic generation method based on large model and self-customized demand template

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311830148.3A CN117648093A (en) 2023-12-28 2023-12-28 RPA flow automatic generation method based on large model and self-customized demand template

Publications (1)

Publication Number Publication Date
CN117648093A true CN117648093A (en) 2024-03-05

Family

ID=90045197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311830148.3A Pending CN117648093A (en) 2023-12-28 2023-12-28 RPA flow automatic generation method based on large model and self-customized demand template

Country Status (1)

Country Link
CN (1) CN117648093A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117891447A (en) * 2024-03-14 2024-04-16 蒲惠智造科技股份有限公司 Enterprise management software development method, device, equipment and medium
CN117951038A (en) * 2024-03-27 2024-04-30 浙江大学 Rust language document test automatic generation method and device based on code large model

Similar Documents

Publication Publication Date Title
CN116628172B (en) Dialogue method for multi-strategy fusion in government service field based on knowledge graph
CN117648093A (en) RPA flow automatic generation method based on large model and self-customized demand template
CN112541337B (en) Document template automatic generation method and system based on recurrent neural network language model
CN111274817A (en) Intelligent software cost measurement method based on natural language processing technology
CN113987199B (en) BIM intelligent image examination method, system and medium with standard automatic interpretation
CN113254507B (en) Intelligent construction and inventory method for data asset directory
CN113138920B (en) Software defect report allocation method and device based on knowledge graph and semantic role labeling
CN115964273A (en) Spacecraft test script automatic generation method based on deep learning
CN111651994B (en) Information extraction method and device, electronic equipment and storage medium
CN115878003A (en) RPA webpage operation automation method and system based on Transformer
EP4364044A1 (en) Automated troubleshooter
CN116719520A (en) Code generation method and device
CN112988982B (en) Autonomous learning method and system for computer comparison space
CN117520561A (en) Entity relation extraction method and system for knowledge graph construction in helicopter assembly field
CN111831624A (en) Data table creating method and device, computer equipment and storage medium
CN117473054A (en) Knowledge graph-based general intelligent question-answering method and device
CN114722159B (en) Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources
CN114647739B (en) Entity chain finger method, device, electronic equipment and storage medium
CN115344661A (en) Equipment halt diagnosis method and device, electronic equipment and storage medium
CN111949781B (en) Intelligent interaction method and device based on natural sentence syntactic analysis
Tan et al. Checking Refactoring Detection Results Using Code Changes Encoding for Improved Accuracy
CN111782781A (en) Semantic analysis method and device, computer equipment and storage medium
Zanibbi A Language for Specifying and Comparing Table Recognition Strategies
CN117874261B (en) Question-answer type event extraction method based on course learning and related equipment
CN116450717B (en) Data integration method and information management system for cross-service modules

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination