CN112163420A - NLP technology-based RPA process automatic generation method - Google Patents

Info

Publication number: CN112163420A
Application number: CN202011010218.7A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 于思洋
Current Assignee: Beijing Tianxing Youling Technology Co ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Beijing Tianxing Youling Technology Co ltd
Prior art keywords: rpa, flow, activity, neural network, network model
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Priority date: 2020-09-23 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2020-09-23
Publication date: 2021-01-01

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/101 Collaborative creation, e.g. joint development of products or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Molecular Biology (AREA)
  • Strategic Management (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of natural language processing, and in particular discloses an RPA flow automatic generation method based on NLP technology, comprising the following steps: step S1, collecting data and establishing an expert system; step S2, constructing a natural language processing model and a rule base, including data enhancement and data expansion, a neural network model, neural network model selection, activity rule matching, and expression rule matching; and step S3, generating an RPA flow code file. By combining NLP with RPA flow generation technology, the invention converts a flow design document directly into a flow code file, so that a user can build an automated RPA flow simply by describing the business process in natural language. This reduces the manpower, material, and financial resources an enterprise consumes and saves development cost in implementing RPA projects.

Description

NLP technology-based RPA process automatic generation method
Technical Field
The invention relates to the technical field of natural language processing, in particular to an RPA flow automatic generation method based on NLP technology.
Background
NLP (Natural Language Processing) is a field at the intersection of computer science, artificial intelligence, and linguistics that focuses on the interaction between computers and human (natural) language. RPA (Robotic Process Automation) is a technology for building processes through interface operations on a computer so that office work can be automated conveniently. At present, vendors at home and abroad offer a variety of RPA products. Although these products differ in function, almost all of them include a process design platform, commonly referred to as an "RPA designer". Most RPA designers package the automation operations commonly used by business users (such as mouse clicks and keyboard entry) into components, generally called "activities", so that users can build processes through interface operations. However, the daily work of business users usually also involves data processing and more complex processing logic, and such business-specific handling is difficult to package piece by piece in a standard product. In practice, therefore, professional implementers are still needed to complete a full business process by embedding code according to the specific business requirements. This raises the threshold for using an RPA designer, because a business user without a programming background can hardly complete an automated process independently.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide an RPA flow automatic generation method based on NLP technology. The method combines NLP with RPA flow generation technology to convert a flow design document directly into a flow code file, so that a user can build an automated RPA flow simply by describing the business process in natural language, which reduces the manpower, material, and financial resources an enterprise consumes and saves development cost in implementing RPA projects.
In order to solve the technical problems, the technical scheme of the invention is as follows:
An RPA flow automatic generation method based on NLP technology comprises the following steps:
Step S1: collecting data and establishing an expert system;
Step S2: constructing a natural language processing model and a rule base, including data enhancement and data expansion, a neural network model, neural network model selection, activity rule matching, and expression rule matching;
Step S3: generating an RPA flow code file.
Preferably, step S1 further includes collecting project data, analyzing the number and frequency of the RPA activities used in real RPA projects, and selecting the activities that reach a coverage rate of 95%.
Preferably, step S1 further includes labeling the sentences corresponding to activities in the flow design document, and constructing the original text description, the metadata describing the corresponding activity type, and the fragments of the original text description that correspond to the types of one or more input parameters of the activity.
Preferably, in step S2, the data enhancement includes synonym replacement, activity parameter replacement, multi-activity sentence generation, and nested-activity sentence generation.
Preferably, in step S2, the neural network model is used to determine the activities of a sentence and to identify the parameters of each activity; the neural network model selection comprises selecting a BERT neural network model; and the activity rule matching and the expression rule matching both comprise summarizing common templates, writing the corresponding regular expressions, matching text against the regular expressions, and outputting the results in JSON format.
Preferably, step S3 further includes parsing the JSON file generated in the preceding step to obtain the category and attribute content of the activities to be generated, obtaining the category and attribute information of all activities by using C# reflection, and generating the corresponding Xaml file by using ActivityBuilder and XamlXmlWriter in Windows Workflow Foundation.
With this technical scheme, the RPA flow automatic generation method based on NLP technology has the following beneficial effects. NLP is combined with RPA flow generation technology: the analysis results are converted into a code file readable by an RPA designer, and the code-parsing capability of the RPA designer is then used to obtain a runnable automated flow. A flow design document is thus converted directly into a flow code file, and a user can build an automated RPA flow simply by describing the business process in natural language, which reduces the manpower, material, and financial resources an enterprise consumes and saves development cost in implementing RPA projects.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the operation of the present invention;
In the figures: S1 denotes step S1, S2 denotes step S2, and S3 denotes step S3.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in FIGS. 1-2, the RPA flow automatic generation method based on NLP technology combines NLP technology with RPA flow generation technology. The user's intention is understood through NLP, while an expert system is maintained that abstracts the knowledge of experienced implementers into rules. The result of the NLP intent analysis is then fused with the rules of the expert system to produce interpretable structured data describing an automated flow. Finally, the flow generation technology converts this result into a code file readable by an RPA designer, and the code-parsing capability of the RPA designer is used to obtain an executable automated flow. The main implementation principle is as follows:
1. First, data are collected and an expert system is established. Project data completed by ten qualified RPA implementation engineers in a team are collected, including project flow documents and flow code, together with the experience gained during implementation. On this basis, the number and frequency of the RPA activities used in real RPA projects are summarized and analyzed, and the 20 activities that reach a coverage rate of 95% are selected. Not all activities are included, because rarely used activities have too little data; including them would increase technical complexity and also impair the construction of the subsequent models, increasing the error of the output. At the same time, the sentences corresponding to activities in the flow design documents are labeled to construct three things: the original text description (the business instruction written in natural language in the document), metadata describing the corresponding activity type, and the fragments of the original text description that correspond to the types of one or more input parameters of the activity.
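The screening step can be sketched as follows. This is a minimal illustration, assuming that "coverage rate" means the cumulative share of all recorded activity uses; the function name, the usage log, and the activity names other than Click and TypeInto are hypothetical and not taken from the patent.

```python
from collections import Counter


def select_activities(usage_log, coverage=0.95):
    """Return the most frequent activities whose cumulative share of all
    recorded uses first reaches `coverage`."""
    counts = Counter(usage_log)
    total = sum(counts.values())
    selected, covered = [], 0
    for activity, n in counts.most_common():
        if covered / total >= coverage:
            break  # the activities chosen so far already reach the target
        selected.append(activity)
        covered += n
    return selected


# Hypothetical usage log: one entry per activity occurrence in collected flows.
log = (["Click"] * 50 + ["TypeInto"] * 30 + ["If"] * 15
       + ["SendMail"] * 3 + ["Delay"] * 2)
print(select_activities(log))  # ['Click', 'TypeInto', 'If']
```

With real project data the same loop would stop at whatever number of activities reaches the threshold; the patent reports 20.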
2. Secondly, a natural language processing model and a rule base are constructed, and the method comprises the following steps:
2.1 Data enhancement and data expansion
Data enhancement is mainly used to expand the data set and to address problems in the current data set, which has two main problems:
Problem 1: Deep learning relies on large amounts of data, but the existing data from real business scenarios are relatively scarce (3695 pieces of text data in total) and unevenly distributed, with a few common activities accounting for a large share.
Problem 2: The real data lack text containing multiple activities and nested activities. For example, "Click a button, open D:\RPA\test.xlsx" contains multiple activities (Click and createExcel), and "If the amount is larger than 0, click a payment button" contains a nested activity (a Click activity nested in an If activity).
To solve these problems, 4 data enhancement methods are adopted:
Synonym replacement: the command sentence is segmented into words, and some non-keyword words are randomly replaced with synonyms.
Activity parameter replacement: a labeled real command sentence contains an activity type, parameter types, and specific parameter values, and a value can be replaced with another value of the same parameter type. For example, "www.baidu.com" in "navigate to www.baidu.com" is a Url parameter, which activity parameter replacement randomly swaps for another Url.
Multi-activity sentence generation: two sentences belonging to different activities are picked at random and spliced together. For example, the command sentences "click button" and "input RPA in text box" are spliced into the multi-activity sentence "click button and input RPA in text box".
Nested-activity sentence generation: the If and ForEach activities can contain nested activities, so a simple nested-activity sentence can be generated by randomly replacing the Then or Else parameter of an If with another activity sentence.
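The four methods can be sketched in a few lines. This is an illustrative sketch only: the synonym lexicon, the URL pool, the sample dictionary layout, and the activity name Navigate are assumptions introduced here, and whitespace splitting stands in for real word segmentation.

```python
import random

# Toy synonym lexicon and URL pool; placeholders, not data from the patent.
SYNONYMS = {"button": ["control", "key"], "box": ["field"]}
URL_POOL = ["www.example.com", "www.example.org"]


def replace_synonyms(text, keywords, rng, rate=0.5):
    """Swap some non-keyword words for a synonym from the lexicon."""
    out = []
    for word in text.split():
        if word not in keywords and word in SYNONYMS and rng.random() < rate:
            word = rng.choice(SYNONYMS[word])
        out.append(word)
    return " ".join(out)


def replace_parameter(sample, param, pool, rng):
    """Replace one labelled parameter value with another of the same type."""
    old = sample["params"][param]
    new = rng.choice([v for v in pool if v != old])
    return {"text": sample["text"].replace(old, new),
            "activities": sample["activities"],
            "params": {**sample["params"], param: new}}


def splice(a, b):
    """Join two single-activity sentences into one multi-activity sentence."""
    return {"text": f"{a['text']} and {b['text']}",
            "activities": a["activities"] + b["activities"]}


def nest_in_if(condition, then_sample):
    """Use an activity sentence as the Then branch of an If activity."""
    return {"text": f"if {condition}, {then_sample['text']}",
            "activities": ["If"],
            "then": then_sample["activities"]}


rng = random.Random(0)
click = {"text": "click the button", "activities": ["Click"], "params": {}}
type_into = {"text": "input RPA in the text box",
             "activities": ["TypeInto"], "params": {}}
navigate = {"text": "navigate to www.baidu.com", "activities": ["Navigate"],
            "params": {"Url": "www.baidu.com"}}

print(replace_synonyms(click["text"], {"click"}, rng))
print(replace_parameter(navigate, "Url", URL_POOL, rng)["text"])
print(splice(click, type_into)["text"])
print(nest_in_if("the amount is greater than 0", click)["text"])
```

Each helper returns the labels along with the new text, so augmented samples stay usable as training data without re-annotation.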
2.2 Neural network model
The neural network model determines which activities a sentence contains and identifies the parameters of each activity; it also handles multi-activity and nested-activity sentences. For example, given the activity sentence "enter password in password column and click login button", the model must identify the two activities the sentence contains (typeInto and Click) and then the parameters of each (for typeInto, the selector is "password column" and the text is "password"; for Click, the selector is "login button").
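One way to picture the expected output for this example is as token-level tags that are decoded into per-activity parameters. The sketch below assumes a BIO tagging scheme with labels of the form "Activity.parameter"; the patent does not specify its label format, so the scheme and both function names are hypothetical, and the tags are written by hand rather than predicted by a model.

```python
def decode_bio(tokens, tags):
    """Collect (label, text) spans from a BIO-tagged token sequence."""
    spans, label, words = [], None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-")
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            # Any tag that does not continue the open span closes it.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
        if starts_new:
            label, words = tag[2:], [token]
        elif continues:
            words.append(token)
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def group_by_activity(spans):
    """Turn 'Activity.param' spans into {activity: {param: text}}."""
    result = {}
    for label, text in spans:
        activity, param = label.split(".", 1)
        result.setdefault(activity, {})[param] = text
    return result


tokens = "enter password in password column and click login button".split()
tags = ["O", "B-typeInto.text", "O",
        "B-typeInto.selector", "I-typeInto.selector", "O",
        "O", "B-Click.selector", "I-Click.selector"]
print(group_by_activity(decode_bio(tokens, tags)))
```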
2.2.1 Neural network model selection
The document analysis task comprises two subtasks, text classification and named entity recognition. The mainstream neural network models available include BiLSTM, CNN, and BERT, and all three were implemented in code. In the validation phase, BERT proved far more effective than BiLSTM and CNN, scoring about 30% higher on F1. The BERT model is therefore adopted as the main model. Its advantages and disadvantages include:
BERT has more parameters and is pre-trained on a large-scale Chinese corpus; when little data is available, fine-tuning a pre-trained model works better.
BERT can handle longer text sequences than BiLSTM and CNN.
BERT uses multi-layer attention, so each word can better incorporate information from the other words.
BERT also has drawbacks: the model is relatively large (about 100 million parameters) and takes a long time to train, so verifying the effect of an optimization takes a long time.
2.2.2 Activity rule matching and expression rule matching
In addition to the neural network model described above, activity and parameter identification also uses rule matching, which handles complex activity descriptions and automatically generates code expressions (for example, "yesterday" is converted into "new. adddays (-1)") so that usable flow code can be generated. This also partly solves the problem that business users without a programming background are often troubled by code expressions when building flows.
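Expression rule matching of this kind can be sketched as a small table of patterns and expression builders. The rules below are hypothetical; the generated strings assume a C#-style DateTime API (DateTime.Now.AddDays), which is an assumption about the intended target form rather than something the text confirms.

```python
import re

# Hypothetical phrase-to-expression rules. The output strings assume a
# C#-style DateTime API; the patent does not list its actual rules.
EXPRESSION_RULES = [
    (re.compile(r"^yesterday$"), lambda m: "DateTime.Now.AddDays(-1)"),
    (re.compile(r"^tomorrow$"), lambda m: "DateTime.Now.AddDays(1)"),
    (re.compile(r"^(\d+) days ago$"),
     lambda m: f"DateTime.Now.AddDays(-{m.group(1)})"),
]


def to_expression(phrase):
    """Return the code expression for a phrase, or None if no rule matches."""
    text = phrase.strip().lower()
    for pattern, build in EXPRESSION_RULES:
        match = pattern.match(text)
        if match:
            return build(match)
    return None


print(to_expression("yesterday"))    # DateTime.Now.AddDays(-1)
print(to_expression("3 days ago"))   # DateTime.Now.AddDays(-3)
```

Returning None for unmatched phrases lets a caller fall back to the neural model or flag the phrase for an implementer.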
Activity rule matching and expression rule matching work in a similar way: common templates are first summarized, the corresponding regular expressions are then written, and the text is finally matched against the regular expressions. For example, with the typeInto regular expression 'at (input)', the command sentence "input text in text box" matches, and the corresponding parameters "text box" and "text" are identified.
The results are output in JSON format, which forms the input file for generating the flow code file.
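A minimal sketch of activity rule matching with JSON output follows. The English patterns are hypothetical stand-ins written for this illustration (the patent's own regular expressions target the original Chinese templates), and the JSON layout with "activity" and "params" keys is likewise an assumption.

```python
import json
import re

# Hypothetical English templates; one named group per activity parameter.
ACTIVITY_RULES = [
    ("typeInto",
     re.compile(r"^input (?P<text>.+?) in (?:the )?(?P<selector>.+)$")),
    ("Click", re.compile(r"^click (?:the )?(?P<selector>.+)$")),
]


def match_activity(sentence):
    """Return the first matching activity and its parameters, else None."""
    for activity, pattern in ACTIVITY_RULES:
        match = pattern.match(sentence.strip())
        if match:
            return {"activity": activity, "params": match.groupdict()}
    return None


result = match_activity("input text in text box")
print(json.dumps(result, ensure_ascii=False))
```

Named groups keep the rule table declarative: adding an activity means adding one pattern whose group names are the parameter names.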
3. Finally, an RPA flow code file is generated.
The JSON file generated in the preceding steps is parsed to obtain the category and attribute content of the activities to be generated. The category and attribute information of all activities is obtained by using C# reflection, and the corresponding Xaml file is generated by using ActivityBuilder and XamlXmlWriter in Windows Workflow Foundation. This file is a flow file readable by the RPA designer; opening it in the RPA designer shows a runnable flow. From the end user's perspective, only the flow design document needs to be supplied to obtain an executable RPA flow file.
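The shape of this last step can be sketched in Python, keeping in mind that the patent's implementation uses C# reflection with ActivityBuilder and XamlXmlWriter. Here a hand-written dictionary stands in for reflection and xml.etree.ElementTree stands in for the XAML writer. The RPA_NS namespace, the activity names, and the property names are hypothetical, and the output is not claimed to load in any particular RPA designer.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of the built-in Windows Workflow Foundation activities.
WF_NS = "http://schemas.microsoft.com/netfx/2009/xaml/activities"
# Hypothetical namespace for vendor-specific RPA activities.
RPA_NS = "clr-namespace:Example.Rpa.Activities;assembly=Example.Rpa"

# Stand-in for reflection: the properties each known activity accepts.
KNOWN_ACTIVITIES = {"Click": {"selector"}, "typeInto": {"selector", "text"}}


def build_flow(json_text):
    """Turn a JSON list of activities into a Sequence element serialized as XML."""
    ET.register_namespace("", WF_NS)
    ET.register_namespace("rpa", RPA_NS)
    root = ET.Element(f"{{{WF_NS}}}Sequence")
    for item in json.loads(json_text):
        name, params = item["activity"], item["params"]
        allowed = KNOWN_ACTIVITIES.get(name)
        if allowed is None:
            raise ValueError(f"unknown activity: {name}")
        unknown = set(params) - allowed
        if unknown:
            raise ValueError(f"{name} has no properties {sorted(unknown)}")
        ET.SubElement(root, f"{{{RPA_NS}}}{name}", params)
    return ET.tostring(root, encoding="unicode")


steps = json.dumps([
    {"activity": "typeInto",
     "params": {"selector": "password column", "text": "password"}},
    {"activity": "Click", "params": {"selector": "login button"}},
])
print(build_flow(steps))
```

Validating names and properties before writing mirrors the role reflection plays in the described method: only activities and attributes that actually exist end up in the generated file.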
It can be understood that the invention has reasonable design and unique structure, realizes the direct conversion from the flow design document to the flow code file by combining the NLP and the RPA flow generation technology, enables the user to compile the RPA automatic flow only by describing the business flow through the natural language, reduces the consumption of the manpower, material resources and financial resources of the enterprise, and also saves the development cost in the implementation of the RPA project.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, and the scope of protection is still within the scope of the invention.

Claims (6)

1. An RPA flow automatic generation method based on NLP technology, characterized in that the method comprises the following steps:
step S1, collecting data and establishing an expert system;
step S2, constructing a natural language processing model and a rule base, including data enhancement and data expansion, a neural network model, neural network model selection, activity rule matching, and expression rule matching;
and step S3, generating an RPA flow code file.
2. The RPA flow automatic generation method based on NLP technology according to claim 1, characterized in that: step S1 further includes collecting project data, analyzing the number and frequency of the RPA activities used in real RPA projects, and selecting the activities that reach a coverage rate of 95%.
3. The RPA flow automatic generation method based on NLP technology according to claim 1, characterized in that: step S1 further includes labeling the sentences corresponding to activities in the flow design document, and constructing the original text description, the metadata describing the corresponding activity type, and the fragments of the original text description that correspond to the types of one or more input parameters of the activity.
4. The RPA flow automatic generation method based on NLP technology according to claim 1, characterized in that: in step S2, the data enhancement includes synonym replacement, activity parameter replacement, multi-activity sentence generation, and nested-activity sentence generation.
5. The RPA flow automatic generation method based on NLP technology according to claim 1, characterized in that: in step S2, the neural network model is used to determine the activities of a sentence and to identify the parameters of each activity; the neural network model selection comprises selecting a BERT neural network model; and the activity rule matching and the expression rule matching both comprise summarizing common templates, writing the corresponding regular expressions, matching text against the regular expressions, and outputting the results in JSON format.
6. The RPA flow automatic generation method based on NLP technology according to claim 5, characterized in that: step S3 further includes parsing the JSON file generated in the preceding steps to obtain the category and attribute content of the activities to be generated, obtaining the category and attribute information of all activities by using C# reflection, and generating the corresponding Xaml file by using ActivityBuilder and XamlXmlWriter in Windows Workflow Foundation.
CN202011010218.7A 2020-09-23 2020-09-23 NLP technology-based RPA process automatic generation method Pending CN112163420A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011010218.7A CN112163420A (en) 2020-09-23 2020-09-23 NLP technology-based RPA process automatic generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011010218.7A CN112163420A (en) 2020-09-23 2020-09-23 NLP technology-based RPA process automatic generation method

Publications (1)

Publication Number Publication Date
CN112163420A true CN112163420A (en) 2021-01-01

Family

ID=73863450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011010218.7A Pending CN112163420A (en) 2020-09-23 2020-09-23 NLP technology-based RPA process automatic generation method

Country Status (1)

Country Link
CN (1) CN112163420A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113159951B (en) * 2021-04-23 2022-10-14 平安证券股份有限公司 Financial data clearing method, device, equipment and storage medium
CN113159951A (en) * 2021-04-23 2021-07-23 平安证券股份有限公司 Financial data clearing method, device, equipment and storage medium
CN113360649A (en) * 2021-06-04 2021-09-07 湖南大学 Flow error control method and system based on natural language processing in RPA system
CN113360649B (en) * 2021-06-04 2024-01-05 湖南大学 Natural language processing-based flow error control method and system in RPA system
CN113434798A (en) * 2021-06-21 2021-09-24 湖南大学 Method and system for generating codeless RPA automatic flow file
CN113434798B (en) * 2021-06-21 2023-05-23 湖南大学 Code-free RPA automatic process file generation method and system
CN113535551A (en) * 2021-06-30 2021-10-22 杭州电子科技大学 Text model construction method for 6016B rule test based on json format specification description
WO2023226129A1 (en) * 2022-05-24 2023-11-30 来也科技(北京)有限公司 Item-rule code generation method and apparatus combining rpa and ai, and electronic device
CN115098205A (en) * 2022-06-17 2022-09-23 来也科技(北京)有限公司 Control method for realizing IA flow editing interface based on RPA and AI
CN114926151A (en) * 2022-06-21 2022-08-19 中关村科学城城市大脑股份有限公司 RPA flow automatic generation method and device based on reinforcement learning
CN116719514A (en) * 2023-08-08 2023-09-08 安徽思高智能科技有限公司 Automatic RPA code generation method and device based on BERT
CN116719514B (en) * 2023-08-08 2023-10-20 安徽思高智能科技有限公司 Automatic RPA code generation method and device based on BERT
CN117311798A (en) * 2023-11-28 2023-12-29 杭州实在智能科技有限公司 RPA flow generation system and method based on large language model
CN117421414A (en) * 2023-12-18 2024-01-19 珠海金智维信息科技有限公司 Design method of RPA intelligent interactive system based on AIGC
CN117421414B (en) * 2023-12-18 2024-03-26 珠海金智维信息科技有限公司 Design method of RPA intelligent interactive system based on AIGC

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210101)