CN114219438A - Document file distribution method, device, equipment and medium based on RPA and AI - Google Patents
- Publication number
- CN114219438A CN202111532926.1A
- Authority
- CN
- China
- Prior art keywords
- official document
- information
- document
- historical
- domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/109—Time management, e.g. calendars, reminders, meetings or time accounting
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1661—Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
The application provides a method, apparatus, device, and medium for distributing official document files based on RPA and AI. The method comprises: S1, identifying the content of an official document file to obtain key information, where the key information includes information on the field to which the official document file belongs; S2, determining, according to the key information, target department information for receiving the official document file; and S3, distributing the official document file according to the corresponding target department information. This technical solution addresses the low efficiency and low accuracy of manual official document processing.
Description
Technical Field
The present application relates to the field of process automation technologies, and in particular, to a method, an apparatus, a device, and a medium for distributing documents based on RPA and AI.
Background
Robotic Process Automation (RPA) uses dedicated robot software to simulate human operations on a computer and to execute process tasks automatically according to rules.
Artificial Intelligence (AI) is a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence.
RPA has distinctive advantages: it is low-code and non-intrusive. Low-code means that RPA can be operated without a high level of IT expertise, so business personnel who cannot program can also develop processes. Non-intrusive means that RPA can simulate human operation without requiring the software system to open up an interface. However, conventional RPA has limitations: it can only work from fixed rules, and its application scenarios are limited. As AI technology continues to develop, the deep integration of RPA and AI overcomes these limitations; RPA + AI amounts to "hand work + head work", which greatly changes the value of labor.
At present, the distribution of official document files usually has to be completed manually. The distributor must quickly extract keywords from the document content and match the responsible departments according to those keywords, which is a cumbersome process. In particular, when there are many documents and their content is long, manual processing is inefficient and inaccurate.
Disclosure of Invention
Embodiments of the present application provide a method, apparatus, device, and medium for distributing official document files based on RPA and AI, to solve the low efficiency and low accuracy of manual official document processing. The technical solution is as follows:
In a first aspect, an embodiment of the present application provides a method for distributing official document files based on RPA and AI, including:
S1, identifying the content of the official document file to obtain key information, wherein the key information comprises information on the field to which the official document file belongs;
S2, determining, according to the key information, target department information for receiving the official document file; and
S3, distributing the official document file according to the corresponding target department information.
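As a non-limiting illustration, steps S1 to S3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the keyword table, the field-to-department correspondence, the function names, and the in-memory `outbox` are all hypothetical, and simple substring matching stands in for the recognition and ranking steps described later.

```python
# Hypothetical keyword table: domain keyword -> field (illustrative values only).
DOMAIN_KEYWORDS = {"budget": "finance", "recruitment": "human resources"}
# Hypothetical correspondence between field and receiving department.
FIELD_TO_DEPARTMENT = {
    "finance": "Finance Department",
    "human resources": "Human Resources Department",
}


def identify_key_information(content):
    """S1: return the fields whose keywords occur in the document content."""
    lowered = content.lower()
    fields = []
    for keyword, field in DOMAIN_KEYWORDS.items():
        if keyword in lowered and field not in fields:
            fields.append(field)
    return fields


def determine_target_department(key_information):
    """S2: map the first recognized field to its department, or None."""
    for field in key_information:
        if field in FIELD_TO_DEPARTMENT:
            return FIELD_TO_DEPARTMENT[field]
    return None


def distribute(content, outbox):
    """S3: append the document to the target department's queue in outbox."""
    department = determine_target_department(identify_key_information(content))
    if department is not None:
        outbox.setdefault(department, []).append(content)
    return department
```

In this sketch a document with no recognized field is left undistributed (the function returns `None`), which would leave room for manual handling.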
Optionally, step S1 specifically includes:
S11, calling an Optical Character Recognition (OCR) component to recognize the official document file to obtain the official document content; and
S12, extracting, according to a preset official document domain keyword table, the domain keywords of the domain to which the official document belongs from the official document content as the key information.
Optionally, step S12 specifically includes:
S121, determining paragraph titles in the official document content;
S122, extracting key sentences from the content of each paragraph belonging to the same paragraph title, wherein the verbs in the key sentences meet the requirements of a preset official document corpus; and
S123, extracting, according to the preset official document domain keyword table, the domain keywords of the domain to which the official document belongs from the key sentences as the key information.
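Steps S121 to S123 can be illustrated with the following Python sketch. Everything specific in it is an assumption made for illustration: paragraph titles are taken to be lines of the form "1. Title", the verb set stands in for the verb requirement of the preset official document corpus, and the keyword set stands in for the domain keyword table.

```python
import re

# Hypothetical stand-ins for the preset corpus verbs and the domain keyword table.
OFFICIAL_VERBS = {"approve", "submit", "allocate", "recruit"}
DOMAIN_KEYWORDS = {"budget", "funding", "recruitment"}

# Assumption: a paragraph title is a line such as "1. Funding".
TITLE_RE = re.compile(r"^\d+\.\s+\S")


def split_by_title(content):
    """S121: group paragraph lines under the title that precedes them."""
    sections = {}
    current = None
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue
        if TITLE_RE.match(line):
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def key_sentences(paragraphs):
    """S122: keep sentences that contain at least one listed verb."""
    sentences = []
    for paragraph in paragraphs:
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            if words & OFFICIAL_VERBS:
                sentences.append(sentence)
    return sentences


def domain_keywords(content):
    """S123: collect domain keywords that occur in the key sentences."""
    found = []
    for paragraphs in split_by_title(content).values():
        for sentence in key_sentences(paragraphs):
            for word in re.findall(r"[a-z]+", sentence.lower()):
                if word in DOMAIN_KEYWORDS and word not in found:
                    found.append(word)
    return found
```

Restricting keyword extraction to key sentences means that a domain keyword appearing only in a title or in a sentence without a listed verb is ignored in this sketch.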
Optionally, the preset official document domain keyword table is created as follows:
mining, based on AutoPhrase, an automatic phrase mining method in a Natural Language Processing (NLP) service, keywords of the field to which official documents belong from a preset official document corpus, to obtain candidate domain keywords whose similarity is greater than a first set threshold;
crawling keywords from official document web pages and labeling the domain keywords among them; and
screening out, from the candidate domain keywords and based on the labeling result, the keywords whose similarity to the labeling result is greater than a second set threshold, so as to form the domain keyword table.
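The two-threshold screening described above can be sketched as follows. This is a minimal sketch under assumptions: the mined phrases and their scores are supplied as a plain dictionary rather than produced by AutoPhrase itself, and the string similarity from Python's standard `difflib.SequenceMatcher` is used as a stand-in because the application does not specify the similarity measure.

```python
from difflib import SequenceMatcher


def build_keyword_table(mined_scores, labeled_keywords,
                        first_threshold, second_threshold):
    """Filter mined phrases by score, then by similarity to labeled keywords.

    mined_scores: hypothetical mapping of mined phrase -> score in [0, 1].
    labeled_keywords: domain keywords labeled on crawled web pages.
    """
    # First threshold: keep candidate domain keywords with a high enough score.
    candidates = [p for p, score in mined_scores.items() if score > first_threshold]

    table = []
    for phrase in candidates:
        # Second threshold: compare each candidate with the labeling result.
        best = max(
            (SequenceMatcher(None, phrase, label).ratio() for label in labeled_keywords),
            default=0.0,
        )
        if best > second_threshold:
            table.append(phrase)
    return table
```

A candidate survives only if it passes both thresholds, so a phrase that scores highly in mining but resembles no labeled keyword is dropped.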
Optionally, step S2 specifically includes:
S21, taking the key information as the input of the trained ranking model, and selecting, from the output of the trained ranking model, the department information with the largest weight value as the target department information corresponding to the official document file;
wherein the trained ranking model establishes an association between the key information of the official document content and the department information corresponding to the field to which the content belongs.
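The selection in step S21 amounts to taking the department with the highest output weight. The sketch below illustrates only that selection; the additive weight table is a hypothetical stand-in for the trained ranking model, whose architecture the application does not specify, and all names and values are illustrative.

```python
# Hypothetical learned weights: (keyword, department) -> contribution.
WEIGHTS = {
    ("budget", "Finance Department"): 2.0,
    ("budget", "Science and Technology Department"): 0.3,
    ("research", "Science and Technology Department"): 1.5,
}
DEPARTMENTS = ["Finance Department", "Science and Technology Department"]


def rank_departments(key_information):
    """Score every department and return (department, weight) pairs, best first."""
    scores = {
        department: sum(WEIGHTS.get((keyword, department), 0.0)
                        for keyword in key_information)
        for department in DEPARTMENTS
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_target_department(key_information):
    """S21: pick the department with the largest weight value."""
    return rank_departments(key_information)[0][0]
```

Returning the full ranked list from `rank_departments` keeps the lower-ranked departments available, for example for display alongside a distribution suggestion.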
Optionally, the ranking model is obtained by training in the following way:
extracting topic keywords from the content of the historical official document, and determining historical department information corresponding to the historical official document;
splicing the topic keywords and the historical department information, and using the spliced key information as part of training samples;
determining historical target department information corresponding to the historical official document from the manually corrected distribution opinion information of the historical official document;
generating a positive sample and a negative sample according to the corresponding relation between the historical official document and the historical target department information, wherein the positive sample represents that the historical official document and the historical target department information are in a correct corresponding relation, and the negative sample represents that the historical official document and the historical target department information are in an incorrect corresponding relation;
and training the initial ranking model based on the partial training samples, the positive samples, and the negative samples to obtain the trained ranking model.
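The sample construction described above can be sketched as follows. The record fields, the "[SEP]" separator, and the choice to pair each document with every other department as a negative sample are assumptions made for illustration; the application states only that positive samples reflect correct correspondences and negative samples reflect incorrect ones.

```python
def build_training_samples(history, departments):
    """Build (spliced_text, department, label) triples from historical records.

    Each record is a hypothetical dict with:
      topic_keywords          - keywords extracted from the historical document
      historical_department   - department recorded for the historical document
      corrected_department    - target department after manual correction
    """
    samples = []
    for record in history:
        # Splice the topic keywords and the historical department information.
        spliced = (" ".join(record["topic_keywords"])
                   + " [SEP] " + record["historical_department"])
        for department in departments:
            # Positive sample: the manually corrected target department.
            # Negative sample: any other department (an assumption).
            label = 1 if department == record["corrected_department"] else 0
            samples.append((spliced, department, label))
    return samples
```

Each historical document yields exactly one positive sample and one negative sample per remaining department in this sketch, so class balance would need separate attention before training.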
Optionally, the manually corrected distribution opinion information of the historical official document further includes:
a modification example sentence library generated from manual modification suggestions, and a phrase error correction white list generated from manual modification logs of official document content.
In a second aspect, an embodiment of the present application provides an apparatus for distributing official document files based on RPA and AI, including:
a key information determining module configured to identify the content of the official document file to obtain key information, the key information comprising information on the field to which the official document file belongs;
a target department information determining module configured to determine, according to the key information, target department information for receiving the official document file; and
an official document file distribution module configured to distribute the official document file according to the corresponding target department information.
Optionally, the key information determining module includes:
the document content recognition unit is configured to call an Optical Character Recognition (OCR) component to recognize the document file to obtain document content;
and the key information determining unit is configured to extract, according to a preset official document domain keyword table, the domain keywords of the domain to which the official document belongs from the official document content as the key information.
Optionally, the key information determining unit is specifically configured to:
determining paragraph titles in the official document content;
extracting key sentences from the content of each paragraph belonging to the same paragraph title, wherein verbs in the key sentences meet the requirement of a preset official document corpus;
and extracting, according to the preset official document domain keyword table, the domain keywords of the domain to which the official document belongs from the key sentences as the key information.
Optionally, the preset official document domain keyword table is created as follows:
mining, based on AutoPhrase, an automatic phrase mining method in a Natural Language Processing (NLP) service, keywords of the field to which official documents belong from a preset official document corpus, to obtain candidate domain keywords whose similarity is greater than a first set threshold;
crawling keywords from official document web pages and labeling the domain keywords among them; and
screening out, from the candidate domain keywords and based on the labeling result, the keywords whose similarity to the labeling result is greater than a second set threshold, so as to form the domain keyword table.
Optionally, the target department information determining module is specifically configured to:
taking the key information as the input of the trained ranking model, and selecting, from the output of the trained ranking model, the department information with the largest weight value as the target department information corresponding to the official document file;
wherein the trained ranking model establishes an association between the key information of the official document content and the department information corresponding to the field to which the content belongs.
Optionally, the ranking model is obtained by training in the following way:
extracting topic keywords from the content of the historical official document, and determining historical department information corresponding to the historical official document;
splicing the topic keywords and the historical department information, and using the spliced key information as part of training samples;
determining historical target department information corresponding to the historical official document from the manually corrected distribution opinion information of the historical official document;
generating a positive sample and a negative sample according to the corresponding relation between the historical official document and the historical target department information, wherein the positive sample represents that the historical official document and the historical target department information are in a correct corresponding relation, and the negative sample represents that the historical official document and the historical target department information are in an incorrect corresponding relation;
and training the initial ranking model based on the partial training samples, the positive samples, and the negative samples to obtain the trained ranking model.
Optionally, the manually corrected distribution opinion information of the historical official document further includes:
a modification example sentence library generated from manual modification suggestions, and a phrase error correction white list generated from manual modification logs of official document content.
In a third aspect, an embodiment of the present application provides an apparatus for distributing a document file, where the apparatus includes: a memory and a processor. Wherein the memory and the processor are in communication with each other via an internal connection path, the memory is configured to store instructions, the processor is configured to execute the instructions stored by the memory, and the processor is configured to perform the method of any of the above aspects when the processor executes the instructions stored by the memory.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, which stores a computer program, and when the computer program runs on a computer, the method in any one of the above-mentioned aspects is executed.
According to the technical scheme provided by the embodiment of the application, the RPA robot identifies the key information in the official document content, and can determine the target department information corresponding to the official document to be distributed according to the identified key information, so that the official document can be sent to the corresponding target department. By adopting the RPA robot to replace manual operation, the time of workers is saved, and the distribution efficiency of the document file is effectively improved.
The advantages or beneficial effects in the above technical solution at least include:
1. The RPA robot, rather than a person, identifies the content of the official document file and determines the target department to which it is to be delivered, which saves staff time and effectively improves the distribution efficiency of official document files.
2. Combining the RPA platform with the AI platform removes the time-consuming and labor-intensive steps that the related art requires during document content identification, and improves the efficiency and accuracy of document content identification.
3. During distribution, the RPA robot can quickly and accurately obtain the target department information of the official document file to be distributed based on the trained ranking model. Compared with manually determining the target department as in the related art, this saves staff time and effectively improves the distribution efficiency of official document files.
4. With the preset domain keyword table, more accurate key information can be obtained from the official document content, so that the ranking model can predict more accurately from that key information and yield more accurate target department information for the official document content.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present application will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
Fig. 1 is a flowchart of a document file distribution method based on RPA and AI according to an embodiment of the present application;
Fig. 2a is a flowchart of a training method of a ranking model according to a second embodiment of the present application;
Fig. 2b is a schematic diagram of a training method of a ranking model according to the second embodiment of the present application;
Fig. 2c is a diagram illustrating the display interface of an official document suggestion distribution component according to the second embodiment of the present application;
Fig. 2d is a schematic diagram of generating a phrase error correction white list according to the second embodiment of the present application;
Fig. 3 is a flowchart of an RPA and AI-based official document distribution method according to a third embodiment of the present application;
Fig. 4 is a block diagram illustrating the structure of an RPA and AI-based official document distribution device according to a fourth embodiment of the present application;
Fig. 5 is a block diagram of a device for distributing official document files according to a fifth embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
In the description of the present application, the term "key information" refers to important information in the official document contents that can reflect the subject of the official document contents, the semantics of the official document contents, or the field to which the official document belongs.
In the description of the present application, the term "target department information" refers to the name of the department to which an official document file is to be delivered, such as a finance department, a science and technology department, a human resources department, or a nursing home, and also includes information on the persons in that department who receive documents.
In the description of the present application, the term "OCR" refers to Optical Character Recognition, a process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using a character recognition method. That is, for printed characters, the text in a paper document is optically converted into a black-and-white dot-matrix image file, and recognition software converts the characters in the image into text format for further editing and processing by word processing software.
In the description of the present application, the term "preset official document corpus" refers to a large-scale electronic text library that has been scientifically sampled and processed (for example, by automatic word segmentation and word labeling) and that stores language material actually occurring in official document language in real use.
In the description of the present application, the term "preset official document domain keyword table" refers to a table obtained by mining, in a supervised manner, keywords of the field to which the official documents in the preset official document corpus belong. The mined keywords whose similarity to the labeled samples is greater than a set threshold are retained, and the set of all retained keywords serves as the domain keyword table. Here, "supervised" learning means training a model using previously known attributes or targets. In the embodiments of the application, keywords crawled from official document web pages are labeled, and the labeled keywords serve as the labeled samples in the data mining process.
In the description of the present application, the term "NLP" refers to Natural Language Processing, a discipline that studies the language problems arising in interaction between humans and computers. In the embodiments of the application, NLP is applied in creating the preset official document domain keyword table.
In the description of the present application, the term "automatic phrase mining method" (AutoPhrase) refers to phrase mining in the general sense of mining words and phrases: it takes a domain corpus as input and outputs domain phrases. A domain corpus is formed by merging a large number of articles. In the embodiments of the application, phrases in the official document field are mined.
In the description of the present application, the "ranking model" is trained in a supervised learning manner, and the trained ranking model establishes an association relationship between key information of the document content and department information corresponding to the field to which the document content belongs.
In the description of the present application, the "example sentence library of amendment" includes a large number of example sentences obtained after manual amendment.
In the description of the present application, "distribution opinion information" refers to prompt or advice information given manually for the distribution of an official document file, including the name of the department to which the file is distributed, the persons in that department, and the like.
In the description of the present application, the phrase error correction white list includes a large amount of target department information corresponding to the domain to which the official document contents belong after being manually corrected.
These and other aspects of embodiments of the present application will be apparent from and elucidated with reference to the following description and drawings. In the description and drawings, particular embodiments of the application are disclosed in detail as being indicative of some of the ways in which the principles of the embodiments of the application may be practiced, but it is understood that the embodiments of the application are not limited correspondingly in scope. On the contrary, the embodiments of the application include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
The following describes in detail a document distribution method, apparatus, device and medium based on RPA and AI according to an embodiment of the present application with reference to the accompanying drawings.
Example one
Fig. 1 is a flowchart of an RPA and AI-based official document file distribution method according to an embodiment of the present application. The method is applicable to official document file distribution and similar application scenarios. The technical solution of this embodiment is implemented by an RPA robot. The RPA robot can run on the UiBot Creator platform, a robot production tool that serves as the carrier for the robot. In this embodiment, the RPA robot may be set to start at a scheduled time every day, log in to the official document processing system, and acquire the official document files to be distributed, which avoids a backlog of document files and improves document processing efficiency. As shown in Fig. 1, the method provided by this embodiment includes:
S110, identifying the content of the official document file to obtain key information.
An official document file submitted to the official document processing system generally exists as a PDF (Portable Document Format) file or as a photocopy. In this embodiment, when the RPA robot identifies the document file, it may scan the file using an OCR component from Artificial Intelligence (AI) technology to obtain the content of the document file. The content of the official document file includes the signature content in the file.
In this embodiment, the AI platform with signature and picture recognition functions is the UiBot Mage platform, a tool product that mainly provides AI capability support for RPA robot developers. Both this platform and the UiBot Creator platform that carries the RPA robot depend on the UiBot platform, a process automation platform that provides intelligent robot services across the whole business process for a variety of requirements. The AI platform integrates a preconfigured signature recognition template, with which operations such as signature character recognition, seal color recognition, signature shape recognition, and signature position recognition can be performed. In this embodiment, signature recognition is mainly used to obtain the organization name of the document contained in the signature.
Optionally, the platform carrying the RPA robot may be combined with the AI platform by logging in to both the RPA platform and the AI platform with the same target account, i.e., a UiBot account. Once the target account is logged in to both platforms, the platform carrying the RPA robot establishes a communication connection with the AI platform, so that the RPA robot can directly call the OCR function published by the AI platform to recognize the document file. In the related art, the document file is first recognized using the OCR function in the AI platform, the recognized data is then exported manually, and the data is finally imported manually into the RPA platform. Compared with that approach, this embodiment, by combining the RPA platform with the AI platform, avoids the time-consuming and laborious steps in document content recognition and improves its efficiency.
In this embodiment, after obtaining the content of the document file, the RPA robot may extract the key information from the content. The key information refers to important information that reflects the subject of the official document content, the semantics of the official document content, or the field to which the official document belongs.
For example, the RPA robot may match the document content against a preset official document corpus and extract, from the document content, keywords whose similarity to entries in the corpus is greater than a set threshold. The preset official document corpus is a large-scale electronic text library that has been scientifically sampled and processed (for example, by automatic word segmentation and word labeling) and that stores language material that has actually appeared in official documents in real use.
For example, the RPA robot may further match the content of the document file against a preset official document domain keyword table, and extract domain keywords of the domain to which the document belongs from the document content as the key information. The preset official document domain keyword table is obtained by mining, in a supervised manner, the keywords of the domains to which official documents belong in the preset official document corpus: keywords whose similarity to a labeled sample is greater than a set threshold are retained, and the set of all retained keywords serves as the domain keyword table. Here, "supervised" learning refers to using a previously known attribute or goal to guide the learning process. In the embodiment of the application, keywords captured from official document webpages are labeled, and the labeled keywords are used as the labeled samples in the data mining process.
The purpose of the embodiment of the application is to distribute the official document to the corresponding responsible department. Because the corresponding fields of different departments are different, the embodiment can extract the field keywords from the document contents to be distributed as the key information according to the document field keyword list by creating the document field keyword list. Based on the correspondence between the key information and the department field, the RPA robot can determine a target department for receiving the document file, and can distribute the document file to the corresponding target department.
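The table lookup described above can be illustrated with a minimal sketch. It assumes the domain keyword table is a plain mapping from keyword to domain and that matching is exact substring search; the table entries and the function name `extract_domain_keywords` are hypothetical and not taken from the application.

```python
# Hypothetical domain keyword table (keyword -> domain); entries are illustrative only.
DOMAIN_KEYWORD_TABLE = {
    "personnel compilation": "human resources",
    "supervision accountability": "supervision",
    "audit": "audit",
}


def extract_domain_keywords(content: str, table: dict[str, str]) -> list[str]:
    """Return the table keywords found in the content, ordered by first appearance."""
    lowered = content.lower()
    hits = [(lowered.find(keyword), keyword) for keyword in table if keyword in lowered]
    return [keyword for _, keyword in sorted(hits)]


key_information = extract_domain_keywords(
    "Notice on personnel compilation and the annual audit plan",
    DOMAIN_KEYWORD_TABLE,
)
```

A similarity-based match, as the text describes, would replace the substring test with a similarity score compared against the set threshold.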
S120, determining target department information for receiving the official document file according to the key information.
The target department information refers to name information of a department to which the official document is to be sent, such as a financial department, a scientific and technological department, a human resource department or a monitoring center, and the target department information further includes information of persons in the department who receive the official document.
For example, the RPA robot can determine the domain of the key information of the document content through a semantic recognition method in the NLP service. If the key information is "personnel compilation", the corresponding domain is the "human resources domain"; if the key information is "responsibility exploration" or "supervision accountability", the corresponding domain may include the "supervision domain", the "audit domain" or the "legal domain". Because different domains correspond to different organizational units, the RPA robot can determine the corresponding target department from the domain to which the document file belongs. For example, if the domain is the "human resources domain", the corresponding department is the "human resources department"; that is, if the obtained key information is "personnel compilation", the corresponding target department information includes the "human resources department". If the domain is the "supervision domain", the corresponding department is the "inspection and supervision department"; if the domain is the "audit domain", the corresponding department is the "audit department"; if the domain is the "legal domain", the corresponding department is the "legal department". That is, if the obtained key information is "responsibility exploration" and "supervision accountability", the corresponding target department information includes the "inspection and supervision department", the "audit department" and the "legal department".
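The keyword-to-domain-to-department chain in this example can be sketched as two lookup tables. This is only an illustration: the application determines the domain by semantic recognition in the NLP service, whereas the sketch below assumes fixed mappings, and the names `KEYWORD_TO_DOMAINS`, `DOMAIN_TO_DEPARTMENT` and `target_departments` are hypothetical.

```python
# Hypothetical lookup tables built from the example in the text.
KEYWORD_TO_DOMAINS = {
    "personnel compilation": ["human resources"],
    "responsibility exploration": ["supervision", "audit", "legal"],
    "supervision accountability": ["supervision", "audit", "legal"],
}
DOMAIN_TO_DEPARTMENT = {
    "human resources": "human resources department",
    "supervision": "inspection and supervision department",
    "audit": "audit department",
    "legal": "legal department",
}


def target_departments(key_information: list[str]) -> list[str]:
    """Map key information to departments, keeping order and dropping duplicates."""
    departments: list[str] = []
    for keyword in key_information:
        for domain in KEYWORD_TO_DOMAINS.get(keyword, []):
            department = DOMAIN_TO_DEPARTMENT[domain]
            if department not in departments:
                departments.append(department)
    return departments
```

Unknown keywords simply contribute no department, which leaves the decision to a fallback such as manual review.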
Optionally, in order to obtain the target department information corresponding to the document file more quickly and accurately, the embodiment of the present application determines the target department information of the document file by using a ranking model in the NLP service. Specifically, an initial ranking model can be trained in a supervised manner, so that the trained ranking model establishes an association relationship between the key information of the official document content and the department information corresponding to the domain of the official document. When the model is used to determine the target department information of an official document, the key information of the official document content is used as the input of the trained ranking model, and the official document department information with the largest weight value is selected from the output of the model as the target department information corresponding to the official document.
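The selection step can be sketched as follows, assuming the trained model's output is available as a mapping from department name to weight value; that output format and the function name `select_target_department` are assumptions made for illustration.

```python
def select_target_department(weights: dict[str, float]) -> str:
    """Pick the department whose weight value in the model output is largest."""
    if not weights:
        raise ValueError("the ranking model returned no candidate departments")
    return max(weights, key=weights.get)


# Hypothetical model output for one official document.
model_output = {
    "human resources department": 0.7,
    "audit department": 0.2,
    "legal department": 0.1,
}
chosen = select_target_department(model_output)
```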
S130, distributing the official document files according to the corresponding target department information.
In this embodiment, the distribution operation of the document file is performed by the RPA robot. The RPA robot can distribute different official document files to corresponding target departments based on the corresponding relation between the key information of the official document contents and the department field information of the official document, so that manual operation on the official document files is replaced, and the time of workers is saved.
For example, for the same document, if it is determined that there is only one target department corresponding to the document, the RPA robot sends the document to the corresponding target department; and if the fact that a plurality of target departments correspond to the official document is determined, the RPA robot sends the official document to the corresponding target departments at the same time.
According to the technical scheme provided by the embodiment, the RPA robot identifies the key information in the document content, and can determine the target department corresponding to the document to be distributed according to the identified key information, so that the document can be sent to the corresponding target department. By adopting the RPA robot to replace manual operation, the time of workers can be saved, and particularly under the condition that the number of document files to be distributed is large, the effect of improving the distribution efficiency of the document files is achieved.
In order to obtain the target department information corresponding to the official document more accurately, the embodiment of the application uses a ranking model in the NLP service to determine the target department information of the official document. Next, the process of determining the target department information corresponding to the official document is described in detail for the training stage and the application stage of the ranking model.
Example two
Fig. 2a is a flowchart of a training method of a ranking model according to a second embodiment of the present application, where the method is performed by a training apparatus of a ranking model, and the apparatus may be implemented by software and/or hardware, as shown in fig. 2a, and the method includes:
S210, extracting the topic keywords from the content of the historical official document, and determining the historical department information corresponding to the historical official document.
The historical document refers to a document file which is distributed and completed before the current time. The official document file usually adopts a common official document format, for example, the body part has corresponding titles, including a primary title, a secondary title, etc., and different titles have corresponding title identifications.
Illustratively, the title information can be extracted from the historical official document content as topic keywords according to the title identifications in the content, or a text ranking algorithm (TextRank) can be used to extract keywords from the historical official document content.
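Title extraction by title identification might look like the sketch below. The application does not specify the identifiers, so the pattern assumes three common numbering styles ("一、", "（一）" and "1."); the pattern and the name `extract_titles` are hypothetical.

```python
import re

# Assumed title identifications: "一、", "（一）" and "1." style numbering at line start.
TITLE_PATTERN = re.compile(
    r"^\s*(?:[一二三四五六七八九十]+、|（[一二三四五六七八九十]+）|\d+\.)\s*(\S.*)$"
)


def extract_titles(content: str) -> list[str]:
    """Return the text of every line that starts with a title identification."""
    titles = []
    for line in content.splitlines():
        match = TITLE_PATTERN.match(line)
        if match:
            titles.append(match.group(1).strip())
    return titles


sample = (
    "一、Personnel establishment\n"
    "Body text of the first section.\n"
    "（一）Responsibility exploration\n"
    "2. Information system\n"
)
topic_keywords = extract_titles(sample)
```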
Illustratively, in order to improve the efficiency of model training and save the calculation amount, the historical official document may be preprocessed by the pre-training model to obtain the keyword information therein, and then the subject keyword of the official document content may be extracted from the obtained keyword information. The pre-training model can be obtained based on large-scale unmarked official document training, or can be obtained by performing data enhancement processing on small-scale marked official document based on a data enhancement technology. The data enhancement technology comprises the step of replacing key information of the official document file or splicing all the key information.
Specifically, the pre-training model may be BERT (Bidirectional Encoder Representations from Transformers) or XLNet (a generalized autoregressive pre-training model). By adopting the pre-training model, the ranking model can learn from a better initial state, converge faster, and achieve better performance.
In this embodiment, since the historical documents are already distributed, the historical department information corresponding to the historical documents, such as department names, department interface persons, and the like, may be determined based on the historical distribution records. For example, the historical department information corresponding to the historical official document may be determined manually and used as a part of the keywords, or the keywords may be extracted from the content of the historical official document by using a keyword extraction model, such as TextRank. These keywords can reflect domain information of the content of the historical official document.
S220, splicing the topic keywords and the historical department information, and taking the spliced key information as a part of training samples.
In this embodiment, splicing the topic keywords and the historical department information means combining them. The combined key information includes keywords that reflect the domain to which the official document belongs and the official document department corresponding to those keywords. In other words, the combined key information is equivalent to labeling the content of the historical official document with key information related to the official document domain, and the labeled content of the historical official document can be used as part of the training samples for training the model.
Specifically, fig. 2b is a schematic diagram of a training method of a ranking model according to the second embodiment of the present application. As shown in fig. 2b, the topic keywords such as "personnel establishment, responsibility exploration, information system and system specification" can be extracted from the content of the historical official document, and by splicing the topic keywords with the department information distributed to the historical official document, the spliced key information can be used as part of the training sample for training the ranking model.
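As a rough illustration of the splicing step, the sketch below joins the topic keywords and the department information into one training text. The separator token and the function name `splice` are assumptions; the application does not state how the two parts are delimited.

```python
# Hypothetical separator between the keyword part and the department part.
SEPARATOR = " [SEP] "


def splice(topic_keywords: list[str], department: str) -> str:
    """Combine topic keywords and department information into one training text."""
    return ", ".join(topic_keywords) + SEPARATOR + department


training_text = splice(
    ["personnel establishment", "responsibility exploration"],
    "human resources department",
)
```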
S230, determining historical target department information corresponding to the historical official document from the manually corrected distribution opinion information of the historical official document.
The distribution opinion information of the historical official document refers to prompt information or suggestion information given by manual operation on distribution of the official document, wherein the prompt information or suggestion information comprises a department name, a department interface person and the like to which the official document is distributed.
In this embodiment, the distribution opinion information of the historical official document can be obtained through a distribution opinion prompting component in the official document processing system. Fig. 2c is a diagram illustrating the display interface of an official document distribution opinion prompting component according to the second embodiment of the present application. As shown in fig. 2c, when an official document to be distributed is imported into the component, the key information corresponding to the official document, such as the name of the department to which it is to be delivered and the department interface person, is displayed on the display interface of the component. If the "adopt" key on the display interface is triggered, the key information of the official document is displayed in the "batch opinion" column on the display interface. The relevant staff can correct the distribution opinions in the "batch opinion" column. In addition, on the display interface, the staff can also search for other departments related to a certain department by triggering the query key. Through the official document distribution opinion prompting component shown in fig. 2c, the preliminary distribution opinion information given by the official document processing system for the official document can be obtained. The preliminary distribution opinion information is then corrected manually to obtain accurate distribution opinion information corresponding to the official document; this information includes the target department information corresponding to the official document, which can be used as a training sample of the model.
Further, the manually corrected distribution opinion information of the historical official document may also include a modification example sentence library generated based on manual modification suggestions and a phrase error correction white list generated based on manual modification logs of the official document content.
Specifically, fig. 2d is a schematic diagram of generating a phrase error correction white list according to the second embodiment of the present application. As shown in fig. 2d, for distribution opinion information that is manually modified and then adopted, a modification example sentence library is generated based on the manual modification traces; the library includes content such as the domain and department information obtained after the official document content is manually modified. For distribution opinion information that is manually modified but not adopted, the similarity of the phrases before and after modification can be determined according to the modification record, that is, the draft review behavior log, and the phrases whose similarity is greater than a set similarity threshold are used as phrases in the white list, so as to obtain the phrase error correction white list.
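The white list construction can be sketched as a threshold filter over (before, after) phrase pairs from the modification record. The application does not name a similarity measure, so the sketch substitutes the character-level ratio from Python's standard `difflib`; the threshold value and the name `build_whitelist` are likewise assumptions.

```python
from difflib import SequenceMatcher


def build_whitelist(
    modifications: list[tuple[str, str]], threshold: float = 0.6
) -> list[tuple[str, str]]:
    """Keep the (before, after) phrase pairs whose similarity exceeds the threshold."""
    whitelist = []
    for before, after in modifications:
        # Stand-in similarity measure: difflib's ratio in the range [0, 1].
        similarity = SequenceMatcher(None, before, after).ratio()
        if similarity > threshold:
            whitelist.append((before, after))
    return whitelist


# Hypothetical entries from a modification log.
log = [
    ("audit departmnet", "audit department"),
    ("finance", "legal affairs office"),
]
phrase_whitelist = build_whitelist(log)
```

Here the near-identical pair passes the filter, while the unrelated pair is left out of the white list.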
S240, generating a positive sample and a negative sample according to the corresponding relation between the historical official document and the historical target department information.
The positive sample represents that the historical official document and the historical target department information are in correct corresponding relation, and the negative sample represents that the historical official document and the historical target department information are in wrong corresponding relation.
Specifically, suppose the correct correspondence between the historical official documents and the departments to which they are to be delivered is: official document 1 corresponds to reviewing department A, official document 2 corresponds to reviewing department B, and official document 3 corresponds to reviewing department C. Then, as shown in fig. 2b, the positive samples include official document 1 labeled with reviewing department A, official document 2 labeled with reviewing department B, and official document 3 labeled with reviewing department C. The negative samples include official document 1 labeled with reviewing department B, official document 2 labeled with reviewing department C, and official document 3 labeled with reviewing department A.
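The example pairing can be reproduced with a short sketch. It builds negative samples by rotating the department list by one position, which yields exactly the wrong pairs listed in the example; the rotation strategy, the 1/0 labels and the name `build_samples` are illustrative assumptions rather than the application's stated procedure.

```python
def build_samples(
    correct: dict[str, str],
) -> tuple[list[tuple[str, str, int]], list[tuple[str, str, int]]]:
    """Return (positive, negative) samples as (document, department, label) triples."""
    documents = list(correct)
    departments = [correct[document] for document in documents]
    positives = [(doc, dept, 1) for doc, dept in zip(documents, departments)]
    # Rotate departments by one so that each document gets another document's department.
    rotated = departments[1:] + departments[:1]
    negatives = [
        (doc, wrong, 0)
        for doc, wrong in zip(documents, rotated)
        if wrong != correct[doc]
    ]
    return positives, negatives


positives, negatives = build_samples(
    {"document 1": "department A", "document 2": "department B", "document 3": "department C"}
)
```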
S250, training the initial ranking model based on part of the training samples, the positive samples and the negative samples to obtain a trained ranking model.
The training process of the ranking model comprises the following steps: after the training samples of each batch are sent into the model, a predicted value is output through forward propagation, and then a difference value between the predicted value and a true value, namely a loss value, is calculated through a loss function. After the loss value is obtained, the model updates each parameter through back propagation to reduce the loss between the true value and the predicted value, so that the predicted value generated by the model is close to the true value until the value of the loss function is converged, and the model training is completed.
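The loop described above (forward propagation, loss, parameter update, stop on convergence) can be sketched with a deliberately small stand-in model. The application does not disclose the ranking model's architecture, so the sketch assumes a single linear layer with softmax and cross-entropy loss over keyword indicator features; all names and hyperparameters are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Convert raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def predict(weights: list[list[float]], x: list[float]) -> list[float]:
    """Forward propagation: one weight value per department."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def train(samples, n_features, n_classes, lr=0.5, tol=1e-6, max_epochs=5000):
    """Gradient descent on cross-entropy loss until the loss value converges."""
    weights = [[0.0] * n_features for _ in range(n_classes)]
    history = []
    for _ in range(max_epochs):
        grads = [[0.0] * n_features for _ in range(n_classes)]
        loss = 0.0
        for x, y in samples:
            probs = predict(weights, x)
            loss -= math.log(probs[y])  # difference between prediction and true label
            for c in range(n_classes):
                error = probs[c] - (1.0 if c == y else 0.0)
                for j, xj in enumerate(x):
                    grads[c][j] += error * xj
        loss /= len(samples)
        # For a single linear layer, back propagation reduces to this gradient step.
        for c in range(n_classes):
            for j in range(n_features):
                weights[c][j] -= lr * grads[c][j] / len(samples)
        converged = bool(history) and abs(history[-1] - loss) < tol
        history.append(loss)
        if converged:
            break
    return weights, history


# Three keyword indicator vectors, each labeled with a department index.
data = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
trained_weights, loss_history = train(data, n_features=3, n_classes=3)
```

A production ranking model would typically start from a pre-trained encoder rather than from zero weights, but the control flow of the loop stays the same.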
Specifically, as shown in fig. 2b, in the training process, the output of the ranking model is the percentage of the target department information corresponding to different documents, that is, the weight value, and by adopting the training process, when the value of the loss function reaches convergence, the percentage of the target department information corresponding to each document reaches the set threshold, and at this time, the training of the ranking model is completed.
Furthermore, as the training samples are continuously updated, the trained ranking model can be retrained at regular intervals, so that the performance of the ranking model can be effectively improved.
In this embodiment, the ranking model is trained by using the content of the historical official documents and the manually corrected distribution opinion information of the historical official documents, so that the trained ranking model can establish an association relationship between the key information of the document content and the department information corresponding to the domain to which the document content belongs. In the process of distributing official documents, the RPA robot can quickly and accurately obtain the target department information of the official document to be distributed based on the trained ranking model. Compared with manually determining the target department of an official document as in the related art, this saves the time of workers and effectively improves the distribution efficiency of official documents.
The specific application of the ranking model in the document file distribution process is described in detail below.
Example three
Fig. 3 is a flowchart of an RPA and AI-based document distribution method according to a third embodiment of the present application. In this embodiment, the process of determining the key information of the document content is refined, and "determining target department information for receiving the document file according to the key information" is refined to "taking the key information as the input of a trained ranking model, and selecting, from the output of the trained ranking model, the official document department information with the largest weight value as the target department information corresponding to the official document". As shown in fig. 3, the method includes:
S310, calling an Optical Character Recognition (OCR) component to recognize the document file to obtain document content.
S320, determining paragraph titles in the official document content.
The official document file usually adopts a general official document format, for example, the text part may have corresponding titles, including a primary title, a secondary title, etc., and different titles have corresponding title identifiers. In this embodiment, the RPA robot may determine the paragraph titles in the document content based on the title identifications.
S330, extracting key sentences from the content of each paragraph belonging to the same paragraph title.
The verbs in the key sentences meet the requirements of the preset official document corpus. For example, a key sentence may start from a verb that meets the requirements of the official document corpus and end at the full stop of the sentence in which the verb is located. Specifically, suppose the document content is "From today on, all units are requested to carry out the overall construction of the supervision work platform." In this sentence, if "overall construction" is a verb that meets the requirements of the official document corpus, the key sentence is "overall construction supervision work platform".
S340, extracting the domain key words of the domain to which the official document belongs from the key sentences as key information according to a preset official document domain key word list.
The preset official document field keyword list can be created in the following mode:
based on an automatic phrase mining method AutoPhrase in Natural Language Processing (NLP) service, mining keywords in the field to which documents belong in a preset document corpus to obtain candidate field keywords with similarity greater than a first set threshold;
capturing keywords in the official document webpage, and labeling the domain keywords in the official document webpage;
and screening out keywords with similarity larger than a second set threshold value with the labeling result from the candidate domain keywords based on the labeling result so as to form a domain keyword vocabulary. Wherein the second set threshold is less than the first set threshold.
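The three steps above can be sketched as a two-stage filter. The sketch assumes the phrase mining step has already produced a score per candidate phrase (it does not call AutoPhrase itself), and it substitutes `difflib`'s character-level ratio for the unspecified similarity to the labeling result; the thresholds and the name `build_vocabulary` are hypothetical.

```python
from difflib import SequenceMatcher


def build_vocabulary(
    mined: dict[str, float],
    labeled: list[str],
    first_threshold: float = 0.8,
    second_threshold: float = 0.6,
) -> list[str]:
    """Filter mined phrases by mining score, then by similarity to labeled keywords."""
    if not second_threshold < first_threshold:
        raise ValueError("the second threshold must be less than the first")
    # Stage 1: candidate domain keywords above the first set threshold.
    candidates = [phrase for phrase, score in mined.items() if score > first_threshold]
    vocabulary = []
    for phrase in candidates:
        # Stage 2: best similarity to any labeled keyword (stand-in measure).
        best = max(
            (SequenceMatcher(None, phrase, label).ratio() for label in labeled),
            default=0.0,
        )
        if best > second_threshold:
            vocabulary.append(phrase)
    return vocabulary


# Hypothetical mining scores and labeled webpage keywords.
mined_scores = {"supervision": 0.95, "audit": 0.9, "the meeting": 0.4, "weather": 0.85}
domain_vocabulary = build_vocabulary(mined_scores, ["supervision", "auditing"])
```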
In this embodiment, the preset domain keyword vocabulary is created by combining domain keyword information obtained from multiple channels, so that it contains more comprehensive domain keywords related to the domain to which the document content belongs. With the preset domain keyword vocabulary, the key information of the document content to be distributed can be determined more accurately, so that the ranking model can predict more accurately based on the key information and more accurate target department information related to the document content can be obtained.
Specifically, the preset domain keyword vocabulary includes information related to the domain to which the document content belongs, such as "supervision, spot check, construction, manpower, law, audit, planning, finance, management and management". For example, if the key sentence is the "overall construction supervision work platform", the domain keyword extracted from the key sentence based on the preset domain keyword vocabulary is "supervision", and the department corresponding to the keyword is the "inspection and supervision department".
S350, taking the key information as the input of the trained ranking model, and selecting, from the output of the trained ranking model, the official document department information with the largest weight value as the target department information corresponding to the official document.
The training process of the ranking model may refer to the description of the above embodiments, and will not be described herein again.
S360, distributing the official document files according to the corresponding target department information.
In this embodiment, the RPA robot may send the document file to be distributed to the corresponding target department by triggering the "distribution" button on the operation interface of the document processing system. Alternatively, the RPA robot may send the document file by e-mail to a designated mailbox corresponding to the target department.
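The e-mail route can be sketched with Python's standard `email` package. The sketch only builds the message; the addresses, subject format and function name `build_distribution_mail` are hypothetical, and actually sending it (for example through `smtplib.SMTP.send_message`) would need a real mail server.

```python
from email.message import EmailMessage


def build_distribution_mail(
    document_name: str,
    document_bytes: bytes,
    department: str,
    mailbox: str,
    sender: str = "rpa-robot@example.com",  # hypothetical sender address
) -> EmailMessage:
    """Build a mail that carries the official document as an attachment."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = mailbox
    message["Subject"] = f"Official document for {department}: {document_name}"
    message.set_content(f"Please find the official document '{document_name}' attached.")
    message.add_attachment(
        document_bytes,
        maintype="application",
        subtype="octet-stream",
        filename=document_name,
    )
    return message


mail = build_distribution_mail(
    "notice.pdf", b"%PDF-sample", "human resources department", "hr@example.com"
)
```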
According to the technical scheme provided by this embodiment, keywords of the domains to which official documents belong are mined from the preset official document corpus based on an automatic phrase mining method, and keywords captured from official document webpages are manually labeled as domain keywords. Based on the domain keyword information obtained from these channels, a domain keyword vocabulary can be created, so that the preset domain keyword vocabulary contains more comprehensive domain keywords related to the domain to which the document content belongs. With the preset domain keyword vocabulary, the key information of the official document content to be distributed can be determined more accurately, so that the ranking model can predict more accurately based on the key information and more accurate target department information related to the official document content can be obtained.
Example four
Fig. 4 is a block diagram of a structure of an RPA and AI-based official document distribution device according to a fourth embodiment of the present application, and as shown in fig. 4, the device includes: a key information determination module 410, a target department information determination module 420, and a document file distribution module 430, wherein,
a key information determining module 410, configured to identify the content of the document file, to obtain key information, where the key information includes information of a field to which the document file belongs;
a target department information determination module 420 configured to determine target department information for receiving the document file according to the key information;
and the document file distribution module 430 is configured to distribute document files according to the corresponding target department information.
Optionally, the key information determining module 410 includes:
the document content recognition unit is configured to call an Optical Character Recognition (OCR) component to recognize the document file to obtain document content;
and the key information determining unit is configured to extract the domain key words of the domain to which the official document belongs from the official document contents as key information according to a preset official document domain key word list.
Optionally, the key information determining unit is specifically configured to:
determining paragraph titles in the official document content;
extracting key sentences from the content of each paragraph belonging to the same paragraph title, wherein verbs in the key sentences meet the requirement of a preset official document corpus;
and extracting the domain key words of the domain to which the official document belongs from the key sentences as key information according to a preset official document domain key word list.
Optionally, the preset official document field keyword list is created in the following manner:
based on an automatic phrase mining method AutoPhrase in Natural Language Processing (NLP) service, mining keywords in the field to which documents belong in a preset document corpus to obtain candidate field keywords with similarity greater than a first set threshold;
capturing keywords in the official document webpage, and labeling the domain keywords in the official document webpage;
and screening out keywords with similarity larger than a second set threshold value with the labeling result from the candidate domain keywords based on the labeling result so as to form a domain keyword vocabulary.
Optionally, the target department information determining module 420 is specifically configured to:
taking the key information as the input of the trained ranking model, and selecting, from the output of the trained ranking model, the official document department information with the largest weight value as the target department information corresponding to the official document;
wherein the trained ranking model establishes the association relationship between the key information of the official document content and the department information corresponding to the domain to which it belongs.
Optionally, the ranking model is obtained by training in the following way:
extracting topic keywords from the content of the historical official document, and determining historical department information corresponding to the historical official document;
splicing the topic keywords and the historical department information, and using the spliced key information as part of training samples;
determining historical target department information corresponding to the historical official document from the manually corrected distribution opinion information of the historical official document;
generating a positive sample and a negative sample according to the corresponding relation between the historical official document and the historical target department information, wherein the positive sample represents that the historical official document and the historical target department information are in a correct corresponding relation, and the negative sample represents that the historical official document and the historical target department information are in an incorrect corresponding relation;
and training the initial ranking model based on part of the training samples, the positive samples and the negative samples to obtain a trained ranking model.
Optionally, the manually corrected distribution opinion information of the historical official document further includes:
a modification example sentence library generated based on manual modification suggestions and a phrase error correction white list generated based on manual modification logs of official document contents.
The functions of each module in each apparatus in the embodiment of the present application may refer to corresponding descriptions in the above method, and are not described herein again.
Example five
Fig. 5 is a block diagram of a device for distributing official document files according to a fifth embodiment of the present application. As shown in fig. 5, the apparatus includes: a memory 910 and a processor 920, the memory 910 having stored therein computer programs operable on the processor 920. The processor 920 implements the document file distribution method based on RPA and AI in the above embodiments when executing the computer program. The number of the memory 910 and the processor 920 may be one or more.
The apparatus further comprises:
and a communication interface 930 for communicating with an external device to perform data interactive transmission.
If the memory 910, the processor 920 and the communication interface 930 are implemented independently, the memory 910, the processor 920 and the communication interface 930 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
Optionally, in an implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on a chip, the memory 910, the processor 920 and the communication interface 930 may complete communication with each other through an internal interface.
Embodiments of the present application provide a computer-readable storage medium, which stores a computer program, and when the program is executed by a processor, the computer program implements the method provided in the embodiments of the present application.
An embodiment of the present application further provides a chip, where the chip includes a processor configured to call and execute instructions stored in a memory, so that a communication device in which the chip is installed executes the method provided in the embodiments of the present application.
An embodiment of the present application further provides a chip including an input interface, an output interface, a processor and a memory, which are connected through an internal connection path. The processor is configured to execute code in the memory, and when the code is executed, the processor performs the method provided by the embodiments of the present application.
It should be understood that the processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or any conventional processor or the like. It is noted that the processor may be a processor supporting the Advanced RISC Machines (ARM) architecture.
Further, optionally, the memory may include a read-only memory and a random access memory, and may further include a nonvolatile random access memory. The memory may be either volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may include a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may include Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, for example Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the present application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, one skilled in the art can combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided that they do not contradict each other.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. And the scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. All or part of the steps of the method of the above embodiments may be implemented by hardware that is configured to be instructed to perform the relevant steps by a program, which may be stored in a computer-readable storage medium, and which, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module may also be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various changes or substitutions within the technical scope of the present application, and these should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (15)
1. A document file distribution method based on Robotic Process Automation (RPA) and Artificial Intelligence (AI), applied to an RPA robot, characterized by comprising the following steps:
S1, identifying the content of an official document file to obtain key information, wherein the key information comprises information on the field to which the official document file belongs;
S2, determining, according to the key information, target department information for receiving the official document file;
and S3, distributing the official document file according to the corresponding target department information.
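For illustration only, steps S1 to S3 can be read as a three-stage pipeline. The following is a minimal sketch under stated assumptions, not the claimed implementation: the keyword and department tables, the function names, and the in-memory `outboxes` dictionary are all hypothetical, and plain substring matching stands in for the recognition step.

```python
from dataclasses import dataclass

# Hypothetical lookup tables, invented for this sketch.
DOMAIN_KEYWORDS = {"budget": "finance", "recruitment": "personnel"}
DOMAIN_DEPARTMENTS = {"finance": "Finance Department", "personnel": "HR Department"}


@dataclass
class KeyInfo:
    domains: list  # fields (domains) the document is judged to belong to


def identify_key_info(text: str) -> KeyInfo:
    """S1: derive domain information from the document content."""
    lowered = text.lower()
    domains = []
    for keyword, domain in DOMAIN_KEYWORDS.items():
        if keyword in lowered and domain not in domains:
            domains.append(domain)
    return KeyInfo(domains)


def determine_target_departments(info: KeyInfo) -> list:
    """S2: map the key information to the receiving departments."""
    return [DOMAIN_DEPARTMENTS[d] for d in info.domains if d in DOMAIN_DEPARTMENTS]


def distribute(text: str, outboxes: dict) -> list:
    """S3: deliver the document to each target department's outbox."""
    targets = determine_target_departments(identify_key_info(text))
    for department in targets:
        outboxes.setdefault(department, []).append(text)
    return targets
```

A document mentioning "budget" would, under these assumed tables, be placed in the outbox of the hypothetical "Finance Department"; a document with no known keyword is not distributed at all.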
2. The method according to claim 1, wherein step S1 specifically comprises:
S11, calling an Optical Character Recognition (OCR) component to recognize the official document file to obtain the official document content;
and S12, extracting, from the official document content and according to a preset official document domain keyword table, the domain keywords of the domain to which the official document file belongs, to serve as the key information.
3. The method according to claim 2, wherein step S12 specifically comprises:
S121, determining paragraph titles in the official document content;
S122, extracting key sentences from the content of each paragraph belonging to the same paragraph title, wherein the verbs in the key sentences meet the requirements of a preset official document corpus;
and S123, extracting, from the key sentences and according to the preset official document domain keyword table, the domain keywords of the domain to which the official document file belongs, to serve as the key information.
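As a rough illustration of steps S121 to S123, the sketch below works on text that is assumed to have already been produced by the OCR step. Everything in it is an assumption made for the example: the numbered-title pattern, the verb set standing in for the preset official document corpus, the keyword table, and the simple English tokenization.

```python
import re

# Hypothetical stand-ins for the preset resources named in the claims.
TITLE_PATTERN = re.compile(r"^\d+\.\s+\S")          # e.g. "1. Funding"
CORPUS_VERBS = {"approve", "submit", "implement"}   # verbs from the assumed corpus
DOMAIN_KEYWORD_TABLE = {"budget", "procurement", "recruitment"}


def split_by_title(content: str) -> dict:
    """S121: group paragraph lines under the title that precedes them."""
    sections, current = {}, None
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue
        if TITLE_PATTERN.match(line):
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def key_sentences(paragraphs: list) -> list:
    """S122: keep sentences containing a verb from the assumed corpus."""
    kept = []
    for paragraph in paragraphs:
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            if words & CORPUS_VERBS:
                kept.append(sentence)
    return kept


def extract_domain_keywords(content: str) -> list:
    """S123: look up table keywords inside the key sentences only."""
    found = []
    for paragraphs in split_by_title(content).values():
        for sentence in key_sentences(paragraphs):
            for word in re.findall(r"[a-z]+", sentence.lower()):
                if word in DOMAIN_KEYWORD_TABLE and word not in found:
                    found.append(word)
    return found
```

Restricting the table lookup to key sentences means a table keyword that appears only in a sentence without a corpus verb is ignored, which is one plausible reading of why S122 precedes S123.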
4. The method of claim 3, wherein the preset official document domain keyword table is created by:
mining, based on the automatic phrase mining method AutoPhrase in a Natural Language Processing (NLP) service, keywords of the fields to which official documents belong in a preset official document corpus, to obtain candidate domain keywords with similarity greater than a first set threshold;
capturing keywords in official document webpages, and labeling the domain keywords in the official document webpages;
and screening out, from the candidate domain keywords and based on the labeling result, keywords whose similarity to the labeling result is higher than a second set threshold, so as to form the domain keyword table.
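The two-threshold screening described in this claim can be sketched as follows. This is an illustrative approximation only: the phrase scores are assumed to have been produced beforehand by a phrase-mining step (AutoPhrase itself is not invoked here), the labeled keywords are passed in as a plain list, and `difflib.SequenceMatcher` is used as a stand-in similarity measure that the source does not specify.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def build_keyword_table(scored_candidates: dict, labeled_keywords: list,
                        first_threshold: float, second_threshold: float) -> list:
    """Screen mined phrases against labeled keywords.

    scored_candidates maps each mined phrase to its (assumed) mining score.
    """
    # Stage 1: keep phrases whose mining score exceeds the first threshold.
    candidates = [p for p, score in scored_candidates.items() if score > first_threshold]
    # Stage 2: keep candidates close enough to at least one labeled keyword.
    table = []
    for phrase in candidates:
        best = max((similarity(phrase, label) for label in labeled_keywords), default=0.0)
        if best > second_threshold:
            table.append(phrase)
    return table
```

In this sketch a phrase must pass both stages, so a high-scoring mined phrase that resembles no labeled keyword is dropped, as is a well-matching phrase whose mining score is too low.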
5. The method according to any one of claims 1 to 4, wherein step S2 specifically comprises:
S21, taking the key information as the input of a trained ranking model, and selecting, from the output of the trained ranking model, the official document department information with the largest weight value as the target department information corresponding to the official document file;
wherein the trained ranking model establishes an association between key information of official document content and the corresponding department information in the field to which the official document belongs.
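Step S21 amounts to scoring candidate departments and taking the one with the largest weight. The sketch below replaces the trained ranking model with a hypothetical table of keyword-to-department association weights; the table contents and function names are assumptions for illustration, and a real model would learn these associations rather than look them up.

```python
# Hypothetical association weights standing in for a trained ranking model.
ASSOCIATION_WEIGHTS = {
    ("budget", "Finance Department"): 0.9,
    ("budget", "Audit Office"): 0.2,
    ("audit", "Finance Department"): 0.6,
    ("audit", "Audit Office"): 0.8,
}


def rank_departments(keywords: list) -> dict:
    """Sum the association weights of the given keywords per department."""
    scores = {}
    for (keyword, department), weight in ASSOCIATION_WEIGHTS.items():
        if keyword in keywords:
            scores[department] = scores.get(department, 0.0) + weight
    return scores


def select_target_department(scores: dict) -> str:
    """Pick the department with the largest weight value."""
    if not scores:
        raise ValueError("ranking model returned no candidate departments")
    return max(scores, key=scores.get)
```

Raising an error on an empty result is a design choice of this sketch; a deployment might instead route such documents to manual review.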
6. The method of claim 5, wherein the ranking model is trained by:
extracting topic keywords from the content of the historical official document, and determining historical department information corresponding to the historical official document;
splicing the topic keywords with the historical department information, and taking the spliced key information as a part of the training samples;
determining historical target department information corresponding to the historical official document from the manually corrected distribution opinion information of the historical official document;
generating a positive sample and a negative sample according to the corresponding relation between the historical official document and the historical target department information, wherein the positive sample represents that the historical official document and the historical target department information are in a correct corresponding relation, and the negative sample represents that the historical official document and the historical target department information are in an incorrect corresponding relation;
and training an initial ranking model based on said part of the training samples, the positive samples and the negative samples, to obtain the trained ranking model.
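One possible reading of the sample construction in this claim is sketched below. The record fields, the `[SEP]` separator used for splicing, and the choice to pair every document with every candidate department are assumptions of this example; the source does not specify the sample format, and the model training itself is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoricalRecord:
    topic_keywords: tuple          # topic keywords extracted from the document
    historical_department: str     # department recorded for the document
    corrected_department: str      # target department per the corrected opinion


def splice(record: HistoricalRecord) -> str:
    """Concatenate topic keywords and historical department information."""
    return " ".join(record.topic_keywords) + " [SEP] " + record.historical_department


def build_samples(records: list, all_departments: list) -> list:
    """Return (spliced_text, candidate_department, label) triples.

    label 1: candidate matches the manually corrected target (positive sample).
    label 0: any other candidate department (negative sample).
    """
    samples = []
    for record in records:
        text = splice(record)
        for department in all_departments:
            label = 1 if department == record.corrected_department else 0
            samples.append((text, department, label))
    return samples
```

Under this reading, a document that was originally sent to the wrong department yields a negative sample for that department and a positive sample for the corrected one.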
7. The method of claim 6, wherein the distribution opinion information further comprises:
a modification example sentence library generated based on manual modification suggestions, and a phrase error-correction whitelist generated based on manual modification logs of official document content.
8. An apparatus for distributing a document file based on RPA and AI, comprising:
a key information determining module configured to identify the content of an official document file to obtain key information, wherein the key information comprises information on the field to which the official document file belongs;
a target department information determining module configured to determine, according to the key information, target department information for receiving the official document file;
and a document file distribution module configured to distribute the official document file according to the corresponding target department information.
9. The apparatus of claim 8, wherein the key information determining module comprises:
a document content recognition unit configured to call an Optical Character Recognition (OCR) component to recognize the official document file to obtain the official document content;
and a key information determining unit configured to extract, from the official document content and according to a preset official document domain keyword table, the domain keywords of the domain to which the official document file belongs, to serve as the key information.
10. The apparatus according to claim 9, wherein the key information determining unit is specifically configured to:
determining paragraph titles in the official document content;
extracting key sentences from the content of each paragraph belonging to the same paragraph title, wherein verbs in the key sentences meet the requirements of a preset official document corpus;
and extracting, from the key sentences and according to the preset official document domain keyword table, the domain keywords of the domain to which the official document file belongs, to serve as the key information.
11. The apparatus of claim 10, wherein the preset official document domain keyword table is created by:
mining, based on the automatic phrase mining method AutoPhrase in a Natural Language Processing (NLP) service, keywords of the fields to which official documents belong in a preset official document corpus, to obtain candidate domain keywords with similarity greater than a first set threshold;
capturing keywords in official document webpages, and labeling the domain keywords in the official document webpages;
and screening out, from the candidate domain keywords and based on the labeling result, keywords whose similarity to the labeling result is higher than a second set threshold, so as to form the domain keyword table.
12. The apparatus according to any of claims 8-10, wherein the target department information determination module is specifically configured to:
take the key information as the input of a trained ranking model, and select, from the output of the trained ranking model, the official document department information with the largest weight value as the target department information corresponding to the official document file;
wherein the trained ranking model establishes an association between key information of official document content and the corresponding department information in the field to which the official document belongs.
13. The apparatus of claim 12, wherein the ranking model is trained by:
extracting topic keywords from the content of the historical official document, and determining historical department information corresponding to the historical official document;
splicing the topic keywords with the historical department information, and taking the spliced key information as a part of the training samples;
determining historical target department information corresponding to the historical official document from the manually corrected distribution opinion information of the historical official document;
generating a positive sample and a negative sample according to the corresponding relation between the historical official document and the historical target department information, wherein the positive sample represents that the historical official document and the historical target department information are in a correct corresponding relation, and the negative sample represents that the historical official document and the historical target department information are in an incorrect corresponding relation;
and training an initial ranking model based on said part of the training samples, the positive samples and the negative samples, to obtain the trained ranking model.
14. An apparatus for distribution of official document files, comprising: a processor and a memory, the memory having stored therein instructions that are loaded and executed by the processor to implement the method of any of claims 1 to 7.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111532926.1A CN114219438A (en) | 2021-12-15 | 2021-12-15 | Document file distribution method, device, equipment and medium based on RPA and AI |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114219438A (en) | 2022-03-22 |
Family
ID=80702217
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111532926.1A Pending CN114219438A (en) | 2021-12-15 | 2021-12-15 | Document file distribution method, device, equipment and medium based on RPA and AI |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114219438A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115063102A (en) * | 2022-06-13 | 2022-09-16 | 北京新机场建设指挥部 | Project progress management method, device and medium based on information coupling |
CN116704522A (en) * | 2023-08-02 | 2023-09-05 | 京华信息科技股份有限公司 | Method and system for assisting document classification |
CN116704522B (en) * | 2023-08-02 | 2023-11-24 | 京华信息科技股份有限公司 | Method and system for assisting document classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||