WO2023045056A1 - File comparison method, apparatus, device, medium and system based on RPA and AI - Google Patents

File comparison method, apparatus, device, medium and system based on RPA and AI

Info

Publication number
WO2023045056A1
Authority
WO
WIPO (PCT)
Prior art keywords
comparison
file
text
page
client
Application number
PCT/CN2021/131818
Other languages
English (en)
French (fr)
Inventor
张金明
汪冠春
胡一川
褚瑞
李玮
Original Assignee
北京来也网络科技有限公司
来也科技(北京)有限公司
Application filed by 北京来也网络科技有限公司 and 来也科技(北京)有限公司
Publication of WO2023045056A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Definitions

  • The embodiments of the present invention relate to the technical field of process automation, and in particular to a method, apparatus, device, medium and system for comparing files based on RPA and AI.
  • RPA: Robotic Process Automation.
  • Robotic process automation uses specific "robot software" to simulate the operations a person performs on a computer and automatically executes process tasks according to rules.
  • AI: Artificial Intelligence.
  • RPA has two distinctive advantages: it is low-code and non-intrusive.
  • Low-code means that RPA can be used without advanced IT skills, so business staff who cannot program can also develop processes; non-intrusive means that RPA can simulate human operations without requiring the software systems to open interfaces.
  • However, traditional RPA has limitations: it can only work from fixed rules, so its application scenarios are limited. With the continuous development of AI technology, the deep integration of RPA and AI overcomes these limitations.
  • RPA + AI, combining "hand work" with "head work", is greatly changing the value of labor.
  • Embodiments of the present invention provide a file comparison method, apparatus, device, medium and system based on RPA and AI, which can perform file comparison automatically, thereby saving manpower and improving the efficiency of file comparison.
  • In a first aspect, an embodiment of the present invention provides a file comparison method based on RPA and AI. The method is applied to a server and includes:
  • Step S4 includes:
  • for a first comparison sub-result and a second comparison sub-result that are not adjacent, if both are same-content sub-results, and the comparison sub-results lying between them include content deletion and content addition but no same content, merging the comparison sub-results between the first and second comparison sub-results into one comparison sub-result whose type is content modification.
  • Step S5 includes:
  • if the difference comparison result includes content deletion, adding a difference mark to the text content in the reference file that was deleted in the comparison file;
  • if the difference comparison result includes content addition, adding a difference mark to the text content added in the comparison file relative to the reference file;
  • if the difference comparison result includes content modification, adding difference marks both to the text content before modification in the reference file and to the text content after modification in the comparison file.
  • Step S2 includes:
  • identifying the reference file and the comparison file by optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file.
  • The method further includes:
  • converting the file formats of the reference file and the comparison file into a format recognizable by OCR;
  • In a second aspect, an embodiment of the present invention provides a file comparison method based on RPA and AI. The method is applied to an RPA robot and includes:
  • The method further includes:
  • S15: triggering the client to send a comparison result query instruction to the server, so that the server feeds back the reference file and/or the comparison file containing difference marks to the client;
  • S16: triggering the client to send a comparison task status query instruction to the server, so that the server feeds back the task status of the comparison task to the client.
  • An embodiment of the present invention provides a file comparison apparatus based on RPA and AI. The apparatus is applied to a server and includes:
  • a file obtaining unit, configured to obtain the reference file and the comparison file sent by the client, where the reference file and/or the comparison file are multi-page files, and the client sends the two files when triggered by the RPA robot;
  • a text acquisition unit, configured to acquire at least one page of text of the reference file and at least one page of text of the comparison file;
  • a splicing unit, configured to splice the at least one page of text of the reference file into one page of context-continuous text to obtain a reference text, and to splice the at least one page of text of the comparison file into one page of context-continuous text to obtain a comparison text;
  • a comparison unit, configured to perform a difference comparison between the reference text and the comparison text using a preset comparison algorithm to obtain a difference comparison result of the comparison text relative to the reference text;
  • a marking unit, configured to add difference marks to the reference file and/or the comparison file according to the difference comparison result.
  • The comparison unit includes:
  • a comparison module, configured to compare the reference text and the comparison text according to a preset comparison unit to obtain a comparison sub-result for each preset comparison unit;
  • a merging module, configured to: for a first comparison sub-result and a second comparison sub-result that are not adjacent, if both are same-content sub-results, and the comparison sub-results between them include content deletion and content addition but no same content, merge the comparison sub-results between them into one comparison sub-result whose type is content modification.
  • The marking unit includes:
  • a first mark adding module, configured to, if the difference comparison result includes content deletion, add a difference mark to the text content in the reference file that was deleted in the comparison file;
  • a second mark adding module, configured to, if the difference comparison result includes content addition, add a difference mark to the text content added in the comparison file relative to the reference file;
  • a third mark adding module, configured to, if the difference comparison result includes content modification, add difference marks both to the text content before modification in the reference file and to the text content after modification in the comparison file.
  • The text acquisition unit is configured to identify the reference file and the comparison file by OCR to obtain at least one page of text of the reference file and at least one page of text of the comparison file.
  • The apparatus further includes:
  • an authentication unit, configured to authenticate the user information of the client before the at least one page of text of the reference file and of the comparison file is acquired;
  • a format conversion unit, configured to convert, if the authentication passes, the file formats of the reference file and the comparison file into a format recognizable by OCR;
  • a task adding unit, configured to add the comparison task for the reference file and the comparison file to a comparison task queue, and to record the comparison task and its task status in a task database;
  • a task acquiring unit, configured to acquire the comparison task from the comparison task queue;
  • an updating unit, configured to update the task status of the comparison task in the task database when that status changes.
  • The apparatus further includes:
  • a feedback unit, configured to: upon receiving a comparison result query instruction sent by the client under the trigger of the RPA robot, feed back the reference file and/or the comparison file containing difference marks to the client; and upon receiving a comparison task status query instruction sent by the client under the trigger of the RPA robot, query the task database for the task status of the corresponding comparison task and feed the queried status back to the client.
  • An embodiment of the present invention provides a file comparison apparatus based on RPA and AI. The apparatus is applied to an RPA robot and includes:
  • a login and upload unit, configured to log in to the client and upload the reference file and the comparison file to the client, where the reference file and/or the comparison file are multi-page files;
  • a trigger sending unit, configured to trigger the client to send the reference file and the comparison file to a server for difference comparison.
  • The trigger sending unit is further configured to trigger the client to send a comparison result query instruction to the server, so that the server feeds back the reference file and/or the comparison file containing difference marks to the client; and/or to trigger the client to send a comparison task status query instruction to the server, so that the server feeds back the task status of the comparison task to the client.
  • An embodiment of the present invention further provides a computing device, including:
  • one or more processors; and
  • a storage apparatus configured to store one or more programs.
  • When the computing device is a server, the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect;
  • alternatively, the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the second aspect.
  • An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of the first aspect or the second aspect.
  • An embodiment of the present invention further provides a file comparison system based on RPA and AI, including an RPA robot, a client and a server.
  • The RPA robot is configured to log in to the client, upload the reference file and the comparison file to the client, and trigger the client to send the reference file and the comparison file to the server for difference comparison, where the reference file and/or the comparison file are multi-page files.
  • The server is configured to: obtain the reference file and the comparison file sent by the client; obtain at least one page of text of the reference file and at least one page of text of the comparison file; splice the at least one page of text of the reference file into one page of context-continuous text to obtain a reference text, and splice the at least one page of text of the comparison file into one page of context-continuous text to obtain a comparison text; perform a difference comparison between the reference text and the comparison text using a preset comparison algorithm to obtain a difference comparison result of the comparison text relative to the reference text; and add difference marks to the reference file and/or the comparison file according to the difference comparison result.
  • With the file comparison method, apparatus, device, medium and system based on RPA and AI, the RPA robot can automatically log in to the client and trigger the client to send the reference file and the comparison file to the server.
  • After obtaining the two files, the server obtains at least one page of text from each, splices the pages of the reference file into one page of context-continuous text (the reference text) and the pages of the comparison file into one page of context-continuous text (the comparison text), compares the two texts with the preset comparison algorithm, and finally marks the differences in the reference file and/or the comparison file according to the difference comparison result.
  • The embodiment of the present invention not only uses the RPA robot to automatically trigger the client to send the two files to be compared to the server, but also lets the server mark the differences automatically. This saves manpower, frees the people who previously compared files by hand for more valuable work, and improves the efficiency of file comparison.
  • Because the difference marks are added to the reference file and/or the comparison file themselves, rather than described only in a separate third-party area independent of both files, the marks are easier to read.
  • OCR allows the text content of both files to be recognized automatically instead of being collected by hand, which further improves the efficiency of file comparison.
  • Fig. 1 is a flowchart of a file comparison method based on RPA and AI according to an embodiment of the present invention.
  • Fig. 2 is a flowchart of another file comparison method based on RPA and AI according to an embodiment of the present invention.
  • Fig. 3 is an example diagram of a file difference comparison result according to an embodiment of the present invention.
  • Fig. 4 is a block diagram of a file comparison apparatus based on RPA and AI according to an embodiment of the present invention.
  • Fig. 5 is a block diagram of another file comparison apparatus based on RPA and AI according to an embodiment of the present invention.
  • Fig. 6 is an architecture diagram of a file comparison system based on RPA and AI according to an embodiment of the present invention.
  • Fig. 7 is an architecture diagram of another file comparison system based on RPA and AI according to an embodiment of the present invention.
  • The embodiment of the present invention combines RPA and AI technologies to compare files automatically, which both saves manpower and improves the efficiency of file comparison.
  • The reference file is the file used as the baseline in a difference comparison.
  • The comparison file is the other of the two files being compared, that is, the file that is checked against the reference file.
  • The version of the reference file is often earlier than that of the comparison file.
  • The reference file and the comparison file can be files in any field, such as contract documents, financial documents or program documents.
  • A multi-page file is a file whose text content spans two or more pages.
  • OCR refers to optical character recognition, in which an electronic device examines characters printed on paper, determines their shapes by detecting dark and bright patterns, and then translates the shapes into computer text using character recognition methods.
  • For example, the RPA robot can use OCR technology to convert the text of a paper document into a black-and-white dot-matrix image file, after which the server uses OCR to recognize the text content in the image file; alternatively, the RPA robot can use OCR to obtain the text content from the paper document and generate a text file (that is, an editable file) containing it, from which the server extracts the text content directly.
  • The client is the front end of a business system that needs file comparison.
  • The server is the back end of that business system.
  • The client can be the application software of the business system, or a browser through which the RPA robot accesses the website of the business system.
  • The RPA robot can be integrated into the client, embedded in the client as a plug-in, or independent of the client; its specific form is not limited as long as it can access the client automatically.
  • NLP refers to natural language processing, which takes language as its object and uses computer technology to analyze, understand and process natural language. With the computer as a powerful tool for language research, it studies language information quantitatively and provides language descriptions that can be shared by humans and computers.
  • Splicing means connecting the texts to be spliced without changing their content; that is, the text content of multiple pages is joined seamlessly while preserving the original order.
  • The term "preset comparison algorithm" refers to a specific comparison method for determining the differences between the comparison text and the reference text; the two texts can be compared batch by batch according to the preset comparison unit until the comparison is complete.
  • The term "preset comparison unit" refers to the amount of text compared at a time, which may be determined according to the actual situation and may be a phrase, a sentence or a paragraph.
  • Difference comparison means comparing the reference text with the comparison text to find their differences.
  • The difference comparison result is the outcome of the difference comparison between the reference text and the comparison text. It contains multiple comparison sub-results, each including a difference type and the difference content corresponding to that type; the difference types include same content, content addition, content deletion and content modification.
  • A difference mark is a mark that highlights the specific differences between the reference file and the comparison file. It includes, but is not limited to, one or a combination of the following: bold font, changed font color, added font background color, highlighted font, enlarged font size, italics, underline, and so on.
  • Authentication means verifying whether the client that sends the reference file and the comparison file has permission to perform file comparison, which can be done by checking whether the user information of the client meets the permission requirement.
  • Fig. 1 shows a file comparison method based on RPA and AI according to an embodiment of the present invention. The method is mainly applied to the server and specifically includes the following.
  • The reference file and/or the comparison file is a multi-page file; that is, at least one of the two files is a multi-page file.
  • The reference file and the comparison file are sent by the client under the trigger of the RPA robot: the RPA robot first logs in to the client and then triggers the client to send the reference file and the comparison file on the corresponding page.
  • For the specific way in which the RPA robot triggers the client to send the two files to the server, refer to the following method embodiment in which the RPA robot is the execution subject; it is not repeated here.
  • The server can first authenticate the user information of the client to verify whether the user has file comparison permission. If the authentication passes, the server converts the file formats of the reference file and the comparison file into a format recognizable by OCR, adds the comparison task for the two files to the comparison task queue, and records the comparison task and its task status in the task database, so that the comparison task can later be obtained from the queue and executed to compare the reference file and the comparison file. If the authentication fails, the server does not compare the two files and can send the client a reminder that it has no comparison permission.
  • The user information may be a client account, a mobile phone number bound to the client account, a user level or other information; the embodiment of the present invention does not limit its specific content, which may be determined according to the circumstances.
  • Ways to authenticate the user information include, but are not limited to, the following two: (1) match the user information against a list of authorized users; if the match succeeds, the user corresponding to the user information has permission and the authentication passes, otherwise the user has no permission and the authentication fails; (2) judge whether the user level in the user information exceeds a preset level; if it does, the authentication passes, otherwise it fails.
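The two authentication options above can be sketched in Python. This is a minimal, non-authoritative illustration: the `UserInfo` type, its `account` and `level` fields, and the representation of the authorized-user list as a set of account strings are hypothetical assumptions, not details from the source.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class UserInfo:
    """Hypothetical shape of the client's user information."""
    account: str
    level: Optional[int] = None  # user level, if the business system tracks one


def authenticate_by_allowlist(user: UserInfo, authorized_accounts: Set[str]) -> bool:
    """Option (1): pass only if the user matches the list of authorized users."""
    return user.account in authorized_accounts


def authenticate_by_level(user: UserInfo, preset_level: int) -> bool:
    """Option (2): pass only if the user level exceeds the preset level."""
    return user.level is not None and user.level > preset_level


if __name__ == "__main__":
    alice = UserInfo(account="alice", level=3)
    print(authenticate_by_allowlist(alice, {"alice", "bob"}))  # True
    print(authenticate_by_level(alice, preset_level=3))        # False: 3 does not exceed 3
```

"Exceeds" is read strictly (`>`) here; the source does not say whether a user exactly at the preset level passes.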
  • OCR-recognizable formats include, but are not limited to, image formats and the pdf format.
  • The reference file and/or the comparison file can be converted into an OCR-recognizable format so that OCR can be used to recognize their text content.
  • For example, a docx file can be converted to pdf format.
  • When the task status of the comparison task changes, the server may update it in the task database.
  • For example, before the comparison task is executed, its task status can be "unprocessed"; while it is being executed, "processing"; and once it is completed (that is, the differences have been marked), "completed".
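The task lifecycle described above (queued as unprocessed, processing while executed, completed once the differences are marked) can be sketched with an in-memory queue and dictionary. The class and method names are hypothetical, and in a real deployment the queue and the task database would presumably be external services; that is an assumption, not something the source specifies.

```python
from collections import deque
from enum import Enum
from typing import Deque, Dict, Optional, Tuple


class TaskStatus(Enum):
    UNPROCESSED = "unprocessed"
    PROCESSING = "processing"
    COMPLETED = "completed"


class ComparisonTaskService:
    """In-memory stand-in for the comparison task queue and the task database."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()             # comparison task queue (FIFO)
        self.task_db: Dict[str, TaskStatus] = {}     # task id -> task status
        self.files: Dict[str, Tuple[str, str]] = {}  # task id -> (reference, comparison)

    def add_task(self, task_id: str, reference_file: str, comparison_file: str) -> None:
        """Enqueue a task and record it as unprocessed."""
        self.files[task_id] = (reference_file, comparison_file)
        self.queue.append(task_id)
        self.task_db[task_id] = TaskStatus.UNPROCESSED

    def acquire_task(self) -> Optional[str]:
        """Take the next task from the queue and mark it as processing."""
        if not self.queue:
            return None
        task_id = self.queue.popleft()
        self.task_db[task_id] = TaskStatus.PROCESSING
        return task_id

    def complete_task(self, task_id: str) -> None:
        """Mark a task as completed once its difference marks have been added."""
        self.task_db[task_id] = TaskStatus.COMPLETED

    def query_status(self, task_id: str) -> Optional[TaskStatus]:
        """Look up a task's status; None for unknown task ids."""
        return self.task_db.get(task_id)
```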
  • Because the file formats of the reference file and the comparison file have been converted into a format recognizable by OCR, this step can use OCR to recognize both files and obtain at least one page of text of the reference file and at least one page of text of the comparison file.
  • For example, suppose the reference file contains two pages of text, and the comparison file inserts one page of text between the first and second pages of the reference file, giving three pages in total.
  • If a single-page comparison method is used to compare the two files page by page, the second page of the reference file is reported as different from the second page of the comparison file, and since the reference file has no third page, the third page of the comparison file is reported as absent from the reference file. In other words, the single-page method concludes that the two files differ everywhere except for their identical first pages.
  • Therefore, before comparing the reference file with the comparison file, the embodiment of the present invention splices the at least one page of text of the reference file into one page of context-continuous text to obtain the reference text, splices the at least one page of text of the comparison file likewise to obtain the comparison text, and then compares the two texts. Because the text content of each file is compared as a whole, inaccurate results caused by ignoring the relationship between pages are avoided.
  • Continuous context means that the original order of the text is preserved.
  • Specifically, the at least one page of text of the reference file or the comparison file may be spliced sequentially in page order to obtain one page of context-continuous text.
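The page-by-page problem and the effect of splicing can be reproduced with Python's standard-library `difflib.SequenceMatcher`. This sketch assumes each page is a string of lines and uses line-level `difflib` matching as a stand-in for the unspecified preset comparison algorithm; the function names are illustrative only.

```python
import difflib
from itertools import zip_longest
from typing import List, Tuple


def page_by_page(ref_pages: List[str], cmp_pages: List[str]) -> List[Tuple[int, bool]]:
    """Compare page i of one file with page i of the other; True means identical."""
    return [(i, r == c)
            for i, (r, c) in enumerate(zip_longest(ref_pages, cmp_pages), start=1)]


def whole_text_opcodes(ref_pages: List[str],
                       cmp_pages: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """Splice pages in order, then diff the two line sequences as a whole."""
    ref_lines = "\n".join(ref_pages).splitlines()
    cmp_lines = "\n".join(cmp_pages).splitlines()
    matcher = difflib.SequenceMatcher(a=ref_lines, b=cmp_lines, autojunk=False)
    return [(tag, ref_lines[i1:i2], cmp_lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


if __name__ == "__main__":
    reference = ["Clause 1: price.", "Clause 2: delivery."]
    comparison = ["Clause 1: price.", "Clause 1a: penalty.", "Clause 2: delivery."]
    print(page_by_page(reference, comparison))        # [(1, True), (2, False), (3, False)]
    print(whole_text_opcodes(reference, comparison))  # [('insert', [], ['Clause 1a: penalty.'])]
```

Page-by-page comparison flags two of three pages, while the spliced comparison reports only the single inserted page.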
  • Splicing the at least one page of text of the reference file into one page of context-continuous text to obtain the reference text includes: when the reference file contains multiple pages of text, splicing those pages into one page of context-continuous text as the reference text; when the reference file contains a single page of text, using that page as the reference text.
  • Splicing the at least one page of text of the comparison file into one page of context-continuous text to obtain the comparison text includes: when the comparison file contains multiple pages of text, splicing those pages into one page of context-continuous text as the comparison text; when the comparison file contains a single page of text, using that page as the comparison text.
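A minimal sketch of the splicing step, assuming the OCR output for a file arrives as a mapping from page number to page text (a hypothetical shape; the source does not define the OCR output format). The empty default separator follows the "seamless" wording of the splicing definition; a newline separator may be preferable when a page break falls between words.

```python
from typing import Dict


def splice_pages(pages: Dict[int, str], separator: str = "") -> str:
    """Splice per-page text into one context-continuous text.

    `pages` maps page number -> page text (hypothetical OCR output shape).
    Pages are joined strictly in page order and their content is not altered.
    """
    if not pages:
        raise ValueError("file contains no text pages")
    if len(pages) == 1:
        # single-page file: its only page is used directly as the text
        return next(iter(pages.values()))
    return separator.join(pages[n] for n in sorted(pages))


if __name__ == "__main__":
    print(splice_pages({2: "second page.", 1: "First page, "}))  # First page, second page.
```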
  • The reference text and the comparison text may be compared according to a preset comparison unit to obtain a comparison sub-result for each preset comparison unit.
  • Here the reference subtext and the comparison subtext are the portions of the reference text and the comparison text, one preset comparison unit in size, that are being compared. If the reference subtext being compared does not exist in the comparison text, the corresponding comparison sub-result is determined to be content deletion; if the comparison subtext being compared does not exist in the reference text, the corresponding comparison sub-result is determined to be content addition.
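One way to produce comparison sub-results per preset comparison unit is sketched below, assuming a sentence is the unit and using `difflib.SequenceMatcher` as a substitute for the unspecified preset comparison algorithm. Consecutive units with the same outcome are grouped into one sub-result, and a `difflib` "replace" is emitted as a deletion followed by an addition so that a later merging step can turn it into a modification. The sentence-splitting regex is a naive assumption (it would, for example, split on the decimal point in "3.5").

```python
import difflib
import re
from typing import List, Tuple

# (difference type, reference units, comparison units)
SubResult = Tuple[str, List[str], List[str]]


def split_units(text: str) -> List[str]:
    """Hypothetical preset comparison unit: one sentence ending in 。！？.!?"""
    pieces = re.findall(r"[^。！？.!?]+[。！？.!?]?", text)
    return [p.strip() for p in pieces if p.strip()]


def compare_units(reference_text: str, comparison_text: str) -> List[SubResult]:
    """Produce comparison sub-results, grouping consecutive units with the same outcome."""
    ref_units = split_units(reference_text)
    cmp_units = split_units(comparison_text)
    matcher = difflib.SequenceMatcher(a=ref_units, b=cmp_units, autojunk=False)
    results: List[SubResult] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            results.append(("same", ref_units[i1:i2], cmp_units[j1:j2]))
        elif tag == "delete":   # reference units missing from the comparison text
            results.append(("deletion", ref_units[i1:i2], []))
        elif tag == "insert":   # comparison units missing from the reference text
            results.append(("addition", [], cmp_units[j1:j2]))
        else:                   # "replace": left as deletion + addition for the merge step
            results.append(("deletion", ref_units[i1:i2], []))
            results.append(("addition", [], cmp_units[j1:j2]))
    return results


if __name__ == "__main__":
    print(compare_units("A. B. C.", "A. X. C."))
    # [('same', ['A.'], ['A.']), ('deletion', ['B.'], []), ('addition', [], ['X.']), ('same', ['C.'], ['C.'])]
```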
  • However, the differences between two texts include not only same content, content deletion and content addition, but also content modification. To let users see the differences between the comparison text and the reference text more intuitively, for a first comparison sub-result and a second comparison sub-result that are not adjacent, if both are same-content sub-results and the comparison sub-results between them include content deletion and content addition but no same content, those in-between comparison sub-results are merged into one comparison sub-result whose type is content modification.
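The merging rule can be sketched as a single pass over the sub-result list. The tuple layout (type, reference units, comparison units) and the type labels are hypothetical conventions for this illustration; "include content deletion and content addition" is read here as requiring both types in the enclosed run, which is an interpretation rather than something the source spells out.

```python
from typing import List, Tuple

# (difference type, reference units, comparison units)
SubResult = Tuple[str, List[str], List[str]]


def merge_modifications(results: List[SubResult]) -> List[SubResult]:
    """Collapse deletion+addition runs enclosed by two 'same' sub-results into one 'modification'."""
    merged: List[SubResult] = []
    i = 0
    while i < len(results):
        merged.append(results[i])
        if results[i][0] != "same":
            i += 1
            continue
        # find the next 'same' sub-result after position i
        j = i + 1
        while j < len(results) and results[j][0] != "same":
            j += 1
        between = results[i + 1:j]
        types = {kind for kind, _, _ in between}
        if j < len(results) and "deletion" in types and "addition" in types:
            ref_units = [u for _, r, _ in between for u in r]
            cmp_units = [u for _, _, c in between for u in c]
            merged.append(("modification", ref_units, cmp_units))
            i = j  # resume at the closing 'same' sub-result
        else:
            i += 1
    return merged


if __name__ == "__main__":
    sub_results = [
        ("same", ["A."], ["A."]),
        ("deletion", ["B."], []),
        ("addition", [], ["X."]),
        ("same", ["C."], ["C."]),
    ]
    print(merge_modifications(sub_results))
    # [('same', ['A.'], ['A.']), ('modification', ['B.'], ['X.']), ('same', ['C.'], ['C.'])]
```

A run that is not closed by a second same-content sub-result, or that contains only deletions or only additions, is left unchanged.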
  • The size of the preset comparison unit can be determined according to the actual situation and can be a phrase, a sentence, a paragraph, and so on.
  • NLP technology can also be used to perform semantic analysis on the reference subtext and the comparison subtext.
  • The embodiment of the present invention can also support user-defined filtering rules that ignore meaningless differences: when a difference between the reference subtext and the comparison subtext satisfies a preset filtering rule, that difference is ignored.
  • For example, a filtering rule can specify that whether a sentence contains the particle "of" does not affect the comparison result.
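A filtering rule of the kind in the example can be sketched as normalization before equality. This assumes rules are expressed as a list of ignorable words and are applied only to content-modification sub-results; both are assumptions, since the source does not define the rule format. The `\b` word boundaries work for space-delimited text such as the English "of"; for Chinese text, where characters are not separated by spaces, a different tokenization would be needed.

```python
import re
from typing import Iterable, List, Tuple

# (difference type, reference units, comparison units)
SubResult = Tuple[str, List[str], List[str]]


def normalize(text: str, ignored_words: Iterable[str]) -> str:
    """Remove whole-word occurrences of ignored words and collapse whitespace."""
    for word in ignored_words:
        text = re.sub(rf"\b{re.escape(word)}\b", " ", text)
    return " ".join(text.split())


def apply_filter_rules(results: List[SubResult], ignored_words: Iterable[str]) -> List[SubResult]:
    """Drop 'modification' sub-results whose only differences are ignored words."""
    words = list(ignored_words)
    kept: List[SubResult] = []
    for kind, ref_units, cmp_units in results:
        if kind == "modification" and (
            normalize(" ".join(ref_units), words) == normalize(" ".join(cmp_units), words)
        ):
            continue  # meaningless difference under the filtering rule
        kept.append((kind, ref_units, cmp_units))
    return kept
```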
  • S140: Add difference marks to the reference file and/or the comparison file according to the difference comparison result.
  • If the difference comparison result includes content deletion, a difference mark is added to the text content in the reference file that was deleted in the comparison file; if it includes content addition, a difference mark is added to the text content added in the comparison file relative to the reference file; if it includes content modification, difference marks are added both to the text content before modification in the reference file and to the text content after modification in the comparison file.
  • Difference marks include, but are not limited to, one or a combination of the following: bold font, changed font color, added font background color, highlighted font, enlarged font, italics, underline, and so on.
  • The difference marks for different comparison sub-results may take the same form or different forms.
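Marking can be sketched by rendering each file's text as HTML with styled spans: deletions are marked in the reference file, additions in the comparison file, and modifications in both. The styles in `MARK_STYLES`, the HTML output and the sub-result tuple layout are illustrative assumptions; the source marks the original files, not an HTML rendering.

```python
import html
from typing import Dict, List, Tuple

# (difference type, reference units, comparison units)
SubResult = Tuple[str, List[str], List[str]]

# Hypothetical mark styles; any combination from the list above would do.
MARK_STYLES: Dict[str, str] = {
    "deletion": "background-color:#fdd;text-decoration:line-through",
    "addition": "background-color:#dfd;font-weight:bold",
    "modification": "background-color:#ffd;text-decoration:underline",
}


def mark(units: List[str], kind: str) -> List[str]:
    """Wrap each unit in a span styled for its difference type."""
    style = MARK_STYLES[kind]
    return [f'<span style="{style}">{html.escape(u)}</span>' for u in units]


def render_marked(results: List[SubResult]) -> Tuple[str, str]:
    """Return (marked reference text, marked comparison text)."""
    ref_out: List[str] = []
    cmp_out: List[str] = []
    for kind, ref_units, cmp_units in results:
        if kind == "same":
            ref_out += [html.escape(u) for u in ref_units]
            cmp_out += [html.escape(u) for u in cmp_units]
        elif kind == "deletion":      # mark deleted content in the reference file
            ref_out += mark(ref_units, kind)
        elif kind == "addition":      # mark added content in the comparison file
            cmp_out += mark(cmp_units, kind)
        elif kind == "modification":  # mark the before and after versions
            ref_out += mark(ref_units, kind)
            cmp_out += mark(cmp_units, kind)
    return " ".join(ref_out), " ".join(cmp_out)
```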
  • A location information configuration file of the reference file may be generated according to the location information, in the reference file, of the text content of the reference text, and a location information configuration file of the comparison file may be generated according to the location information, in the comparison file, of the text content of the comparison text. Then, when difference marks are added to the reference file and/or the comparison file according to the difference comparison result, the difference comparison result and the location information configuration files can be used together.
  • The location information includes the page number of the text content in the corresponding file and the row and column on the page with that page number.
  • The location information configuration file includes the text content and the location information of the text content.
  • If the difference comparison result includes content deletion, the location in the reference file of the text content deleted in the comparison file is determined from the location information configuration file of the reference file, and a difference mark is added at that location. If it includes content addition, the location in the comparison file of the text content added relative to the reference file is determined from the location information configuration file of the comparison file, and a difference mark is added there. If it includes content modification, the location of the text content before modification is determined from the reference file's configuration file and the location of the text content after modification from the comparison file's configuration file, and difference marks are added at the locations determined in each file.
  • Optionally, when the server receives a comparison result query instruction that the RPA robot triggers the client to send, it feeds back the reference file and/or comparison file containing the difference marks to the client. It may also feed back the difference comparison result itself, so that users can see the differences both directly in the reference file and/or comparison file and in a separate summary.
  • When the server receives a comparison task status query instruction that the RPA robot triggers the client to send, it queries the task database for the task status of the comparison task corresponding to the instruction and feeds back the queried task status to the client.
  • The file comparison method based on RPA and AI can use an RPA robot to log in to the client automatically and trigger the client to send the reference file and the comparison file to the server.
  • After the server obtains the reference file and the comparison file, it proceeds in four stages:
  • it first obtains at least one page of text from each file;
  • it splices the at least one page of text of the reference file into one page of text with continuous context (the reference text), and does the same for the comparison file (the comparison text);
  • it compares the reference text with the comparison text using the preset comparison algorithm;
  • finally, it adds difference marks to the reference file and/or the comparison file according to the difference comparison result.
  • The embodiment of the present invention therefore uses the RPA robot to trigger the client automatically to send the two files to be compared to the server, and it also lets the server mark the differences automatically. This saves manpower, gives the people who would otherwise compare files time for more valuable work, and improves the efficiency of file comparison.
  • The embodiment adds the difference marks to the reference file and/or the comparison file themselves, rather than describing the differences in (or only in) a third-party area independent of both files. This makes the difference marks more readable.
  • The text content of the two files can be recognized automatically through OCR technology without manual work, which further improves the efficiency of file comparison.
  • Another embodiment of the present invention provides a file comparison method based on RPA and AI that is applied to an RPA robot. As shown in Figure 2, the method includes:
  • S200: Log in to the client and upload the reference file and the comparison file to the client. The reference file and/or the comparison file is a multi-page file.
  • An RPA program can be configured on an electronic device that can log in to the client. The program may be integrated into or embedded in the client, or it may be independent of the client. Following the rules set in the RPA program, the electronic device simulates the user's mouse and keyboard operations to log in to the client automatically. It then accesses the client to trigger it to generate a file comparison request containing the reference file and the comparison file, and sends the request to the server so that the server can compare the differences between the two files.
  • When the RPA robot logs in, the client may pop up a login interface containing a verification code image. The RPA robot can then perform OCR on the image to obtain the verification code, enter it into the corresponding edit box, and log in to the client successfully.
  • S210: Trigger the client to send the reference file and the comparison file to a server for difference comparison.
  • The reference file and the comparison file may be stored in the client or in another storage space of the electronic device, or they may be paper documents.
  • When the files are stored in another storage space, the RPA robot can look them up there and upload them to the client. It can do so, for example, by clicking the upload button, by dragging the two files into a designated area, or by any other upload method.
  • When the reference file and/or the comparison file is a paper document, the RPA robot can use OCR technology to convert it into an image file or a text file, and then upload it to the client in one of the ways above. Here, a text file means an editable file composed of the text content of the paper document.
  • The RPA robot can also trigger the client to send a comparison result query instruction to the server. The server then feeds back the reference file and/or comparison file containing difference marks to the client, which outputs and displays it. In addition or instead, the RPA robot can trigger the client to send a comparison task status query instruction to the server. The server then queries the task database for the task status of the corresponding comparison task and feeds it back to the client.
  • The RPA robot may trigger the client to send either query instruction by, for example but not limited to, clicking the comparison result query button or the comparison task status query button on the client. This causes the client to generate and send the corresponding instruction.
  • Fig. 3 shows part of the text content of a reference file and a comparison file. The differences are displayed directly in both files: bold italic text was added in the comparison file, underlined text was deleted in the comparison file, and bold enlarged text was modified. A summary of the differences is also shown in the right part of the figure.
  • Another embodiment of the present invention provides a file comparison apparatus based on RPA and AI that is applied to the server. As shown in Figure 4, the apparatus includes:
  • a file obtaining unit 30, configured to obtain the reference file and the comparison file sent by the client, where the reference file and/or the comparison file is a multi-page file, and both are files that an RPA robot triggers the client to send;
  • a text acquisition unit 32, configured to acquire at least one page of text of the reference file and at least one page of text of the comparison file;
  • a splicing unit 34, configured to splice the at least one page of text of the reference file into one page of text with continuous context to obtain a reference text, and to splice the at least one page of text of the comparison file into one page of text with continuous context to obtain a comparison text;
  • a comparison unit 36, configured to perform a difference comparison between the reference text and the comparison text using a preset comparison algorithm to obtain a difference comparison result of the comparison text relative to the reference text;
  • a marking unit 38, configured to add difference marks to the reference file and/or the comparison file according to the difference comparison result.
  • The comparison unit 36 includes:
  • a comparison module, configured to compare the reference text and the comparison text according to a preset comparison unit to obtain a comparison sub-result for each preset comparison unit;
  • a merging module, configured to handle a first comparison sub-result and a second comparison sub-result that are not adjacent. If both are "content identical", and the comparison sub-results between them include content deletion and content addition but no "content identical", the merging module merges the comparison sub-results between them into one comparison sub-result whose type is content modification.
  • The marking unit 38 includes:
  • a first mark adding module, configured to add, if the difference comparison result includes content deletion, a difference mark to the text content in the reference file that was deleted in the comparison file;
  • a second mark adding module, configured to add, if the difference comparison result includes content addition, a difference mark to the text content added in the comparison file relative to the reference file;
  • a third mark adding module, configured to add, if the difference comparison result includes content modification, difference marks both to the pre-modification text content in the reference file and to the modified text content in the comparison file.
  • The text acquisition unit 32 is configured to recognize the reference file and the comparison file using optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file.
  • OCR: optical character recognition
  • The apparatus further includes:
  • an authentication unit, configured to authenticate the user information of the client before at least one page of text of the reference file and at least one page of text of the comparison file are acquired;
  • a format conversion unit, configured to convert, if the authentication passes, the file formats of the reference file and the comparison file into a format that OCR can recognize;
  • a task adding unit, configured to add the comparison task for the reference file and the comparison file to the comparison task queue, and to record the comparison task and its task status in a task database;
  • a task acquiring unit, configured to acquire the comparison task from the comparison task queue;
  • an updating unit, configured to update the task status of the comparison task in the task database when the task status changes.
  • The apparatus further includes:
  • a feedback unit, configured to do the following:
  • upon receiving a comparison result query instruction that the RPA robot triggers the client to send, feed back the reference file and/or comparison file containing difference marks to the client;
  • upon receiving a comparison task status query instruction that the RPA robot triggers the client to send, query the task database for the task status of the comparison task corresponding to the instruction, and feed back the queried task status to the client.
  • Another embodiment of the present invention provides a file comparison apparatus based on RPA and AI that is applied to the RPA robot. As shown in Figure 5, the apparatus includes:
  • a login and upload unit 40, configured to log in to the client and upload the reference file and the comparison file to the client, where the reference file and/or the comparison file is a multi-page file;
  • a trigger sending unit 42, configured to trigger the client to send the reference file and the comparison file to the server for difference comparison.
  • The trigger sending unit 42 is further configured to do one or both of the following:
  • trigger the client to send a comparison result query instruction to the server, so that the server feeds back the reference file and/or comparison file containing difference marks to the client;
  • trigger the client to send a comparison task status query instruction to the server, so that the server feeds back the task status of the comparison task to the client.
  • Another embodiment of the present invention provides a computing device, which includes:
  • one or more processors;
  • a storage device, configured to store one or more programs;
  • when the computing device is a server, the one or more programs are executed by the one or more processors, so that the one or more processors implement any of the above file comparison methods based on RPA and AI applied to a server;
  • when the computing device is a terminal, the one or more programs are executed by the one or more processors, so that the one or more processors implement any of the above file comparison methods based on RPA and AI applied to an RPA robot.
  • Another embodiment of the present invention provides a server, which includes:
  • one or more processors;
  • when the one or more programs are executed, the one or more processors implement any of the above file comparison methods based on RPA and AI applied to a server.
  • The processor is coupled with the storage device.
  • Another embodiment of the present invention provides a terminal, which includes:
  • one or more processors;
  • when the one or more programs are executed, the one or more processors implement any of the above file comparison methods based on RPA and AI applied to an RPA robot.
  • The processor is coupled with the storage device.
  • an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any embodiment of the present invention is implemented.
  • An embodiment of the present invention further provides a file comparison system based on RPA and AI, which includes an RPA robot 50, a client 52 and a server 54.
  • The RPA robot 50 may be independent of the client 52, or it may be part of the client 52.
  • The RPA robot 50 is configured to log in to the client 52, upload the reference file and the comparison file to the client 52, and trigger the client 52 to send the two files to the server 54 for difference comparison, where the reference file and/or the comparison file is a multi-page file.
  • The server 54 is configured to do the following:
  • obtain the reference file and the comparison file sent by the client 52;
  • recognize the two files using optical character recognition (OCR) to obtain at least one page of text of each;
  • splice the at least one page of text of the reference file into one page of text with continuous context to obtain a reference text, and do the same for the comparison file to obtain a comparison text;
  • perform a difference comparison between the reference text and the comparison text using a preset comparison algorithm to obtain a comparison result;
  • add difference marks to the reference file and/or the comparison file according to the comparison result.
  • The sequence numbers of the above processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the numbering should not limit the implementation of the embodiments of the present invention.
  • "B corresponding to A" means that B is associated with A and that B can be determined from A.
  • However, determining B from A does not mean determining B only from A; B can also be determined from A and/or other information.
  • The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • The integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the integrated units are implemented as software functional units and sold or used as independent products, they can be stored in a computer-accessible memory.
  • The technical solution of the present invention can be embodied as a software product. This applies to the solution in essence, to the part that contributes to the prior art, or to all or part of the solution. The computer software product is stored in a memory and includes several instructions that cause a computer device to execute some or all of the steps of the methods in the embodiments of the present invention. The computer device may be a personal computer, a server, a network device, etc., and specifically a processor in the computer device.
  • ROM: read-only memory
  • RAM: random access memory
  • PROM: programmable read-only memory
  • EPROM: erasable programmable read-only memory
  • OTPROM: one-time programmable read-only memory
  • EEPROM: electrically erasable programmable read-only memory
  • CD-ROM: compact disc read-only memory
  • The modules of the apparatus in an embodiment may be distributed in the apparatus as described in the embodiment. Alternatively, they may be changed accordingly and located in one or more apparatuses different from that of the embodiment.
  • The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

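Embodiments of the present invention disclose a file comparison method, apparatus, device, medium and system based on RPA and AI. The method includes:
S1: a server obtains a reference file and a comparison file sent by a client, where the reference file and the comparison file are files that an RPA robot triggers the client to send;
S2: obtaining at least one page of text of the reference file and at least one page of text of the comparison file;
S3: splicing the at least one page of text of the reference file into one page of text with continuous context to obtain a reference text, and splicing the at least one page of text of the comparison file into one page of text with continuous context to obtain a comparison text;
S4: performing a difference comparison between the reference text and the comparison text using a preset comparison algorithm to obtain a difference comparison result of the comparison text relative to the reference text;
S5: adding difference marks to the reference file and/or the comparison file according to the difference comparison result.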

Description

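File Comparison Method, Apparatus, Device, Medium and System Based on RPA and AI
Technical Field
Embodiments of the present invention relate to the technical field of process automation, and in particular to a file comparison method, apparatus, device, medium and system based on RPA and AI.
Background
RPA (Robotic Process Automation) uses specific "robot software" to simulate human operations on a computer and automatically execute process tasks according to rules.
AI (Artificial Intelligence) is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
RPA has two distinctive advantages: it is low-code and non-intrusive.
Low-code means that RPA can be operated without a high level of IT skill, so business staff who cannot program can still develop processes. Non-intrusive means that RPA simulates human operations and does not require software systems to open their interfaces.
However, traditional RPA has limitations: it can only follow fixed rules, and its application scenarios are limited. As AI technology develops, the deep integration of RPA and AI overcomes these limitations (RPA + AI = hand work + head work) and is greatly changing the value of labor.
In daily work, two versions of documents such as contracts or legal provisions often need to be compared to determine what has changed in the newer document relative to the original. At present, this requires manually obtaining the two files, comparing them manually, and marking the differences manually. When many files must be compared, or the files have many pages, staff must perform repetitive, low-value comparison work. This work takes up a great deal of working time, so efficiency is low.
Summary
Embodiments of the present invention provide a file comparison method, apparatus, device, medium and system based on RPA and AI that automate file comparison. This saves manpower and also improves the efficiency of file comparison.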
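In a first aspect, an embodiment of the present invention provides a file comparison method based on RPA and AI. The method is applied to a server and includes:
S1: obtaining a reference file and a comparison file sent by a client, where the reference file and/or the comparison file is a multi-page file, and the reference file and the comparison file are files that an RPA robot triggers the client to send;
S2: obtaining at least one page of text of the reference file and at least one page of text of the comparison file;
S3: splicing the at least one page of text of the reference file into one page of text with continuous context to obtain a reference text, and splicing the at least one page of text of the comparison file into one page of text with continuous context to obtain a comparison text;
S4: performing a difference comparison between the reference text and the comparison text using a preset comparison algorithm to obtain a difference comparison result of the comparison text relative to the reference text;
S5: adding difference marks to the reference file and/or the comparison file according to the difference comparison result.
Optionally, S4 includes:
S401: comparing the reference text and the comparison text according to a preset comparison unit to obtain a comparison sub-result for each preset comparison unit;
S402: for a first comparison sub-result and a second comparison sub-result that are not adjacent, if both are "content identical", and the comparison sub-results between them include content deletion and content addition but no "content identical", merging the comparison sub-results between the first and second comparison sub-results into one comparison sub-result whose type is content modification.
Optionally, S5 includes:
S501: if the difference comparison result includes content deletion, adding a difference mark to the text content in the reference file that was deleted in the comparison file;
S502: if the difference comparison result includes content addition, adding a difference mark to the text content added in the comparison file relative to the reference file;
S503: if the difference comparison result includes content modification, adding difference marks both to the pre-modification text content in the reference file and to the modified text content in the comparison file.
Optionally, S2 includes:
recognizing the reference file and the comparison file using optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file.
Optionally, before S2, the method further includes:
S6: authenticating the user information of the client;
S7: if the authentication passes, converting the file formats of the reference file and the comparison file into a format that OCR can recognize;
S8: adding a comparison task for the reference file and the comparison file to a comparison task queue, and recording the comparison task and its task status in a task database;
S9: obtaining the comparison task from the comparison task queue.
The method further includes:
S10: when the task status of the comparison task changes, updating the task status of the comparison task in the task database.
Optionally, the method further includes:
S11: upon receiving a comparison result query instruction that the RPA robot triggers the client to send, feeding back the reference file and/or comparison file containing the difference marks to the client;
S12: upon receiving a comparison task status query instruction that the RPA robot triggers the client to send, querying the task database for the task status of the comparison task corresponding to the instruction, and feeding back the queried task status to the client.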
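In a second aspect, an embodiment of the present invention further provides a file comparison method based on RPA and AI. The method is applied to an RPA robot and includes:
S13: logging in to a client and uploading a reference file and a comparison file to the client, where the reference file and/or the comparison file is a multi-page file;
S14: triggering the client to send the reference file and the comparison file to a server for difference comparison.
Optionally, the method further includes:
S15: triggering the client to send a comparison result query instruction to the server, so that the server feeds back the reference file and/or comparison file containing difference marks to the client;
and/or S16: triggering the client to send a comparison task status query instruction to the server, so that the server feeds back the task status of the comparison task to the client.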
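In a third aspect, an embodiment of the present invention provides a file comparison apparatus based on RPA and AI. The apparatus is applied to a server and includes:
a file obtaining unit, configured to obtain a reference file and a comparison file sent by a client, where the reference file and/or the comparison file is a multi-page file, and the reference file and the comparison file are files that an RPA robot triggers the client to send;
a text acquisition unit, configured to acquire at least one page of text of the reference file and at least one page of text of the comparison file;
a splicing unit, configured to splice the at least one page of text of the reference file into one page of text with continuous context to obtain a reference text, and to splice the at least one page of text of the comparison file into one page of text with continuous context to obtain a comparison text;
a comparison unit, configured to perform a difference comparison between the reference text and the comparison text using a preset comparison algorithm to obtain a difference comparison result of the comparison text relative to the reference text;
a marking unit, configured to add difference marks to the reference file and/or the comparison file according to the difference comparison result.
Optionally, the comparison unit includes:
a comparison module, configured to compare the reference text and the comparison text according to a preset comparison unit to obtain a comparison sub-result for each preset comparison unit;
a merging module, configured to handle a first comparison sub-result and a second comparison sub-result that are not adjacent. If both are "content identical", and the comparison sub-results between them include content deletion and content addition but no "content identical", the merging module merges the comparison sub-results between them into one comparison sub-result whose type is content modification.
Optionally, the marking unit includes:
a first mark adding module, configured to add, if the difference comparison result includes content deletion, a difference mark to the text content in the reference file that was deleted in the comparison file;
a second mark adding module, configured to add, if the difference comparison result includes content addition, a difference mark to the text content added in the comparison file relative to the reference file;
a third mark adding module, configured to add, if the difference comparison result includes content modification, difference marks both to the pre-modification text content in the reference file and to the modified text content in the comparison file.
Optionally, the text acquisition unit is configured to recognize the reference file and the comparison file using optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file.
Optionally, the apparatus further includes:
an authentication unit, configured to authenticate the user information of the client before at least one page of text of the reference file and at least one page of text of the comparison file are acquired;
a format conversion unit, configured to convert, if the authentication passes, the file formats of the reference file and the comparison file into a format that OCR can recognize;
a task adding unit, configured to add a comparison task for the reference file and the comparison file to a comparison task queue, and to record the comparison task and its task status in a task database;
a task acquiring unit, configured to acquire the comparison task from the comparison task queue;
an updating unit, configured to update the task status of the comparison task in the task database when the task status changes.
Optionally, the apparatus further includes:
a feedback unit, configured to do the following: upon receiving a comparison result query instruction that the RPA robot triggers the client to send, feed back the reference file and/or comparison file containing difference marks to the client; and upon receiving a comparison task status query instruction that the RPA robot triggers the client to send, query the task database for the task status of the comparison task corresponding to the instruction, and feed back the queried task status to the client.
In a fourth aspect, an embodiment of the present invention provides a file comparison apparatus based on RPA and AI. The apparatus is applied to an RPA robot and includes:
a login and upload unit, configured to log in to a client and upload a reference file and a comparison file to the client, where the reference file and/or the comparison file is a multi-page file;
a trigger sending unit, configured to trigger the client to send the reference file and the comparison file to a server for difference comparison.
Optionally, the trigger sending unit is further configured to do one or both of the following: trigger the client to send a comparison result query instruction to the server, so that the server feeds back the reference file and/or comparison file containing difference marks to the client; and trigger the client to send a comparison task status query instruction to the server, so that the server feeds back the task status of the comparison task to the client.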
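In a fifth aspect, an embodiment of the present invention further provides a computing device, which includes:
one or more processors;
a storage device, configured to store one or more programs;
when the computing device is a server, the one or more programs are executed by the one or more processors, so that the one or more processors implement the method according to the first aspect;
when the computing device is a terminal, the one or more programs are executed by the one or more processors, so that the one or more processors implement the method according to the second aspect.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method according to the first aspect or the second aspect.
In a seventh aspect, an embodiment of the present invention further provides a file comparison system based on RPA and AI, which includes an RPA robot, a client and a server.
The RPA robot is configured to log in to the client, upload a reference file and a comparison file to the client, and trigger the client to send the two files to the server for difference comparison, where the reference file and/or the comparison file is a multi-page file.
The server is configured to do the following: obtain the reference file and the comparison file sent by the client; obtain at least one page of text of the reference file and at least one page of text of the comparison file; splice the at least one page of text of the reference file into one page of text with continuous context to obtain a reference text, and splice the at least one page of text of the comparison file into one page of text with continuous context to obtain a comparison text; perform a difference comparison between the reference text and the comparison text using a preset comparison algorithm to obtain a difference comparison result of the comparison text relative to the reference text; and add difference marks to the reference file and/or the comparison file according to the difference comparison result.
The file comparison method, apparatus, device, medium and system based on RPA and AI provided by the embodiments of the present invention can use an RPA robot to log in to a client automatically and trigger the client to send a reference file and a comparison file to a server. After obtaining the two files, the server works in four stages. First, it obtains at least one page of text from each file. Second, it splices the at least one page of text of the reference file into one page of text with continuous context (the reference text), and does the same for the comparison file (the comparison text). Third, it compares the reference text with the comparison text using a preset comparison algorithm. Finally, it adds difference marks to the reference file and/or the comparison file according to the difference comparison result. In the prior art, files must be obtained, compared and marked manually. By contrast, the embodiments use the RPA robot to trigger the client automatically to send the two files to the server for automatic comparison, and the server marks the differences automatically. This saves manpower, frees staff who would otherwise compare files for more valuable work, and improves the efficiency of file comparison. In addition, the difference marks are added to the reference file and/or the comparison file themselves, rather than being described in (or only in) a third-party area independent of both files. This makes the difference marks more readable. When the text content of the two files is obtained, OCR can recognize it automatically without manual work, which further improves comparison efficiency.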
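Brief Description of the Drawings
The drawings used in describing the embodiments or the prior art are briefly introduced below, to explain the technical solutions more clearly. The drawings described below show only some embodiments of the present invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1 is a flowchart of a file comparison method based on RPA and AI according to an embodiment of the present invention;
Figure 2 is a flowchart of another file comparison method based on RPA and AI according to an embodiment of the present invention;
Figure 3 is an example diagram of a file difference comparison result according to an embodiment of the present invention;
Figure 4 is a block diagram of a file comparison apparatus based on RPA and AI according to an embodiment of the present invention;
Figure 5 is a block diagram of another file comparison apparatus based on RPA and AI according to an embodiment of the present invention;
Figure 6 is an architecture diagram of a file comparison system based on RPA and AI according to an embodiment of the present invention;
Figure 7 is an architecture diagram of another file comparison system based on RPA and AI according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the present invention.
It should be noted that the terms "include" and "have" and any variants thereof in the embodiments and drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not limited to the listed steps or units. It may optionally include steps or units that are not listed, or other steps or units inherent to such a process, method, product or device.
In daily work, files of different versions often have to be compared manually for differences. This work is repetitive and simple yet very time-consuming, so companies increasingly need automated file comparison.
RPA (Robotic Process Automation) technology works through the user interface and intelligently understands the existing applications on the electronic device. It automates repetitive, rule-based, high-volume routine operations, such as repeatedly reading e-mails, reading Office components, operating databases, web pages and client software, collecting data, performing tedious calculations, and generating the required files and reports in batches. RPA can therefore greatly reduce labor costs and effectively improve office efficiency.
AI (Artificial Intelligence) technology can go beyond fixed rules and simulate human thinking and consciousness to automatically handle more complex application scenarios. On this basis, the embodiments of the present invention combine RPA and AI to compare files automatically, which saves manpower and also improves the efficiency of file comparison.
The embodiments of the present invention are described in detail below.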
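In the description of the embodiments of the present invention, the term "reference file" refers to the file used as the basis of reference in a difference comparison, and "comparison file" refers to the other of the two compared files. In practice, the reference file is usually an older version than the comparison file. The reference file and the comparison file can come from any field, for example contract documents, financial documents or program files.
In the description of the embodiments of the present invention, the term "multi-page file" refers to a file containing two or more pages of text content.
In the description of the embodiments of the present invention, the term "OCR" refers to Optical Character Recognition. In OCR, an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using a character recognition method. For printed characters, the text in a paper document is optically converted into a black-and-white bitmap image file. Recognition software then converts the text in the image into a text format that word-processing software can further edit. In the embodiments of the present invention, either of two approaches can be used. In the first, the RPA robot uses OCR technology to convert the text in a paper document into a black-and-white bitmap image file, and the server then uses OCR to recognize the text content contained in that image file. In the second, the RPA robot uses OCR technology to obtain the text content from the paper document and generates a text file containing it (i.e., an editable file), from which the server extracts the text content directly.
In the description of the embodiments of the present invention, the term "client" refers to the front end of a business system that needs file comparison, and "server" refers to the back end of that business system. The "client" can be the application software of the business system, or a browser through which the RPA robot accesses the business system's website. The "RPA robot" can be integrated into the client, embedded into the client as a plug-in or in another form, or independent of the client, as long as it can access the client automatically. The embodiments of the present invention do not limit the specific form of the RPA robot.
In the description of the embodiments of the present invention, the term "NLP" refers to Natural Language Processing. NLP is a discipline that takes language as its object and uses computer technology to analyze, understand and process natural language. It uses the computer as a powerful tool for language research, studies language information quantitatively with computer support, and provides language descriptions that both humans and computers can use.
In the description of the embodiments of the present invention, the term "splicing" refers to joining the texts to be spliced together without changing their content. Multiple pages of text are thus joined seamlessly while the original order of the text content is preserved.
In the description of the embodiments of the present invention, the term "preset comparison algorithm" refers to the specific comparison method for determining the differences of the comparison text relative to the reference text. The reference text and the comparison text can be compared batch by batch according to a preset comparison unit until the comparison is complete; the specific process is detailed under S130. The term "preset comparison unit" refers to the size of the text compared each time. It can be set according to actual needs, for example a phrase, a sentence or a paragraph.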
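As a minimal illustration of a preset comparison unit, the Python sketch below splits a text into sentence or paragraph units before comparison. The function name `split_units`, the `unit` parameter and the punctuation set treated as sentence boundaries are all assumptions for illustration. The patent leaves the unit size (phrase, sentence or paragraph) to the implementer.

```python
import re
from typing import List

# Assumed sentence boundaries: Chinese and ASCII sentence-ending punctuation.
# Deliberately naive: it would also split a decimal such as "3.5" after the period.
_SENTENCE_END = re.compile(r"(?<=[。！？；.!?;])")


def split_units(text: str, unit: str = "sentence") -> List[str]:
    """Split text into hypothetical 'preset comparison units'."""
    if unit == "paragraph":
        parts = text.split("\n")
    elif unit == "sentence":
        # Join lines first so that a sentence broken across lines stays whole.
        parts = _SENTENCE_END.split(text.replace("\n", " "))
    else:
        raise ValueError(f"unsupported comparison unit: {unit!r}")
    # Drop empty fragments and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]
```

For example, `split_units("Party A pays. Party B delivers!")` yields two sentence units. Choosing larger units (paragraphs) makes the comparison coarser but faster, which is the trade-off the unit size controls.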
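In the description of the embodiments of the present invention, the term "difference comparison" refers to determining what differences exist between the reference text and the comparison text. The term "difference comparison result" refers to the result of the difference comparison, which contains multiple comparison sub-results. Each comparison sub-result includes a difference type and the difference content for that type. The difference types are content identical, content addition, content deletion and content modification. The term "difference mark" refers to a mark that highlights the specific differences between the reference file and the comparison file. Difference marks include, but are not limited to, one or a combination of the following: bold font, changed font color, added font background color, highlighted font, enlarged font, italics, underline, etc.
In the description of the embodiments of the present invention, the term "authentication" refers to verifying whether the client sending the reference file and the comparison file has permission to perform file comparison. Authentication can be performed by verifying whether the client's user information meets the permission requirement.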
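Figure 1 shows a file comparison method based on RPA and AI according to an embodiment of the present invention. The method is mainly applied to a server and specifically includes:
S100: Obtain the reference file and the comparison file sent by the client.
The reference file and/or the comparison file is a multi-page file, i.e., at least one of the two files has multiple pages. The reference file and the comparison file are files that the RPA robot triggers the client to send: the RPA robot first logs in to the client and then, on the corresponding page, triggers the client to send the two files. The specific way in which the RPA robot triggers the client to send the files to the server is described in the method embodiment below in which the RPA robot is the executing entity, and is not repeated here.
Optionally, after obtaining the reference file and the comparison file sent by the client, the server may first authenticate the client's user information to verify whether the user has file comparison permission.
If the authentication passes, the server converts the file formats of the two files into a format that OCR can recognize. It then adds a comparison task for the two files to the comparison task queue, and records the comparison task and its task status in the task database. The comparison task can later be obtained from the queue and executed, i.e., the difference comparison is performed on the reference file and the comparison file of that task.
If the authentication fails, the server does not perform the difference comparison and may send the client a reminder that it lacks comparison permission.
The user information can be the client account, a mobile phone number bound to the account, a user level, or other information. The embodiments do not limit its specific content, which can be determined according to the situation. The user information can be authenticated in several ways, including but not limited to the following two:
(1) Match the user information against a list of authorized users. If the match succeeds, the corresponding user has permission and the authentication passes; if the match fails, the user has no permission and the authentication fails.
(2) Determine whether the user level in the user information exceeds a preset level. If it does, the authentication passes; otherwise it fails.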
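The two authentication options can be sketched in Python as follows. The `UserInfo` fields, the allow-list and the level threshold are hypothetical stand-ins. The patent does not fix the content of the user information, nor how the authorized-user list or the preset level is stored.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class UserInfo:
    """Hypothetical user information carried with a comparison request."""
    account: str
    phone: Optional[str] = None
    level: int = 0


def auth_by_allowlist(user: UserInfo, allowed_accounts: Set[str]) -> bool:
    # Option (1): pass only if the account appears in the authorized-user list.
    return user.account in allowed_accounts


def auth_by_level(user: UserInfo, preset_level: int) -> bool:
    # Option (2): pass only if the user level exceeds (is strictly above) the preset level.
    return user.level > preset_level
```

A server could apply either check, or both, before converting formats and queueing the comparison task.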
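Formats that OCR can recognize include, but are not limited to, image formats and PDF. When the reference file and/or the comparison file is not in such a format, it can be converted into one so that OCR can later recognize its text content. For example, a file in docx format can be converted to PDF.
When the task status of the comparison task changes, the server can update the task status of the comparison task in the task database. The status may take three values. It may be "unprocessed" while the task has not yet been executed, and "processing" while it is being executed. It may be "completed" once execution is finished, i.e., once the difference marks have been added.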
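Below is a minimal in-memory sketch of the comparison task queue, the task database and the three task statuses. A `deque` and a `dict` stand in for the queue and database here. The class, method and status names are assumptions; a real deployment would likely use a persistent queue and database.

```python
from collections import deque
from enum import Enum
from typing import Deque, Dict, Optional, Tuple


class TaskStatus(Enum):
    UNPROCESSED = "unprocessed"
    PROCESSING = "processing"
    COMPLETED = "completed"


class ComparisonTaskManager:
    """Hypothetical in-memory stand-in for the task queue and task database."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()
        self.task_db: Dict[str, TaskStatus] = {}
        self.files: Dict[str, Tuple[str, str]] = {}

    def add_task(self, task_id: str, reference_file: str, comparison_file: str) -> None:
        # Enqueue the task and record it with its initial status.
        self.queue.append(task_id)
        self.files[task_id] = (reference_file, comparison_file)
        self.task_db[task_id] = TaskStatus.UNPROCESSED

    def next_task(self) -> Optional[str]:
        # Take the oldest task from the queue and mark it as being processed.
        if not self.queue:
            return None
        task_id = self.queue.popleft()
        self.update_status(task_id, TaskStatus.PROCESSING)
        return task_id

    def update_status(self, task_id: str, status: TaskStatus) -> None:
        # Called whenever the task status changes.
        self.task_db[task_id] = status

    def query_status(self, task_id: str) -> Optional[TaskStatus]:
        return self.task_db.get(task_id)
```

The first-in, first-out queue processes requests in arrival order, while the separate status table lets status queries be answered without touching the queue.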
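S110: Obtain at least one page of text of the reference file and at least one page of text of the comparison file.
As mentioned in S100, the file formats of the reference file and the comparison file can be converted into a format that OCR can recognize. In this step, OCR can therefore be used to recognize the two files and obtain at least one page of text of each.
S120: Splice the at least one page of text of the reference file into one page of text with continuous context to obtain the reference text, and splice the at least one page of text of the comparison file into one page of text with continuous context to obtain the comparison text.
In practice, the two files could be compared one page at a time, i.e., page N of the reference file against page N of the comparison file, without regard to how the pages relate to one another. This easily produces inaccurate results. For example, suppose the reference file has two pages of text, and the comparison file inserts one page between them, giving three pages. A page-by-page comparison would report that page 2 of the reference file differs from page 2 of the comparison file. Because the reference file has no page 3, it would also report page 3 of the comparison file as absent from the reference file. In other words, page-by-page comparison would conclude that the two files differ everywhere except on the first page.
To avoid such inaccurate results, the embodiments splice each file's pages before comparing. The at least one page of text of the reference file is spliced into one page of text with continuous context (the reference text), and likewise for the comparison file (the comparison text). Only then are the reference text and the comparison text compared. The text content of each file is thus compared as a whole, which avoids inaccurate results caused by ignoring the relationships between pages.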
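The sketch below reproduces the two-page example with Python's standard `difflib`. It treats each page as a list of lines, which is an assumption, and uses `difflib.SequenceMatcher` as a stand-in for the unspecified preset comparison algorithm. The function names are hypothetical. The page-by-page approach reports two differing pages, whereas the spliced comparison finds a single inserted line.

```python
import difflib
from typing import List, Optional, Tuple

Page = List[str]  # assumption: one page is a list of text lines


def page_by_page_mismatches(ref_pages: List[Page], cmp_pages: List[Page]) -> int:
    """Naive approach: compare page N only with page N and count differing pages."""
    def page_at(pages: List[Page], i: int) -> Optional[Page]:
        return pages[i] if i < len(pages) else None

    n = max(len(ref_pages), len(cmp_pages))
    return sum(1 for i in range(n) if page_at(ref_pages, i) != page_at(cmp_pages, i))


def splice(pages: List[Page]) -> List[str]:
    """Concatenate pages in page order into one continuous sequence of lines."""
    return [line for page in pages for line in page]


def spliced_changes(ref_pages: List[Page],
                    cmp_pages: List[Page]) -> List[Tuple[str, List[str], List[str]]]:
    """Compare the spliced texts as a whole and return only the non-equal opcodes."""
    a, b = splice(ref_pages), splice(cmp_pages)
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]
```

With `ref = [["Clause 1.", "Clause 2."], ["Clause 3."]]` and a comparison file that adds a page `["New clause."]` in the middle, the naive approach counts two differing pages. The spliced comparison reports only `("insert", [], ["New clause."])`.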
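Continuous context means that the original order of the text is preserved. One way to splice the at least one page of text of the reference file or the comparison file into one page of text with continuous context is to join the pages one after another in the file's page order.
Specifically, the reference text is obtained as follows. When the reference file contains multiple pages of text, those pages are spliced into one page of text with continuous context. When the reference file contains a single page of text, that page is taken directly as the reference text. The comparison text is obtained in the same way. When the comparison file contains multiple pages of text, those pages are spliced into one page of text with continuous context. When it contains a single page of text, that page is taken directly as the comparison text.
S130: Perform a difference comparison between the reference text and the comparison text using a preset comparison algorithm to obtain a difference comparison result of the comparison text relative to the reference text.
Specifically, the reference text and the comparison text can be compared according to a preset comparison unit to obtain a comparison sub-result for each preset comparison unit. In this comparison, a reference sub-text is the reference text of one preset comparison unit, and a comparison sub-text is the comparison text of one preset comparison unit. Each sub-result is assigned as follows:
- If the reference sub-text being compared has the same content as the comparison sub-text, the comparison sub-result is "content identical".
- If the reference sub-text being compared does not exist in the comparison text, the comparison sub-result is "content deletion".
- If the comparison sub-text being compared does not exist in the reference text, the comparison sub-result is "content addition".
In practice, differences between two texts should also include content modification. To let users see more intuitively how the comparison text differs from the reference text, consecutive deletions and additions can be merged. Take a first comparison sub-result and a second comparison sub-result that are not adjacent. If both are "content identical", and the comparison sub-results between them include content deletion and content addition but no "content identical", those in-between sub-results are merged into one comparison sub-result whose type is content modification. The size of the preset comparison unit can be set according to actual needs, for example a phrase, a sentence or a paragraph.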
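The merging rule can be sketched in Python as follows. `difflib.SequenceMatcher` is used here only as a stand-in unit matcher, with any `replace` opcode split back into deletions and additions so that the patent's merge rule has something to act on. The sub-result labels (`same`, `delete`, `add`, `modify`), the tuple layout and the `sep` parameter are assumptions.

```python
import difflib
from typing import List, Sequence, Tuple

SubResult = Tuple[str, str, str]  # (type, reference text, comparison text)


def unit_sub_results(ref_units: Sequence[str], cmp_units: Sequence[str]) -> List[SubResult]:
    """One sub-result per unit: 'same', 'delete' or 'add'."""
    out: List[SubResult] = []
    sm = difflib.SequenceMatcher(a=ref_units, b=cmp_units, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            out += [("same", u, u) for u in ref_units[i1:i2]]
        else:  # 'delete', 'insert' or 'replace' become deletions plus additions
            out += [("delete", u, "") for u in ref_units[i1:i2]]
            out += [("add", "", u) for u in cmp_units[j1:j2]]
    return out


def merge_modifications(subs: List[SubResult], sep: str = " ") -> List[SubResult]:
    """Merge a delete/add run enclosed by two 'same' sub-results into one 'modify'."""
    merged: List[SubResult] = []
    i = 0
    while i < len(subs):
        if subs[i][0] == "same":
            merged.append(subs[i])
            i += 1
            continue
        j = i
        while j < len(subs) and subs[j][0] != "same":
            j += 1
        run = subs[i:j]
        enclosed = i > 0 and j < len(subs)  # a 'same' sub-result on both sides
        kinds = {kind for kind, _, _ in run}
        if enclosed and kinds == {"delete", "add"}:
            merged.append(("modify",
                           sep.join(r for _, r, _ in run if r),
                           sep.join(c for _, _, c in run if c)))
        else:
            merged.extend(run)  # not enclosed, or only one kind of change
        i = j
    return merged
```

A delete/add run at the very start or end of the text is left unmerged, because it is not enclosed by two "content identical" sub-results.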
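It should be added that comparing two texts need not be limited to checking whether they use the same characters or words. NLP technology can also be used to analyze the semantics of the reference sub-text and the comparison sub-text. When the two have the same meaning but use different characters or words, the corresponding comparison sub-result can be determined to be "content identical". In addition, the embodiments can support custom filtering rules that ignore meaningless differences: when a difference between the reference sub-text and the comparison sub-text satisfies a preset filtering rule, that difference is ignored. For example, a rule can specify that the presence or absence of the particle "的" in a sentence does not affect the comparison result.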
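A custom filtering rule can be approximated by normalizing both sub-texts before testing them for equality. The sketch below treats each filter rule as a regular expression whose matches are ignored. The default removes the particle "的", following the example above. The whitespace rule, the function names and the regex representation of rules are assumptions, and the NLP-based semantic check is not shown. Removing "的" everywhere is crude (it would also strip it from words such as "的确"), so real rules would need to be more targeted.

```python
import re
from typing import Sequence

# Hypothetical filter rules: every regex match is treated as a meaningless difference.
DEFAULT_FILTERS: Sequence[str] = ("的", r"\s+")


def normalize(text: str, filters: Sequence[str] = DEFAULT_FILTERS) -> str:
    """Remove everything matched by the filter rules."""
    for pattern in filters:
        text = re.sub(pattern, "", text)
    return text


def units_match(ref_unit: str, cmp_unit: str,
                filters: Sequence[str] = DEFAULT_FILTERS) -> bool:
    """Treat two sub-texts as 'content identical' if they agree after filtering."""
    return normalize(ref_unit, filters) == normalize(cmp_unit, filters)
```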
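S140: Add difference marks to the reference file and/or the comparison file according to the difference comparison result.
Specifically, if the difference comparison result includes content deletion, a difference mark is added to the text content in the reference file that was deleted in the comparison file. If it includes content addition, a difference mark is added to the text content added in the comparison file relative to the reference file. If it includes content modification, difference marks are added both to the pre-modification text content in the reference file and to the modified text content in the comparison file.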
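One simple way to realize this marking on extracted text is to wrap each changed span in markup. Deleted and pre-modification text is marked in the reference output, and added and post-modification text in the comparison output. The HTML tags and the `MARKS` mapping are assumptions loosely modeled on the styles described for Figure 3. The input uses a hypothetical `(type, reference text, comparison text)` tuple per comparison sub-result. Marking the original file itself, for example a PDF, would additionally require the position of each span in that file.

```python
from typing import List, Tuple

SubResult = Tuple[str, str, str]  # (type, reference text, comparison text)

# Assumed mapping from difference type to (opening tag, closing tag).
MARKS = {
    "delete": ("<u>", "</u>"),             # underline: deleted text
    "add": ("<b><i>", "</i></b>"),         # bold italic: added text
    "modify": ("<b><big>", "</big></b>"),  # bold enlarged: modified text
}


def mark(subs: List[SubResult]) -> Tuple[str, str]:
    """Return (marked reference text, marked comparison text)."""
    ref_out: List[str] = []
    cmp_out: List[str] = []
    for kind, ref_text, cmp_text in subs:
        if kind == "same":
            ref_out.append(ref_text)
            cmp_out.append(cmp_text)
            continue
        open_tag, close_tag = MARKS[kind]
        if ref_text:  # deleted or pre-modification text lives in the reference file
            ref_out.append(f"{open_tag}{ref_text}{close_tag}")
        if cmp_text:  # added or post-modification text lives in the comparison file
            cmp_out.append(f"{open_tag}{cmp_text}{close_tag}")
    return "".join(ref_out), "".join(cmp_out)
```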
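Difference marks include, but are not limited to, one or a combination of the following: bold font, changed font color, added font background color, highlighted font, enlarged font, italics, underline, etc. When the difference comparison result contains several types of comparison sub-result, the marks for different comparison sub-results may take the same form or different forms.
Optionally, when the pages of the reference file are spliced into the reference text, a position information configuration file of the reference file can be generated from the positions of the reference text's content in the reference file. Likewise, when the pages of the comparison file are spliced into the comparison text, a position information configuration file of the comparison file can be generated from the positions of the comparison text's content in the comparison file. Later, the difference marks can be placed using the difference comparison result together with these position information configuration files. The position information includes the page number of the text content in the corresponding file and its row and column on that page. A position information configuration file contains the text content and the position information of that text content.
Specifically, the marks are placed according to the type of difference:
- If the difference comparison result includes content deletion, the position in the reference file of the text content deleted in the comparison file is determined from the reference file's position information configuration file. A difference mark is added at that position.
- If it includes content addition, the position in the comparison file of the text content added relative to the reference file is determined from the comparison file's position information configuration file. A difference mark is added at that position.
- If it includes content modification, the position of the pre-modification text content in the reference file is determined from the reference file's position information configuration file. The position of the modified text content in the comparison file is determined from the comparison file's position information configuration file. Difference marks are then added at the positions determined for the reference file and the comparison file respectively.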
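A minimal sketch of a position information configuration file, assuming each page is available as a list of text lines. While splicing, the code records a `(page, row, column)` entry for every character. A span of the spliced text, such as a deleted or added passage, can then be mapped back to where it sits in the original file. The `Position` class and the function names are hypothetical. Lines are joined without a separator, which suits Chinese text but would glue English words together across line breaks. In practice the position list could be serialized, for example to JSON, as the configuration file.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Position:
    page: int  # 1-based page number in the original file
    row: int   # 1-based line (row) on that page
    col: int   # 1-based character column in that line


def splice_with_positions(pages: List[List[str]]) -> Tuple[str, List[Position]]:
    """Splice pages (lists of lines) into one text and record each character's position."""
    chars: List[str] = []
    positions: List[Position] = []
    for p, page in enumerate(pages, start=1):
        for r, line in enumerate(page, start=1):
            for c, ch in enumerate(line, start=1):
                chars.append(ch)
                positions.append(Position(p, r, c))
    return "".join(chars), positions


def locate(positions: List[Position], start: int, end: int) -> Tuple[Position, Position]:
    """Return the first and last file positions of the spliced-text span [start, end)."""
    if not 0 <= start < end <= len(positions):
        raise ValueError("empty or out-of-range span")
    return positions[start], positions[end - 1]
```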
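Optionally, when the server receives a comparison result query instruction that the RPA robot triggers the client to send, it feeds back the reference file and/or comparison file containing difference marks to the client. It can also feed back the difference comparison result itself, so that the user can see the differences both directly in the reference file and/or comparison file and in a separate summary. When the server receives a comparison task status query instruction that the RPA robot triggers the client to send, it queries the task database for the task status of the comparison task corresponding to the instruction and feeds back the queried task status to the client.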
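The two query instructions could be dispatched on the server roughly as below. The instruction format (a dict with `type` and `task_id` keys), the status strings and the shape of the stored results are all assumptions for illustration. Returning the current status when the result is not yet ready is a design choice of this sketch, not something the patent specifies.

```python
from typing import Any, Dict


def handle_query(instruction: Dict[str, str],
                 task_db: Dict[str, str],
                 results: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Answer a hypothetical comparison result query or task status query."""
    task_id = instruction["task_id"]
    status = task_db.get(task_id, "unknown")
    kind = instruction["type"]
    if kind == "task_status":
        return {"ok": True, "status": status}
    if kind == "comparison_result":
        if status != "completed":
            # Marked files only exist once the comparison task has finished.
            return {"ok": False, "status": status}
        # Marked reference/comparison files plus the separate difference summary.
        return {"ok": True, **results[task_id]}
    raise ValueError(f"unknown instruction type: {kind!r}")
```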
本发明实施例提供的基于RPA和AI的文件比对方法,能够通过RPA机器人自动登录客户端并触发客户端向服务器发送参考文件和比对文件,当服务器获取到参考文件和比对文件后,可以先从参考文件和比对文件中分别获取至少一页文本,并将参考文件的至少一页文本拼接为上下文连续的一页文本(可称为参考文本),将比对文件的至少一页文本拼接为上下文连续的一页文本(可称为比对文本),然后利用预设比对算法对参考文本和比对文本进行差异性比对,最后根据差异性比对结果对参考文件和/或比对文件进行差异性标记。由此可知,与现有技术中需要人工获取文件、比对文件,并进行人工标记差异性相比,本发明实施例不仅能够利用RPA机器人自动触发客户端发送两个待比对的文件给服务器进行自动比对,还可以由服务器自动标记差异性,从而不仅可以节省人力,让原本需要做文件比对的人员有时间去做更有价值的工作,还可以提高文件比对的效率。此外,由于本发明实施例将差异性标记添加到参考文件和/或比对文件当中,而不是在或者仅仅在独立于参考文件和比对文件的第三方区域描述差异性,从而可以提高差异性标记的可读性。在获取参考文件和比对文件的文本内容时,可以通过OCR技术自动识别出两个文件中的文本内容,而无需人工获取,从而可以提高文件比对的效率。
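Based on the above method embodiment, another embodiment of the present invention further provides an RPA- and AI-based file comparison method applied to an RPA robot. As shown in FIG. 2, the method comprises:
S200. Log in to the client and upload the reference file and the comparison file to the client.
The reference file and/or the comparison file is a multi-page file. Specifically, an RPA program may be configured on an electronic device that can log in to the client; the program may be integrated into or embedded in the client, or be independent of it. Following the rules set in the RPA program, the device simulates the user's mouse and keyboard operations to log in to the client automatically. By operating the client, it then triggers the client to generate a file comparison request containing the reference file and the comparison file, and to send that request to the server for difference comparison. When logging in, the client may display a login screen containing a verification-code (CAPTCHA) image. In that case the RPA robot can run OCR on the image to read the verification code, enter it into the corresponding input box, and so log in successfully.
S210. Trigger the client to send the reference file and the comparison file to the server for difference comparison.
The reference file and the comparison file may be stored in the client, stored elsewhere on the electronic device, or exist as paper documents. When they are stored elsewhere on the device, the RPA robot can locate them and upload them to the client. It may, for example, click an upload button, drag the two files into a designated area, or use some other upload method. When the reference file and/or the comparison file is a paper document, the RPA robot can first use OCR technology to convert it into an image file or a text file (that is, an editable file made up of the document's text content). It then uploads the result to the client in the manner described above.
In addition, the RPA robot may trigger the client to send a comparison-result query instruction to the server. The server then returns the reference file and/or comparison file containing the difference marks, which the client displays. Additionally or alternatively, the robot may trigger the client to send a comparison-task-status query instruction. The server then looks up in the task database the status of the comparison task to which the instruction refers, and returns that status to the client. The robot may trigger either instruction by, among other ways, clicking a comparison-result query button or a comparison-task-status query button on the client, which causes the client to generate and send the corresponding instruction.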
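On the robot side, the status and result queries could be combined into a simple polling loop. In this Python sketch, `query_status` and `query_result` are hypothetical callables. They stand in for the RPA robot clicking the task-status and result query buttons on the client. The status strings and the timeout policy are assumptions, not part of the described method.

```python
import time
from typing import Callable, Dict


def wait_for_result(
    query_status: Callable[[], str],
    query_result: Callable[[], Dict[str, object]],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> Dict[str, object]:
    """Poll the comparison task status until it finishes, then fetch the marked files."""
    deadline = time.monotonic() + timeout
    while True:
        status = query_status()  # e.g. the robot clicks the task-status query button
        if status == "done":
            return query_result()  # e.g. the robot clicks the result query button
        if status == "failed":
            raise RuntimeError("comparison task failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"comparison task still {status!r} after {timeout} s")
        time.sleep(poll_interval)
```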
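As an example, FIG. 3 shows part of the text content of a reference file and a comparison file, with the differences displayed directly in the two files:
- bold italic text is text added in the comparison file;
- underlined text is text deleted in the comparison file;
- bold enlarged text is modified text.
A summary of the differences is shown on the right side of the figure.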
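The difference summary shown on the right of FIG. 3 can be approximated by a short Python function that lists each non-identical sub-result. The (kind, reference text, comparison text) tuple format, the `summarize` name and the wording of each entry are assumptions for illustration.

```python
from typing import List, Tuple

SubResult = Tuple[str, str, str]  # (kind, reference text, comparison text)


def summarize(sub_results: List[SubResult]) -> List[str]:
    """Build a numbered, human-readable list of the differences."""
    entries: List[str] = []
    for kind, ref_text, cmp_text in sub_results:
        if kind == "delete":
            entries.append(f"Deleted: {ref_text}")
        elif kind == "add":
            entries.append(f"Added: {cmp_text}")
        elif kind == "modify":
            entries.append(f"Modified: {ref_text} -> {cmp_text}")
        # "same" sub-results are not listed.
    return [f"{i}. {entry}" for i, entry in enumerate(entries, start=1)]
```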
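Based on the server-side method embodiment, another embodiment of the present invention further provides an RPA- and AI-based file comparison apparatus applied to a server. As shown in FIG. 4, the apparatus comprises:
a file acquisition unit 30, configured to acquire a reference file and a comparison file sent by a client, wherein the reference file and/or the comparison file is a multi-page file, and the client sends the two files as triggered by an RPA robot;
a text acquisition unit 32, configured to acquire at least one page of text of the reference file and at least one page of text of the comparison file;
a splicing unit 34, configured to splice the at least one page of text of the reference file into one context-continuous page of text to obtain a reference text, and to splice the at least one page of text of the comparison file into one context-continuous page of text to obtain a comparison text;
a comparison unit 36, configured to perform a difference comparison on the reference text and the comparison text using a preset comparison algorithm, to obtain a difference comparison result of the comparison text relative to the reference text;
a marking unit 38, configured to mark differences in the reference file and/or the comparison file according to the difference comparison result.
Optionally, the comparison unit 36 comprises:
a comparison module, configured to compare the reference text and the comparison text by preset comparison unit, to obtain a comparison sub-result for each preset comparison unit;
a merging module, configured to handle a first comparison sub-result and a second comparison sub-result that are not adjacent. If both are "content same", and the comparison sub-results between them include content deletion and content addition but no "content same", the module merges the sub-results between them into a single comparison sub-result of type content modification.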
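The merging rule can be expressed directly in Python. The sketch assumes sub-results arrive in document order as (kind, reference text, comparison text) tuples; `merge_modifications` is a hypothetical name. It merges a maximal run of non-"same" sub-results only when two conditions hold: the run is bounded by "same" sub-results on both sides, and it contains at least one deletion and one addition. This is one reading of the rule above.

```python
from typing import List, Tuple

SubResult = Tuple[str, str, str]  # (kind, reference text, comparison text)


def merge_modifications(sub_results: List[SubResult]) -> List[SubResult]:
    """Merge delete/add runs lying between two "same" sub-results into one "modify"."""
    merged: List[SubResult] = []
    i, n = 0, len(sub_results)
    while i < n:
        if sub_results[i][0] == "same":
            merged.append(sub_results[i])
            i += 1
            continue
        # Find the maximal run of consecutive non-"same" sub-results starting at i.
        j = i
        while j < n and sub_results[j][0] != "same":
            j += 1
        run = sub_results[i:j]
        kinds = {kind for kind, _, _ in run}
        # Because the run is maximal, i > 0 and j < n mean it sits between two
        # non-adjacent "same" sub-results.
        bounded = i > 0 and j < n
        if bounded and {"delete", "add"} <= kinds:
            merged.append((
                "modify",
                "".join(ref for _, ref, _ in run),
                "".join(cmp for _, _, cmp in run),
            ))
        else:
            merged.extend(run)
        i = j
    return merged
```

For instance, "same", "delete c", "add d", "same" collapses to "same", "modify c to d", "same". A lone deletion between two unchanged spans stays a deletion.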
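Optionally, the marking unit 38 comprises:
a first mark-adding module, configured to add, if the difference comparison result includes a content deletion, a difference mark to the text content in the reference file that was deleted in the comparison file;
a second mark-adding module, configured to add, if the difference comparison result includes a content addition, a difference mark to the text content in the comparison file that was added relative to the reference file;
a third mark-adding module, configured to add, if the difference comparison result includes a content modification, difference marks to the pre-modification text content in the reference file and to the post-modification text content in the comparison file respectively.
Optionally, the text acquisition unit 32 is configured to recognize the reference file and the comparison file using optical character recognition (OCR), to obtain the at least one page of text of the reference file and the at least one page of text of the comparison file.
Optionally, the apparatus further comprises:
an authentication unit, configured to authenticate the user information of the client before the at least one page of text of the reference file and of the comparison file is acquired;
a format conversion unit, configured to convert, if authentication succeeds, the file formats of the reference file and the comparison file into a format recognizable by OCR;
a task adding unit, configured to add a comparison task for the reference file and the comparison file to a comparison task queue, and to record the comparison task and its task status in a task database;
a task acquisition unit, configured to acquire the comparison task from the comparison task queue;
an updating unit, configured to update the task status of the comparison task in the task database whenever that status changes.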
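The task-queue and task-database units can be sketched with Python's standard `queue.Queue`, plus a dict standing in for the task database. The function names, the status strings and the per-task status history are assumptions. Authentication and format conversion, which precede queuing in the text, are left out.

```python
import queue
from typing import Callable, Dict, List

TaskDB = Dict[str, List[str]]  # task id -> status history; last entry is the current status


def set_status(task_db: TaskDB, task_id: str, status: str) -> None:
    """Updating unit: record a status change in the task database."""
    history = task_db.setdefault(task_id, [])
    if not history or history[-1] != status:
        history.append(status)


def submit(task_queue: "queue.Queue[str]", task_db: TaskDB, task_id: str) -> None:
    """Task adding unit: enqueue the comparison task and record its initial status."""
    set_status(task_db, task_id, "queued")
    task_queue.put(task_id)


def run_next(task_queue: "queue.Queue[str]", task_db: TaskDB, compare: Callable[[str], None]) -> str:
    """Task acquisition unit plus worker: take one task, run it, and track its status."""
    task_id = task_queue.get()
    set_status(task_db, task_id, "running")
    try:
        compare(task_id)  # OCR, splicing, diffing and marking would happen here
    except Exception:
        set_status(task_db, task_id, "failed")
    else:
        set_status(task_db, task_id, "done")
    finally:
        task_queue.task_done()
    return task_id
```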
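Optionally, the apparatus further comprises:
a feedback unit, configured to:
- upon receiving a comparison-result query instruction sent by the client as triggered by the RPA robot, return the reference file and/or comparison file containing the difference marks to the client;
- upon receiving a comparison-task-status query instruction sent by the client as triggered by the RPA robot, look up in the task database the status of the comparison task to which the instruction refers, and return that status to the client.
Based on the RPA-robot-side method embodiment, another embodiment of the present invention further provides an RPA- and AI-based file comparison apparatus applied to an RPA robot. As shown in FIG. 5, the apparatus comprises:
a login and upload unit 40, configured to log in to a client and upload a reference file and a comparison file to the client, wherein the reference file and/or the comparison file is a multi-page file;
a trigger and send unit 42, configured to trigger the client to send the reference file and the comparison file to a server for difference comparison.
Optionally, the trigger and send unit 42 is further configured to trigger the client to send a comparison-result query instruction to the server, so that the server returns the reference file and/or comparison file containing the difference marks to the client. Additionally or alternatively, it is configured to trigger the client to send a comparison-task-status query instruction to the server, so that the server returns the task status of the comparison task to the client.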
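Based on the above method embodiments, another embodiment of the present invention further provides a computing device, comprising:
one or more processors;
a storage apparatus, configured to store one or more programs;
wherein, when the computing device is a server, the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the server-side RPA- and AI-based file comparison methods described above;
and when the computing device is a terminal, the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the RPA-robot-side RPA- and AI-based file comparison methods described above.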
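Based on the server-side embodiments, another embodiment of the present invention further provides a server, comprising:
one or more processors;
a storage apparatus, configured to store one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the server-side RPA- and AI-based file comparison methods described above. The processors are coupled to the storage apparatus.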
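Based on the RPA-robot-side embodiments, another embodiment of the present invention further provides a terminal, comprising:
one or more processors;
a storage apparatus, configured to store one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the RPA-robot-side RPA- and AI-based file comparison methods described above. The processors are coupled to the storage apparatus.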
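Based on the above method embodiments, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any embodiment of the present invention.
Based on the above embodiments, an embodiment of the present invention further provides an RPA- and AI-based file comparison system comprising an RPA robot 50, a client 52 and a server 54. As shown in FIG. 6, the RPA robot 50 may be independent of the client 52; as shown in FIG. 7, it may be part of the client 52.
The RPA robot 50 is configured to log in to the client 52, upload a reference file and a comparison file to it, and trigger it to send the two files to the server 54 for difference comparison, wherein the reference file and/or the comparison file is a multi-page file;
The server 54 is configured to:
- acquire the reference file and the comparison file sent by the client 52;
- recognize both files using optical character recognition (OCR), to obtain at least one page of text of the reference file and at least one page of text of the comparison file;
- splice the at least one page of text of the reference file into one context-continuous page of text to obtain a reference text, and splice the at least one page of text of the comparison file likewise to obtain a comparison text;
- perform a difference comparison on the reference text and the comparison text using a preset comparison algorithm to obtain a comparison result;
- mark differences in the reference file and/or the comparison file according to the comparison result.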
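Each of the above embodiments has its own emphasis. For parts not detailed in one embodiment, refer to the relevant descriptions of the other embodiments.
In the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution. The execution order should be determined by the processes' functions and internal logic, and the numbering places no limitation on the implementation of the embodiments.
In the embodiments provided by the present invention, "B corresponding to A" means that B is associated with A and can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or may be combined two or more to a unit. An integrated unit may be implemented as hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-accessible memory. On this understanding, the technical solution of the present invention may be embodied as a software product stored in a memory. This applies to the solution in essence, to the part that contributes to the prior art, or to all or part of the solution. The software product includes a number of instructions that cause a computer device (such as a personal computer, a server or a network device, and specifically a processor in that device) to perform some or all of the steps of the methods of the embodiments of the present invention.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium. Such media include read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM) and electrically erasable programmable read-only memory (EEPROM). They also include compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can carry or store data.
A person of ordinary skill in the art will understand that the drawings are merely schematic diagrams of one embodiment, and the modules or processes in them are not necessarily required to implement the present invention.
A person of ordinary skill in the art will understand that the modules of an apparatus may be distributed in that apparatus as described in the embodiment. With corresponding changes, they may instead be located in one or more apparatuses different from that of the embodiment. The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.
Finally, the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the invention has been described in detail with reference to them, a person of ordinary skill in the art should understand the following. The technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.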

Claims (13)

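  1. An RPA- and AI-based file comparison method, applied to a server, characterized in that the method comprises:
    S1. acquiring a reference file and a comparison file sent by a client, wherein the reference file and/or the comparison file is a multi-page file, and the client sends the two files as triggered by a robotic process automation (RPA) robot;
    S2. acquiring at least one page of text of the reference file and at least one page of text of the comparison file;
    S3. splicing the at least one page of text of the reference file into one context-continuous page of text to obtain a reference text, and splicing the at least one page of text of the comparison file into one context-continuous page of text to obtain a comparison text;
    S4. performing a difference comparison on the reference text and the comparison text using a preset comparison algorithm, to obtain a difference comparison result of the comparison text relative to the reference text;
    S5. marking differences in the reference file and/or the comparison file according to the difference comparison result.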
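  2. The method according to claim 1, characterized in that S4 comprises:
    S401. comparing the reference text and the comparison text by preset comparison unit, to obtain a comparison sub-result for each preset comparison unit;
    S402. for a first comparison sub-result and a second comparison sub-result that are not adjacent: if both are content same, and the comparison sub-results between them include content deletion and content addition but no content same, merging the sub-results between them into a single comparison sub-result whose type is content modification.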
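  3. The method according to claim 1, characterized in that S5 comprises:
    S501. if the difference comparison result includes a content deletion, adding a difference mark to the text content in the reference file that was deleted in the comparison file;
    S502. if the difference comparison result includes a content addition, adding a difference mark to the text content in the comparison file that was added relative to the reference file;
    S503. if the difference comparison result includes a content modification, adding difference marks to the pre-modification text content in the reference file and to the post-modification text content in the comparison file respectively.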
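  4. The method according to any one of claims 1-3, characterized in that S2 comprises:
    recognizing the reference file and the comparison file using optical character recognition (OCR), to obtain the at least one page of text of the reference file and the at least one page of text of the comparison file.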
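  5. The method according to claim 4, characterized in that, before S2, the method further comprises:
    S6. authenticating the user information of the client;
    S7. if authentication succeeds, converting the file formats of the reference file and the comparison file into a format recognizable by optical character recognition (OCR);
    S8. adding a comparison task for the reference file and the comparison file to a comparison task queue, and recording the comparison task and its task status in a task database;
    S9. acquiring the comparison task from the comparison task queue;
    and the method further comprises:
    S10. when the task status of the comparison task changes, updating the task status of the comparison task in the task database.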
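  6. The method according to claim 5, characterized in that the method further comprises:
    S11. upon receiving a comparison-result query instruction sent by the client as triggered by the RPA robot, returning the reference file and/or comparison file containing the difference marks to the client;
    S12. upon receiving a comparison-task-status query instruction sent by the client as triggered by the RPA robot, looking up in the task database the status of the comparison task to which the instruction refers, and returning that status to the client.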
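  7. An RPA- and AI-based file comparison method, applied to a robotic process automation (RPA) robot, characterized in that the method comprises:
    S13. logging in to a client and uploading a reference file and a comparison file to the client, wherein the reference file and/or the comparison file is a multi-page file;
    S14. triggering the client to send the reference file and the comparison file to a server for difference comparison.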
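  8. The method according to claim 7, characterized in that the method further comprises:
    S15. triggering the client to send a comparison-result query instruction to the server, so that the server returns the reference file and/or comparison file containing the difference marks to the client;
    and/or S16. triggering the client to send a comparison-task-status query instruction to the server, so that the server returns the task status of the comparison task to the client.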
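  9. An RPA- and AI-based file comparison apparatus, applied to a server, characterized in that the apparatus comprises:
    a file acquisition unit, configured to acquire a reference file and a comparison file sent by a client, wherein the reference file and/or the comparison file is a multi-page file, and the client sends the two files as triggered by a robotic process automation (RPA) robot;
    a text acquisition unit, configured to acquire at least one page of text of the reference file and at least one page of text of the comparison file;
    a splicing unit, configured to splice the at least one page of text of the reference file into one context-continuous page of text to obtain a reference text, and to splice the at least one page of text of the comparison file into one context-continuous page of text to obtain a comparison text;
    a comparison unit, configured to perform a difference comparison on the reference text and the comparison text using a preset comparison algorithm, to obtain a difference comparison result of the comparison text relative to the reference text;
    a marking unit, configured to mark differences in the reference file and/or the comparison file according to the difference comparison result.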
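  10. An RPA- and AI-based file comparison apparatus, applied to a robotic process automation (RPA) robot, characterized in that the apparatus comprises:
    a login and upload unit, configured to log in to a client and upload a reference file and a comparison file to the client, wherein the reference file and/or the comparison file is a multi-page file;
    a trigger and send unit, configured to trigger the client to send the reference file and the comparison file to a server for difference comparison.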
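  11. A computing device, characterized in that the computing device comprises:
    one or more processors;
    a storage apparatus, configured to store one or more programs;
    wherein, when the computing device is a server, the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6;
    and when the computing device is a terminal, the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 7-8.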
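  12. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-8.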
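  13. An RPA- and AI-based file comparison system, characterized in that the system comprises a robotic process automation (RPA) robot, a client and a server;
    the RPA robot is configured to log in to the client, upload a reference file and a comparison file to it, and trigger it to send the two files to the server for difference comparison, wherein the reference file and/or the comparison file is a multi-page file;
    the server is configured to: acquire the reference file and the comparison file sent by the client; acquire at least one page of text of the reference file and at least one page of text of the comparison file; splice the at least one page of text of the reference file into one context-continuous page of text to obtain a reference text, and splice the at least one page of text of the comparison file into one context-continuous page of text to obtain a comparison text; perform a difference comparison on the reference text and the comparison text using a preset comparison algorithm, to obtain a difference comparison result of the comparison text relative to the reference text; and mark differences in the reference file and/or the comparison file according to the difference comparison result.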

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111136129.1A CN113836096A (zh) 2021-09-27 2021-09-27 File comparison method, apparatus, device, medium and system based on RPA and AI
CN202111136129.1 2021-09-27

Publications (1)

Publication Number Publication Date
WO2023045056A1 true WO2023045056A1 (zh) 2023-03-30

Family

ID=78970895

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131818 WO2023045056A1 (zh) 2021-09-27 2021-11-19 File comparison method, apparatus, device, medium and system based on RPA and AI

Country Status (2)

Country Link
CN (1) CN113836096A (zh)
WO (1) WO2023045056A1 (zh)

Also Published As

Publication number Publication date
CN113836096A (zh) 2021-12-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21958169

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE