CN113836092A

CN113836092A - File comparison method, device, equipment and storage medium based on RPA and AI

Info

Publication number: CN113836092A
Application number: CN202111138084.1A
Authority: CN
Inventors: 赵鹏; 汪冠春; 胡一川; 褚瑞; 李玮
Original assignee: Beijing Laiye Network Technology Co Ltd; Laiye Technology Beijing Co Ltd
Current assignee: Beijing Laiye Network Technology Co Ltd; Laiye Technology Beijing Co Ltd
Priority date: 2021-09-27
Filing date: 2021-09-27
Publication date: 2021-12-24
Anticipated expiration: 2041-09-27
Also published as: WO2023045053A1

Abstract

The embodiments of the invention disclose a file comparison method, device, equipment and storage medium based on RPA and AI. The method comprises the following steps: S1, receiving a reference file and a comparison file uploaded by a Robotic Process Automation (RPA) robot; S2, sending the reference file and the comparison file to a server; S3, receiving a difference comparison result, sent by the server, of the comparison file relative to the reference file; and S4, highlighting a difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the highlighted difference text in the comparison file is text in which the comparison file differs from the reference file, and the highlighted difference text in the reference file is text in which the reference file differs from the comparison file. With this scheme, file comparison can be automated and the differences between the two files can be highlighted.

Description

File comparison method, device, equipment and storage medium based on RPA and AI

Technical Field

The embodiment of the invention relates to the technical field of process automation, in particular to a file comparison method, a file comparison device, file comparison equipment and a storage medium based on RPA and AI.

Background

RPA (Robotic Process Automation) simulates human operations on a computer through specific "robot software" and automatically executes process tasks according to rules.

AI (Artificial Intelligence) is a new technical science for studying and developing theories, methods, techniques and application systems for simulating, extending and expanding human Intelligence.

RPA has two distinctive advantages: it is low-code and non-intrusive. Low-code means that RPA can be operated without advanced IT skills, so business personnel who cannot program can also develop processes; non-intrusive means that RPA can simulate human operation without requiring a software system to open its interfaces. However, conventional RPA has limitations: it can only work from fixed rules, and its application scenarios are limited. With the continuous development of AI technology, the deep integration of RPA and AI overcomes the limitations of conventional RPA; RPA + AI amounts to "hand work + head work", which greatly changes the value of the labor force.

In daily work, it is often necessary to compare two versions of a file, such as a contract or a statute, to determine what has changed in a newly generated file relative to the original file. At present, however, the two files to be compared must be acquired manually, and the comparison and the marking of differences are also done manually. When many files must be compared, or the files have many pages, workers must perform repetitive, low-value comparison work, which takes up a large amount of working time and results in low working efficiency.

Disclosure of Invention

The embodiment of the invention provides a file comparison method, a device, equipment and a storage medium based on RPA and AI, which not only can realize the automation of file comparison, but also can highlight the difference between two files, thereby improving the efficiency of searching the file difference by a user.

In a first aspect, the present invention provides a file comparison method based on RPA and AI, where the method is applied to a client, and the method includes:

S1, receiving a reference file and a comparison file uploaded by a Robotic Process Automation (RPA) robot;

S2, sending the reference file and the comparison file to a server;

S3, receiving a difference comparison result, sent by the server, of the comparison file relative to the reference file;

and S4, highlighting a difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the highlighted difference text in the comparison file is text in which the comparison file differs from the reference file, and the highlighted difference text in the reference file is text in which the reference file differs from the comparison file.

Optionally, the difference comparison result includes at least one piece of difference information. Each piece of difference information includes a difference type, a difference text in the reference file, a difference text in the comparison file, difference location information of the difference text in the reference file, and difference location information of the difference text in the comparison file. The difference location information includes a page identifier of the page to which the difference text belongs and coordinate information within that page.
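The structure of the difference comparison result described above can be sketched as a set of record types. This is a minimal illustration only: the class and field names (`DiffType`, `DiffLocation`, `DiffInfo`, `DiffResult`) and the `(x, y, width, height)` layout of the coordinate information are assumptions made for the example, and the three difference types are taken from the term definitions later in this description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class DiffType(Enum):
    """Difference types named in the description."""
    DELETE = "content deletion"
    ADD = "content addition"
    MODIFY = "content modification"


@dataclass
class DiffLocation:
    page_id: int  # page identifier of the page holding the difference text
    # Assumed layout of the coordinate information: (x, y, width, height).
    coords: Tuple[float, float, float, float]


@dataclass
class DiffInfo:
    diff_type: DiffType
    reference_text: str             # difference text in the reference file
    comparison_text: str            # difference text in the comparison file
    reference_location: DiffLocation
    comparison_location: DiffLocation


@dataclass
class DiffResult:
    items: List[DiffInfo]           # at least one piece of difference information


# Example: one modified phrase on page 1 of both files.
example = DiffResult(items=[
    DiffInfo(
        diff_type=DiffType.MODIFY,
        reference_text="30 days",
        comparison_text="60 days",
        reference_location=DiffLocation(1, (10.0, 20.0, 30.0, 5.0)),
        comparison_location=DiffLocation(1, (10.0, 20.0, 30.0, 5.0)),
    )
])
```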

Optionally, the S4 includes:

S41, converting the coordinate information into position information of the divided DIV elements (DIV element position information);

and S42, when the DIV element position information enters the display area of the file to which it belongs, highlighting the difference text at the DIV element position information, in the page indicated by the page identifier, according to the difference type and the page identifier corresponding to the DIV element position information.
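Steps S41 and S42 can be illustrated with a small geometric sketch. The source does not specify the conversion, so everything here is an assumption: pages are numbered from 1, all rendered pages have the same pixel height, a single `scale` factor maps page coordinates to pixels, and the names `DivBox`, `to_div_box` and `in_view` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DivBox:
    """Assumed form of DIV element position information, in pixels."""
    page_id: int
    left: float
    top: float      # measured from the top of the scrolling container
    width: float
    height: float


def to_div_box(page_id, x, y, w, h, scale, page_height_px, page_gap_px=0.0):
    """S41 (sketch): map page-local coordinates to a DIV box position.

    Assumes 1-based page identifiers and equally sized rendered pages.
    """
    page_offset = (page_id - 1) * (page_height_px + page_gap_px)
    return DivBox(page_id, x * scale, page_offset + y * scale, w * scale, h * scale)


def in_view(box, scroll_top, viewport_height):
    """S42 (sketch): True when the box overlaps the visible display area."""
    return box.top < scroll_top + viewport_height and box.top + box.height > scroll_top


box = to_div_box(page_id=2, x=10, y=20, w=30, h=5, scale=2.0, page_height_px=1000)
visible_now = in_view(box, scroll_top=500, viewport_height=800)
```

Under these assumptions, a client would apply the highlight style for the difference type only once `in_view` becomes true for the box, which matches the condition stated in S42.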

Optionally, the S4 further includes:

S43, for each piece of difference information, generating an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type, and binding the ID with the DIV element position information of the difference text in the reference file and with the DIV element position information of the difference text in the comparison file, respectively;

and S44, when a first synchronous positioning instruction triggered based on the reference file or the comparison file is received, synchronously highlighting the difference texts at all DIV element position information bound with the ID corresponding to the first synchronous positioning instruction.
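The ID generation and binding of S43 and the lookup of S44 can be sketched as follows. The source only states that the ID is generated from the two pieces of DIV element position information and the difference type; hashing them with SHA-1 is an assumption chosen for illustration, and `make_diff_id` and `HighlightRegistry` are hypothetical names.

```python
import hashlib
from collections import defaultdict


def make_diff_id(ref_pos, cmp_pos, diff_type):
    """S43 (sketch): derive an ID from both positions and the difference type."""
    key = repr((ref_pos, cmp_pos, diff_type)).encode("utf-8")
    return hashlib.sha1(key).hexdigest()[:12]


class HighlightRegistry:
    """Keeps the binding between IDs and DIV element positions."""

    def __init__(self):
        self._by_id = defaultdict(list)  # ID -> [(side, position), ...]
        self._by_pos = {}                # (side, position) -> ID

    def bind(self, ref_pos, cmp_pos, diff_type):
        diff_id = make_diff_id(ref_pos, cmp_pos, diff_type)
        for side, pos in (("reference", ref_pos), ("comparison", cmp_pos)):
            self._by_id[diff_id].append((side, pos))
            self._by_pos[(side, pos)] = diff_id
        return diff_id

    def locate(self, side, pos):
        """S44 (sketch): return every bound position that shares the ID
        of the position on which the positioning instruction was triggered."""
        diff_id = self._by_pos[(side, pos)]
        return list(self._by_id[diff_id])


registry = HighlightRegistry()
ref_pos = (1, 20.0, 40.0, 60.0, 10.0)   # (page_id, left, top, width, height), assumed
cmp_pos = (1, 20.0, 55.0, 60.0, 10.0)
diff_id = registry.bind(ref_pos, cmp_pos, "content modification")
targets = registry.locate("reference", ref_pos)
```

Because both positions share one ID, a trigger on either side yields both positions, which is what allows the two highlights to be shown together.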

Optionally, after S3, the method further includes:

and S5, displaying a difference detail in a preset display area according to the difference comparison result, wherein the preset display area is an area except for a reference file display area and a comparison file display area, and the difference detail comprises a difference type in each piece of difference information, a difference text in the reference file and a difference text in the comparison file.

Optionally, after S43, the method further includes:

S45, binding the ID with the corresponding difference information in the difference detail;

S46, when a second synchronous positioning instruction triggered based on the difference detail is received, acquiring the ID bound with the difference information in the difference detail corresponding to the second synchronous positioning instruction;

S47, synchronously highlighting the difference texts at all DIV element position information bound with the acquired ID.
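Steps S45 to S47 amount to a two-stage lookup, from a difference detail entry to its ID and from the ID to the bound positions. The sketch below assumes that detail entries are addressed by row index and that positions are opaque values; `bind_details` and `on_detail_selected` are hypothetical names.

```python
from typing import Dict, Hashable, List, Sequence


def bind_details(diff_ids: Sequence[str]) -> Dict[int, str]:
    """S45 (sketch): bind each detail row, by index, to the ID of its difference."""
    return dict(enumerate(diff_ids))


def on_detail_selected(
    row: int,
    row_to_id: Dict[int, str],
    id_to_positions: Dict[str, List[Hashable]],
) -> List[Hashable]:
    """S46 and S47 (sketch): find the ID for the selected row, then return
    every DIV element position bound with that ID for highlighting."""
    diff_id = row_to_id[row]
    return list(id_to_positions.get(diff_id, []))


row_to_id = bind_details(["id-a", "id-b"])
id_to_positions = {
    "id-a": [("reference", 1, 40.0), ("comparison", 1, 55.0)],
    "id-b": [("reference", 2, 10.0), ("comparison", 2, 10.0)],
}
selected = on_detail_selected(1, row_to_id, id_to_positions)
```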

Optionally, before the S4, the method further includes:

S6, receiving a scroll instruction for a first scroll bar, wherein the first scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area;

S7, determining, according to the scroll instruction, the proportion of the currently scrolled length of the first scroll bar to the total length of the scroll area;

and S8, scrolling a second scroll bar according to the proportion so that the first scroll bar and the second scroll bar scroll synchronously, wherein the second scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area, whichever is not the first scroll bar.
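The proportional scrolling of S7 and S8 can be sketched with two small functions. The source does not define "total length of the scroll area" precisely; the sketch assumes it is the scrollable range (content height minus visible height), and the parameter names mirror common browser properties purely for readability.

```python
def scrolled_ratio(scroll_top: float, scroll_height: float, client_height: float) -> float:
    """S7 (sketch): proportion of the first area that has been scrolled.

    Assumes the total length is the scrollable range, i.e. the content
    height minus the visible height.
    """
    scrollable = scroll_height - client_height
    if scrollable <= 0:
        return 0.0  # nothing to scroll
    return min(max(scroll_top / scrollable, 0.0), 1.0)


def synced_scroll_top(ratio: float, scroll_height: float, client_height: float) -> float:
    """S8 (sketch): scroll offset that puts the second area at the same proportion."""
    scrollable = max(scroll_height - client_height, 0.0)
    return ratio * scrollable


# The first area is 25% scrolled; the second area is longer but follows proportionally.
ratio = scrolled_ratio(scroll_top=300, scroll_height=1600, client_height=400)
second_top = synced_scroll_top(ratio, scroll_height=2400, client_height=400)
```

Using a proportion rather than an absolute offset keeps the two display areas aligned even when the reference file and the comparison file have different lengths.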

Optionally, the S4 includes:

and highlighting the difference text currently scrolled to a display area in the comparison file and/or the reference file according to the difference comparison result.

Optionally, the S2 includes:

S21, recognizing the reference file and the comparison file by using Optical Character Recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file;

S22, when a target file contains multi-page text, splicing the multi-page text of the target file into one page of text with continuous context to obtain a target text, and when the target file contains single-page text, obtaining the single-page text from the target file as the target text, wherein the target text is a reference text when the target file is the reference file, and the target text is a comparison text when the target file is the comparison file;

and S23, sending the reference text and the comparison text to the server.
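The splicing in S22 can be sketched as a plain concatenation that leaves the page contents unchanged. Recording the start offset of each page is an addition made for this example, on the assumption that a difference found in the spliced text later has to be mapped back to a page identifier; `splice_pages` and `page_of` are hypothetical names.

```python
import bisect
from typing import List, Tuple


def splice_pages(pages: List[str]) -> Tuple[str, List[int]]:
    """S22 (sketch): join page texts in order without changing their content.

    Returns the spliced text and the start offset of each page within it.
    A single-page file yields its own text unchanged.
    """
    if not pages:
        raise ValueError("a file must contain at least one page of text")
    starts: List[int] = []
    offset = 0
    for page in pages:
        starts.append(offset)
        offset += len(page)
    return "".join(pages), starts


def page_of(char_index: int, starts: List[int]) -> int:
    """Map a character index in the spliced text to a 1-based page number."""
    return bisect.bisect_right(starts, char_index)


text, starts = splice_pages(["Hello", " wor", "ld"])
```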

In a second aspect, an embodiment of the present invention provides a file comparison apparatus based on RPA and AI, where the apparatus is applied to a client, and the apparatus includes:

the receiving unit is used for receiving a reference file and a comparison file uploaded by a Robotic Process Automation (RPA) robot;

the sending unit is used for sending the reference file and the comparison file to a server;

the receiving unit is further configured to receive a difference comparison result of the comparison file sent by the server with respect to the reference file;

and the display unit is used for highlighting a difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the highlighted difference text in the comparison file is a text in which the comparison file is different from the reference file, and the highlighted difference text in the reference file is a text in which the reference file is different from the comparison file.

Optionally, the display unit includes:

a conversion module, used for converting the coordinate information into position information of the divided DIV elements (DIV element position information);

and a display module, used for highlighting, when the DIV element position information enters the display area of the file to which it belongs, the difference text at the DIV element position information in the page indicated by the page identifier, according to the difference type and the page identifier corresponding to the DIV element position information.

Optionally, the display unit further includes:

a generating module, used for generating, for each piece of difference information, an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type;

a binding module, used for binding the ID with the DIV element position information of the difference text in the reference file and with the DIV element position information of the difference text in the comparison file, respectively;

and a first synchronization module, used for synchronously highlighting, when a first synchronous positioning instruction triggered based on the reference file or the comparison file is received, the difference texts at all DIV element position information bound with the ID corresponding to the first synchronous positioning instruction.

Optionally, the display unit is further configured to display a difference detail in a preset display area according to the difference comparison result after receiving the difference comparison result of the comparison file sent by the server with respect to the reference file, where the preset display area is an area other than a reference file display area and a comparison file display area, and the difference detail includes a difference type in each piece of difference information, a difference text in the reference file, and a difference text in the comparison file.

Optionally, the binding module is further configured to bind the ID with corresponding difference information in the difference details after the ID is bound with the position information of the DIV element of the difference text in the reference document and the position information of the DIV element of the difference text in the comparison document, respectively;

the display unit further includes:

the obtaining module is used for obtaining an ID bound with difference information in the difference details corresponding to a second synchronous positioning instruction when the second synchronous positioning instruction triggered based on the difference details is received;

and the second synchronization module is used for synchronously highlighting the differential texts at the position information of all DIV elements bound with the acquired ID.

Optionally, the receiving unit is further configured to receive a scroll instruction for a first scroll bar before highlighting a difference text in the comparison file and/or the reference file according to the difference comparison result, where the first scroll bar includes a scroll bar in a reference file display area or a scroll bar in a comparison file display area;

the determining unit is used for determining the proportion of the currently scrolled length of the first scroll bar in the total length of the scroll area according to the scroll instruction;

and the synchronous scrolling unit is used for scrolling a second scroll bar according to the proportion so as to synchronously scroll the first scroll bar and the second scroll bar, wherein the second scroll bar comprises a scroll bar in a reference file display area or a scroll bar in a comparison file display area, but is different from the first scroll bar.

Optionally, the display unit is configured to highlight, according to the difference comparison result, the difference text currently scrolled to the display area in the comparison file and/or the reference file.

Optionally, the sending unit includes:

the identification module is used for identifying the reference file and the comparison file by using Optical Character Recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file;

the splicing module is used for splicing the multi-page texts of the target file into one page of text with continuous context to obtain the target text when the target file is a file containing the multi-page text, and acquiring the single page text from the target file as the target text when the target file is a file containing the single page text, wherein the target text is a reference text when the target file is the reference file, and the target text is a comparison text when the target file is the comparison file;

and the sending module is used for sending the reference text and the comparison text to the server.

In a third aspect, an embodiment of the present invention provides a computing device, where the computing device includes:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in the first aspect.

In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method according to the first aspect.

According to the file comparison method, device, equipment and storage medium based on RPA and AI provided by the embodiments of the invention, the RPA robot can automatically upload the reference file and the comparison file to the client, the client transmits the two files to the server for difference comparison, and the difference text can finally be highlighted in the comparison file and/or the reference file according to the difference comparison result returned by the server. Compared with the prior art, in which files must be compared manually, the embodiments of the invention can use the RPA robot to automatically trigger the client to send the two files to the server for automatic comparison, which saves manpower, frees the personnel who previously compared files to do more valuable work, and improves the efficiency of file comparison. Compared with the prior art, in which differences must be marked manually, the embodiments of the invention can directly highlight the difference text in the reference file and/or the comparison file, which improves the readability of the difference text and thus the efficiency with which a user finds the differences between the two files.

When the client sends the reference file and the comparison file to the server, it can recognize the two files by using Optical Character Recognition (OCR), splice the text of any file that contains multiple pages to obtain a single-page reference text with continuous context and a single-page comparison text with continuous context, and finally send the reference text and the comparison text to the server for difference comparison. The server can then compare the two texts directly, taking the context into account and without further processing, which improves the efficiency and accuracy of file comparison on the server.

In addition, the embodiment of the invention can also realize the technical effects that:

1. The user can trigger a synchronous positioning instruction through the reference file display area, the comparison file display area or the difference detail display area, so that the client synchronously highlights the texts belonging to the same piece of difference information, which improves the efficiency with which the user views the difference text.

2. The user can drag the scroll bar of the reference file display area or of the comparison file display area, so that the client scrolls the two display areas synchronously, which improves the efficiency with which the user views the text.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flowchart of a file comparison method based on RPA and AI according to an embodiment of the present invention;

Fig. 2 is an exemplary diagram of displaying a difference comparison result according to an embodiment of the present invention;

Fig. 3 is another exemplary diagram of displaying a difference comparison result according to an embodiment of the present invention;

Fig. 4 is a block diagram of a file comparison apparatus based on RPA and AI according to an embodiment of the present invention;

Fig. 5 is an architecture diagram of a file comparison system based on RPA and AI according to an embodiment of the present invention;

Fig. 6 is an architecture diagram of another file comparison system based on RPA and AI according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.

It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

In daily work, different versions of a file are often compared manually to find their differences. This work is highly repetitive, low in difficulty and time-consuming, so companies' demand for automatic file comparison is increasingly urgent. RPA (Robotic Process Automation) technology can intelligently understand the existing applications of an electronic device through the user interface and automate rule-based, high-volume, repetitive operations, such as automatically and repeatedly reading mails, reading Office components, operating databases, web pages and client software, collecting data and performing complicated calculations, and generating required files and reports in batches; RPA technology can therefore greatly reduce labor costs and effectively improve office efficiency. AI (Artificial Intelligence) technology can break through fixed rules and simulate human thinking and consciousness to automatically handle more complex application scenarios. On this basis, the embodiments of the invention provide a method for automatically comparing files that combines RPA and AI, which not only saves manpower and improves the efficiency of file comparison, but also highlights the differences between the two files, improving the efficiency with which a user finds them.

The following provides a detailed description of embodiments of the invention.

In the description of the embodiments of the present invention, the term "reference file" refers to the file that serves as the reference when difference comparison is performed, and the term "comparison file" refers to the file, of the two files being compared, that is not the reference file.

In the description of the embodiments of the present invention, the term "multi-page document" refers to a document having a text content greater than or equal to two pages, and the term "multi-page text" refers to a text content greater than or equal to two pages.

In the description of the embodiments of the present invention, the term "OCR" refers to Optical Character Recognition, and specifically refers to a process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text by a character recognition method; that is, for printed characters, the characters in a paper document are optically converted into an image file of a black-and-white dot matrix, and recognition software converts the characters in the image into a text format for further editing and processing by word processing software. In the embodiments of the present invention, either the RPA robot uses OCR technology to convert the characters in a paper document into an image file of a black-and-white dot matrix, and the client then uses OCR technology to recognize the text content contained in the image file; or the RPA robot uses OCR technology to acquire the text content from the paper document and generate a text file (i.e., an editable file) containing the text content, and the client then extracts the text content directly from the text file.

In the description of the embodiment of the present invention, the term "client" refers to a front end of a business system having file comparison requirements, and the term "server" refers to a back end of the business system having file comparison requirements. The "client" may be application software corresponding to the service system, or may also be a browser, so that the RPA robot accesses a website of the service system through the browser. The term "RPA robot" may be integrated in the client, may be embedded in the client in the form of a plug-in, or the like, or may be independent from the client, as long as the RPA robot can automatically access the client, and the embodiment of the present invention does not limit the specific form of the RPA robot.

In the description of the embodiments of the present invention, the term "NLP" refers to Natural Language Processing, a discipline that takes language as its object and uses computer technology to analyze, understand and process natural language; that is, it uses the computer as a powerful tool for language research, studies language information quantitatively with the support of the computer, and provides language descriptions that can be used by both humans and computers.

In the description of the embodiments of the present invention, the term "splicing" refers to connecting together contents to be spliced without changing original contents. By splicing the multi-page texts, the multi-page text contents can be seamlessly connected on the basis of keeping the original text content arrangement sequence.

In the description of the embodiment of the present invention, the term "preset comparison algorithm" refers to a specific comparison method for determining a difference between a comparison text and a reference text, and the reference text and the comparison text may be compared in batches according to a preset comparison unit until the comparison is completed, and the specific comparison process may refer to the detailed description of S120. The term "preset comparison unit" refers to the size of the text to be compared each time, and may specifically be a phrase, a sentence, or a paragraph, etc., according to the actual situation.

In the description of the embodiments of the present invention, the term "difference comparison" refers to determining which differences exist between the reference text and the comparison text. The term "difference comparison result" refers to the result obtained after the reference text and the comparison text are compared for differences, and includes at least one piece of difference information, where each piece of difference information includes a difference type, a difference text in the reference file, a difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file, and the difference position information includes a page identifier of the page to which the difference text belongs and coordinate information within that page. The term "difference type" characterizes the category of a difference, and mainly includes content deletion, content addition and content modification. The term "page identifier" indicates the page number of the current page within the entire file. As for the term "coordinate information", a coordinate system may be established for each page, with the position of the first character of the page as the origin and with the horizontal and vertical axes pointing to the right and downward, respectively, so that corresponding coordinates can be generated for each character in the page. The term "difference text" refers to text content in the current file that differs from the other file.
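The three difference types can be illustrated with a short sketch built on Python's standard `difflib.SequenceMatcher`. This is a stand-in chosen for the example and is not the preset comparison algorithm of the embodiments; it also assumes that the comparison unit is a word and that an absent side of a difference is represented by an empty string.

```python
import difflib
from typing import Dict, List

# Mapping from difflib opcode tags to the difference types named above.
TYPE_NAMES = {
    "replace": "content modification",
    "delete": "content deletion",
    "insert": "content addition",
}


def diff_words(reference: str, comparison: str) -> List[Dict[str, str]]:
    """Compare two texts word by word and list their differences."""
    ref_words = reference.split()
    cmp_words = comparison.split()
    matcher = difflib.SequenceMatcher(None, ref_words, cmp_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical runs carry no difference information
        differences.append({
            "type": TYPE_NAMES[tag],
            "reference_text": " ".join(ref_words[i1:i2]),
            "comparison_text": " ".join(cmp_words[j1:j2]),
        })
    return differences


modified = diff_words("pay within 30 days", "pay within 60 days")
deleted = diff_words("a b c", "a c")
```

A full difference comparison result would additionally carry the page identifier and coordinate information for each difference text, which this sketch omits.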

In the description of the embodiments of the present invention, the term "highlight" refers to a display manner that clearly distinguishes the difference text from other text, and includes one or a combination of the following: bolding the font, changing the font color, adding a font background color, highlighting the font, enlarging the font, changing to italics, adding an underline, adding a strikethrough, and the like.

In the description of the implementation of the present invention, the term "authentication" refers to verifying whether a client sending a reference file and a comparison file has a right to perform file comparison, and specifically, the authentication can be implemented by verifying whether user information of the client meets the right requirement.

In the description of the embodiments of the present invention, the term "DIV element position information" refers to the position information of a DIV (division) element on a web interface, and the term "DIV element" refers to an element used to provide structure and background for block-level content in an HTML (HyperText Markup Language) document.

In the description of the embodiments of the present invention, the term "binding" refers to establishing a mapping relationship between at least two parameters to be bound, so that one parameter can be looked up from the other.

In the description of the embodiments of the present invention, the term "difference detail" refers to the specific description information for each difference, and the difference detail includes the difference type in each piece of difference information, the difference text in the reference file, and the difference text in the comparison file.

In the description of the implementation of the present invention, the term "synchronous positioning instruction" is an instruction indicating that the difference texts related to the same piece of difference information are displayed synchronously.

In the description of the implementation of the present invention, the term "synchronous scrolling" refers to a scrolling manner in which the progress of scrolling of at least two scroll bars is kept consistent.

Fig. 1 is a flowchart of a file comparison method based on RPA and AI according to an embodiment of the present invention. The method is mainly applied to a client, and specifically includes:

S100, receiving a reference file and a comparison file uploaded by the RPA robot.

Specifically, in the embodiment of the present invention, an RPA program (which may be integrated or embedded in the client, or may be independent of the client) may be configured in an electronic device capable of logging in to the client, so that the electronic device simulates a user's mouse and keyboard operations according to the rules set in the RPA program to log in to the client automatically, and, by accessing the client, triggers the client to generate a file comparison request containing a reference file and a comparison file and to send the request to the server, so that the server performs difference comparison on the two files. When logging in, the client may pop up a login interface containing a verification code image; in this case, the RPA robot may perform OCR on the verification code image, obtain the verification code content, and enter it into the corresponding input box, so that the login succeeds.

The reference file and the comparison file may be stored in the client, may be stored in other storage spaces of the electronic device, or may be paper files. When the reference file and the comparison file are stored in other storage spaces of the electronic device, the RPA robot may search those storage spaces for the two files and upload them to the client, for example by clicking an upload button, by dragging the two files to a designated area, or by another upload method. When the reference file and/or the comparison file is a paper file, the RPA robot may use OCR technology to convert the paper file into an image file or a text file (i.e., an editable file composed of the text content of the paper file), and then upload the converted file to the client in the manner described above.

After the client receives the reference file and the comparison file uploaded by the RPA robot, it may render them so as to display the uploaded files to the user. Specifically, when the reference file and/or the comparison file is a Word file, the Word file may be converted into a PDF file and then rendered using the rendering library provided by the client; when the PDF file has multiple pages, multi-page rendering is performed. When the reference file and/or the comparison file is a picture file in a format other than TIFF, it may be rendered using the rendering library provided by the client; when it is a picture file in TIFF format, a dedicated TIFF rendering library may be used. To convert a Word file into a PDF file, the client may send the Word file to the server, which performs the conversion and returns the PDF file to the client for rendering.
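The rendering rules above amount to a dispatch on file type. The following Python sketch is illustrative only: the function name `choose_render_path`, the extension sets and the step labels are hypothetical and do not appear in the embodiment.

```python
from pathlib import Path

# Hypothetical extension sets; the embodiment only names Word, PDF,
# TIFF pictures and "other" pictures.
WORD_EXTENSIONS = {".doc", ".docx"}
TIFF_EXTENSIONS = {".tif", ".tiff"}


def choose_render_path(filename: str) -> list[str]:
    """Return the ordered processing steps for an uploaded file."""
    ext = Path(filename).suffix.lower()
    if ext in WORD_EXTENSIONS:
        # Word files are converted to PDF first (possibly by the server).
        return ["convert_to_pdf", "render_with_builtin_library"]
    if ext in TIFF_EXTENSIONS:
        # TIFF pictures use a dedicated rendering library.
        return ["render_with_tiff_library"]
    # PDF files and other picture formats use the client's own library.
    return ["render_with_builtin_library"]
```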

S110, sending the reference file and the comparison file to a server.

After receiving the reference file and the comparison file uploaded by the RPA robot, the client can receive a file comparison instruction triggered by the RPA robot, then directly generate a file comparison request comprising the reference file and the comparison file according to the file comparison instruction, and send the file comparison request to the server, so that the server performs difference comparison on the reference file and the comparison file. However, after the server receives the reference file and the comparison file, the server often needs to recognize the texts in the two files first to perform the difference comparison, and if there are more clients sending file comparison requests to the server, the efficiency of the server in performing the file comparison is reduced. In order to reduce the burden of the server and improve the file comparison efficiency, the embodiment of the invention can identify the reference file and the comparison file by the client by using the OCR to obtain at least one page of text of the reference file and at least one page of text of the comparison file, and then send the identified texts to the server for difference comparison.

In practical applications, if at least one page of text of the reference file is compared directly with at least one page of text of the comparison file page by page, that is, the Nth page of the reference file is compared with the Nth page of the comparison file without regard to the relationship between pages, the comparison result can easily be inaccurate. For example, suppose the reference file contains two pages of text, and the comparison file inserts one page of text between the first and second pages of the reference file, giving three pages of text. With page-by-page comparison, the second page of the reference file is reported as differing from the second page of the comparison file, and because the reference file has no third page, the third page of the comparison file is reported as absent from the reference file. In other words, page-by-page comparison may report that the two files differ everywhere except on the first page.

In order to avoid the problem of inaccurate comparison results, in the embodiment of the invention, the reference file and the comparison file are identified by using the OCR at the client, at least one page of text of the reference file and at least one page of text of the comparison file are obtained, the texts are spliced first, and then the spliced texts are sent to the server. Specifically, when a target file is a file containing a plurality of pages of texts, the plurality of pages of texts of the target file are spliced into a page of text with continuous context to obtain a target text, when the target file is a file containing a single page of text, the single page of text is obtained from the target file as the target text, wherein the target file comprises a reference file or a comparison file, when the target file is the reference file, the target text is the reference text, and when the target file is the comparison file, the target text is the comparison text; and sending the reference text and the comparison text to the server.

Here, "context-continuous" means that the original order of the characters is preserved. Specifically, the multi-page texts of the reference file or the comparison file may be spliced in sequence according to the page order of that file, so as to obtain one page of context-continuous text.
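As a minimal illustration of why splicing helps, the Python sketch below contrasts page-by-page comparison with comparison of spliced texts for the inserted-page example. The function name `splice_pages` and the placeholder page contents are assumptions made for illustration.

```python
def splice_pages(pages: list[str]) -> str:
    """Join page texts in page order into one context-continuous text."""
    return "".join(pages)


# Placeholder page contents (hypothetical): the comparison file inserts
# one page between the two pages of the reference file.
reference_pages = ["AAA", "CCC"]
comparison_pages = ["AAA", "BBB", "CCC"]

# Page-by-page comparison: page 2 is reported as different, and page 3 of
# the comparison file has no counterpart in the reference file.
page_by_page_equal = [r == c for r, c in zip(reference_pages, comparison_pages)]

# After splicing, the two texts differ only by the inserted text "BBB".
reference_text = splice_pages(reference_pages)
comparison_text = splice_pages(comparison_pages)
```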

It should be added that, in order to improve the communication security between the client and the server, the server may authenticate the user information of the client to verify whether the user has the file comparison authority. Specifically, the client may also carry user information of the client when sending the reference file and the comparison file to the server, so that the server authenticates the client according to the user information, and performs difference comparison on the reference file and the comparison file when determining that the authentication is passed. The user information can be a client account, a mobile phone number bound with the client account, a user level or other information, and the specific content of the user information is not limited by the implementation of the invention and can be determined according to specific conditions. There are many ways to authenticate the user information, including but not limited to the following two: (1) matching the user information with a user list with authority, if the matching is successful, determining that the user corresponding to the user information has the authority, namely the authentication is passed, and if the matching is failed, determining that the user corresponding to the user information has no authority, namely the authentication is failed; (2) and judging whether the user grade in the user information exceeds a preset grade or not, if so, passing the authentication, and if not, failing the authentication.
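The two authentication approaches listed above can be sketched in Python as follows. The `UserInfo` fields and the function names are hypothetical; the embodiment does not prescribe a data model.

```python
from dataclasses import dataclass


@dataclass
class UserInfo:
    account: str  # hypothetical field: client account
    level: int    # hypothetical field: user level


def authenticate_by_list(user: UserInfo, authorized_accounts: set[str]) -> bool:
    """Approach (1): pass if the user matches the list of authorized users."""
    return user.account in authorized_accounts


def authenticate_by_level(user: UserInfo, preset_level: int) -> bool:
    """Approach (2): pass only if the user level exceeds the preset level."""
    return user.level > preset_level
```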

S120, receiving a difference comparison result of the comparison file sent by the server relative to the reference file.

After receiving the reference file and the comparison file, the server can perform difference comparison on them according to a preset comparison algorithm. Specifically, the reference text and the comparison text may be compared in preset comparison units, so as to obtain a comparison sub-result for each preset comparison unit. During this comparison, if the reference sub-text being compared (the part of the reference text in one preset comparison unit) and the comparison sub-text (the part of the comparison text in one preset comparison unit) have the same content, the corresponding comparison sub-result is determined to be content identity; if the reference sub-text being compared does not exist in the comparison text, the corresponding comparison sub-result is determined to be content deletion; and if the comparison sub-text being compared does not exist in the reference text, the corresponding comparison sub-result is determined to be content addition. In practical applications, the differences between two texts should include content modification in addition to content identity, content deletion and content addition. Therefore, to let the user see the difference of the comparison text relative to the reference text more intuitively, for a first comparison sub-result and a second comparison sub-result that are not adjacent, if both are content identity and the comparison sub-results between them include content deletion and content addition but no content identity, the comparison sub-results between them are merged into one comparison sub-result, and the merged comparison sub-result is set to content modification. The size of the preset comparison unit may be determined according to actual conditions, and may be a phrase, a sentence, a paragraph, and the like.
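The merging rule can be sketched in Python as below. This is an illustration under assumptions, not the preset comparison algorithm itself: sub-results are modeled as hypothetical `(type, reference_sub_text, comparison_sub_text)` tuples, and the step that produces them is omitted.

```python
SAME, DELETE, ADD, MODIFY = "same", "delete", "add", "modify"


def merge_modifications(sub_results):
    """Merge each run of delete/add sub-results enclosed by two "same"
    sub-results into one "modify" sub-result, provided the run contains
    both a deletion and an addition."""
    merged = []
    i, n = 0, len(sub_results)
    while i < n:
        if sub_results[i][0] == SAME:
            merged.append(sub_results[i])
            i += 1
            continue
        # Collect the run of consecutive non-"same" sub-results.
        j = i
        while j < n and sub_results[j][0] != SAME:
            j += 1
        run = sub_results[i:j]
        kinds = {item[0] for item in run}
        enclosed = i > 0 and j < n  # a "same" sub-result on both sides
        if enclosed and DELETE in kinds and ADD in kinds:
            ref_text = "".join(item[1] for item in run)
            cmp_text = "".join(item[2] for item in run)
            merged.append((MODIFY, ref_text, cmp_text))
        else:
            merged.extend(run)
        i = j
    return merged
```

Runs at the very start or end of the text are left unmerged here because the rule as described requires an identical sub-result on both sides.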

It should be added that, when comparing two texts, in addition to simply judging whether the characters or words used in the text contents are the same, NLP technology may be used to perform semantic analysis on the reference sub-text and the comparison sub-text; when the two sub-texts have the same meaning but use different characters or words, the corresponding comparison sub-result may be determined to be content identity. In addition, the embodiment of the invention may support user-defined filtering rules for ignoring meaningless differences: when a difference between the reference sub-text and the comparison sub-text satisfies a preset filtering rule, that difference is ignored. For example, it may be specified that the presence or absence of an auxiliary word in a sentence does not affect the comparison result. When the server sends the difference comparison result to the client, it may also send the ignored differences, so that the client can display them to the user.
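One possible form of a user-defined filtering rule is sketched below in Python. It assumes whitespace-separated words and a hypothetical set of ignorable words; the names `is_ignorable` and `partition_differences` are illustrative, and the semantic (NLP) comparison is not covered.

```python
def is_ignorable(ref_sub_text: str, cmp_sub_text: str, ignored_words: set[str]) -> bool:
    """Return True if the two sub-texts differ only by ignored words."""
    def strip_ignored(text: str) -> list[str]:
        return [word for word in text.split() if word not in ignored_words]

    return strip_ignored(ref_sub_text) == strip_ignored(cmp_sub_text)


def partition_differences(differences, ignored_words):
    """Split (reference, comparison) pairs into kept and ignored differences,
    so that the ignored ones can still be reported to the client."""
    kept, ignored = [], []
    for ref_sub_text, cmp_sub_text in differences:
        if is_ignorable(ref_sub_text, cmp_sub_text, ignored_words):
            ignored.append((ref_sub_text, cmp_sub_text))
        else:
            kept.append((ref_sub_text, cmp_sub_text))
    return kept, ignored
```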

When a comparison sub-result is content addition, content deletion or content modification, a piece of difference information may be generated for that comparison sub-result, and once all difference information has been obtained it is fed back to the client. Each piece of difference information comprises a difference type, a difference text in the reference file, a difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file, wherein the difference position information comprises the page identifier of the page to which the difference text belongs and the coordinate information of the difference text within that page. One comparison sub-result corresponds to one piece of difference information, and the difference types comprise content addition, content deletion and content modification. The page identifier indicates the page number of the current page within the entire file. For the coordinate information, a coordinate system may be established for each page, with the position of the first character of the page as the origin and the horizontal and vertical directions as the horizontal and vertical axes respectively, so that coordinates can be generated for each character in the page.
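The structure of one piece of difference information can be modeled, for illustration, with Python dataclasses. All class and field names below are hypothetical, and the coordinate values in the example are invented placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class DifferenceType(Enum):
    ADD = "content addition"
    DELETE = "content deletion"
    MODIFY = "content modification"


@dataclass
class DifferencePosition:
    page: int  # page identifier: page number within the whole file
    x: float   # horizontal coordinate within the page
    y: float   # vertical coordinate within the page


@dataclass
class DifferenceInfo:
    diff_type: DifferenceType
    reference_text: str
    comparison_text: str
    reference_position: DifferencePosition
    comparison_position: DifferencePosition


# Example: one modification whose texts sit on different pages.
example = DifferenceInfo(
    diff_type=DifferenceType.MODIFY,
    reference_text="electronic device",
    comparison_text="computing terminal",
    reference_position=DifferencePosition(page=2, x=12.0, y=40.0),
    comparison_position=DifferencePosition(page=3, x=12.0, y=8.0),
)
```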

In one embodiment, after the client sends the reference file and the comparison file to the server, the server may add the task of comparing the reference file and the comparison file to the task queue, store the comparison task and the task state of the comparison task in the task database, and update the task state in the task database in time when the task state of the comparison task changes. The client can receive a comparison task state query instruction triggered by the RPA robot, and sends the comparison task state query instruction to the server, so that the server queries the task state of the comparison task corresponding to the comparison task state query instruction from the task database, and feeds the queried task state back to the client. The task state may be unprocessed when the comparison task is not executed, may be in-process when the comparison task is being executed, and may be completed when the comparison task is completed.
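A minimal in-memory sketch of the task queue and task-state bookkeeping is given below in Python. The class `ComparisonTaskStore` merely stands in for the task queue and task database; its name, methods and the integer task IDs are assumptions.

```python
from collections import deque
from enum import Enum


class TaskState(Enum):
    UNPROCESSED = "unprocessed"
    PROCESSING = "processing"
    COMPLETED = "completed"


class ComparisonTaskStore:
    """In-memory stand-in for the task queue and the task database."""

    def __init__(self):
        self._queue = deque()
        self._states = {}
        self._next_id = 1

    def submit(self, reference_text: str, comparison_text: str) -> int:
        """Queue a comparison task and record it as unprocessed."""
        task_id = self._next_id
        self._next_id += 1
        self._queue.append((task_id, reference_text, comparison_text))
        self._states[task_id] = TaskState.UNPROCESSED
        return task_id

    def run_next(self, compare):
        """Execute the oldest queued task, updating its state as it runs."""
        task_id, reference_text, comparison_text = self._queue.popleft()
        self._states[task_id] = TaskState.PROCESSING
        result = compare(reference_text, comparison_text)
        self._states[task_id] = TaskState.COMPLETED
        return task_id, result

    def query_state(self, task_id: int) -> TaskState:
        """Answer a comparison task state query."""
        return self._states[task_id]
```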

In addition, the server can actively feed back the difference comparison result to the client, and can also passively feed back the difference comparison result to the client. The specific implementation manner of passively feeding back the difference comparison result to the client may be: the client receives a comparison result query instruction triggered by the RPA robot, sends the comparison result query instruction to the server, and the server sends a corresponding difference comparison result to the client according to the comparison result query instruction.

The specific implementation manner of triggering the client to send the comparison result query instruction or the comparison task state query instruction by the RPA robot includes, but is not limited to, triggering the client to generate and send a corresponding instruction by the RPA robot by clicking a comparison result query button or a comparison task state query button on the client.

S130, highlighting a difference text in the comparison file and/or the reference file according to the difference comparison result.

The highlighted difference text in the comparison file is text in which the comparison file differs from the reference file, and the highlighted difference text in the reference file is text in which the reference file differs from the comparison file. The manner of highlighting includes, but is not limited to, one or a combination of the following: bold font, changed font color, added text background color, highlighted font, enlarged font, italics, underlining, strikethrough, and the like. When the difference comparison result includes a plurality of difference types, the different difference types may be highlighted in the same manner or in different manners.

This step may be implemented as follows: the coordinate information is converted into DIV element position information; when the DIV element position information enters the display area of the file, the difference text at the DIV element position information is highlighted in the page indicated by the corresponding page identifier, according to the difference type corresponding to the DIV element position information. A DIV element is a block-level element of an HTML document that provides structure and background for block-level content, and it can be positioned using cascading style sheets (CSS). For example, two display areas arranged from left to right may be provided on the interface to display the reference file and the comparison file respectively; when the reference file and/or the comparison file contains too much text to display the whole file at once, a scroll bar may be added to provide scrolling display.
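The conversion and the visibility check can be sketched as follows in Python. The sketch assumes that each difference text has a bounding rectangle in page coordinates, that rendering applies a uniform scale factor, and that each page has a known vertical offset in the scrollable area; none of these details, nor the names used, are specified by the embodiment.

```python
from dataclasses import dataclass


@dataclass
class DivPosition:
    page: int
    left: float
    top: float
    width: float
    height: float


def to_div_position(page, x, y, width, height, scale) -> DivPosition:
    """Scale page coordinates to pixel offsets inside the rendered page."""
    return DivPosition(page, x * scale, y * scale, width * scale, height * scale)


def is_in_display_area(div, page_offset_top, scroll_top, viewport_height) -> bool:
    """Return True if the DIV overlaps the currently visible area vertically.

    page_offset_top is the assumed pixel offset of the DIV's page from the
    top of the scrollable content."""
    top = page_offset_top + div.top
    bottom = top + div.height
    return bottom > scroll_top and top < scroll_top + viewport_height
```

Highlighting only the DIV elements that pass this check means that difference text outside the visible area need not be styled until it is scrolled into view.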

It should be noted that, when the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, if the difference type in the current difference information is content deletion, the deleted text may be highlighted only in the reference file, or the text before the deletion and the text remaining after the deletion may both be highlighted, that is, both the difference text in the reference file and the difference text in the comparison file contained in the difference information are highlighted. If the difference type in the current difference information is content addition, the added content may be highlighted only in the comparison file, or the text before the addition and the text after the addition may both be highlighted, that is, both the difference text in the reference file and the difference text in the comparison file contained in the difference information are highlighted. If the difference type in the current difference information is content modification, the text before the modification and the text after the modification may both be highlighted, that is, both the difference text in the reference file and the difference text in the comparison file contained in the difference information are highlighted.

For example, fig. 2 shows part of the text content of the reference file and the comparison file. The difference text may be highlighted directly in the reference file and the comparison file, and the user may browse by dragging the scroll bars of the two files. In the figure, bold underlined text is modified text; italic, enlarged and underlined text is text added in the comparison file; and text with a strikethrough is text deleted in the comparison file.

In an embodiment, when separate display areas are provided for the reference file and the comparison file, a user who wants to compare and check the difference at a certain position has to drag the scroll bars of the two files separately, which is cumbersome. To improve the efficiency with which the user checks differences, the embodiment of the present invention may generate an ID (identifier) for each piece of difference information according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type, and bind the ID to both pieces of DIV element position information. When a first synchronous positioning instruction triggered based on the reference file or the comparison file is received, the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction is highlighted synchronously.

Specific implementations of generating the ID include, but are not limited to, splicing the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type in a preset order to obtain a character string. Different difference types may be represented by different characters; for example, "content deletion", "content addition" and "content modification" may be represented by "1", "2" and "3" respectively. The first synchronous positioning instruction is generated when the user clicks the display area of the reference file or the comparison file. When the client receives the first synchronous positioning instruction, the corresponding ID is activated. The component corresponding to the reference file or the comparison file then determines whether the activated ID is the same as an ID it contains; if so, the difference text at the DIV element position information corresponding to that ID is highlighted, and if that DIV element position is not in the display area, it is scrolled into the display area. For example, for the same piece of difference information, when the difference text "electronic device" is located on page 2 of the reference file and the difference text "computing terminal" is located on page 3 of the comparison file, and the user clicks the text "electronic device" on page 2 of the reference file, the client synchronizes automatically, so that the comparison file scrolls to page 3 and the text "computing terminal" is highlighted.
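ID generation by splicing, and the lookup performed on a synchronous positioning instruction, can be sketched in Python as follows. DIV element position information is represented here as opaque strings such as "p2:120,340", and the "|" separator, the binding dictionary and the function names are assumptions made for illustration.

```python
TYPE_CODES = {
    "content deletion": "1",
    "content addition": "2",
    "content modification": "3",
}


def make_difference_id(ref_div: str, cmp_div: str, diff_type: str) -> str:
    """Splice both DIV positions and the type code in a fixed order."""
    return "|".join([ref_div, cmp_div, TYPE_CODES[diff_type]])


def bind(bindings: dict, diff_id: str, ref_div: str, cmp_div: str) -> None:
    """Bind the ID to the DIV position in each of the two files."""
    bindings[diff_id] = [("reference", ref_div), ("comparison", cmp_div)]


def on_sync_positioning(bindings: dict, clicked_file: str, clicked_div: str):
    """Return every (file, DIV position) bound to the ID of the clicked
    difference, i.e. all positions to highlight and scroll into view."""
    for targets in bindings.values():
        if (clicked_file, clicked_div) in targets:
            return targets
    return []
```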

In an embodiment, in order to provide the user with more ways to view differences and to let the user view them according to personal habits, the embodiment of the present invention may display a difference detail in a preset display area according to the difference comparison result, where the preset display area is an area other than the reference file display area and the comparison file display area, and the difference detail includes the difference type in each piece of difference information, the difference text in the reference file, and the difference text in the comparison file. In addition, the difference comparison results may be summarized, and the summary result displayed in a display area other than the reference file display area, the comparison file display area and the preset display area. The summary result includes the total number of pieces of difference information and the page identifiers of the pages where the differences are located. For example, the summary result may be "after comparison, there are differences on pages 1, 3, 5 and 8 of the reference file, and there are 20 differences between the two files".
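A summary of this kind can be computed with a few lines of Python. In this sketch each difference is reduced to a hypothetical `(reference_page, comparison_page)` pair, and the function names and the wording of the formatted sentence are illustrative.

```python
def summarize(differences):
    """Count the differences and collect the reference-file pages involved.

    differences: list of (reference_page, comparison_page) pairs."""
    reference_pages = sorted({ref_page for ref_page, _ in differences})
    return {"total": len(differences), "reference_pages": reference_pages}


def format_summary(summary) -> str:
    """Render the summary result as a sentence for display."""
    pages = ", ".join(str(page) for page in summary["reference_pages"])
    return (
        f"After comparison, there are differences on pages {pages} of the "
        f"reference file, and there are {summary['total']} differences "
        "between the two files."
    )
```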

For example, as shown in fig. 3, when displaying the differences, the client not only highlights the difference text in the reference file and/or the comparison file, but also displays the comparison result on the right side. The upper half of the comparison result is the overall comparison result (i.e., the summary result), and the lower half is the detailed comparison result (i.e., the difference detail). The user can browse the difference detail by dragging the scroll bar of the detailed comparison result display area.

Because the preset display area is independent of the reference file display area and the comparison file display area, the contents displayed in those two areas do not change while the user views the preset display area. In this case, if the user wants to check specific contents of the reference file and the comparison file against the difference detail, the user has to drag the scroll bars of the reference file display area and the comparison file display area, which is cumbersome. To improve the efficiency with which the user checks differences based on the difference detail, the embodiment of the present invention may bind the ID to the corresponding difference information in the difference detail; when a second synchronous positioning instruction triggered based on the difference detail is received, the ID bound to the difference information in the difference detail corresponding to the second synchronous positioning instruction is acquired; and the difference text at all DIV element position information bound to the acquired ID is highlighted synchronously.

The second synchronous positioning instruction is generated when the user clicks the preset display area. When the client receives the second synchronous positioning instruction, the ID bound to the difference information in the difference detail corresponding to that instruction is activated. The component corresponding to the reference file or the comparison file then determines whether the activated ID is the same as an ID it contains; if so, the difference text at the DIV element position information corresponding to that ID is highlighted, and if that DIV element position is not in the display area, it is scrolled into the display area.

In an embodiment, no matter before or after comparing the two files, when a user views the two files, the scroll bars in the display areas of the two files need to be dragged respectively to realize synchronous viewing of the two files, and the operation is complicated. In order to improve the efficiency of the user in consulting the two files, the embodiment of the invention can receive a scroll instruction aiming at the first scroll bar; determining the proportion of the currently scrolled length of the first scroll bar in the total length of a scroll area according to the scroll instruction; scrolling a second scrollbar according to the ratio such that the first scrollbar scrolls in synchronization with the second scrollbar. That is, for the first scroll bar, scrolling is performed only following the user's drag without performing synchronous scrolling, and for the second scroll bar, scrolling is performed along with the scrolling of the first scroll bar.

The first scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area, and the second scroll bar is the other of the two. That is, when the first scroll bar is the scroll bar of the reference file display area, the second scroll bar is the scroll bar of the comparison file display area; when the first scroll bar is the scroll bar of the comparison file display area, the second scroll bar is the scroll bar of the reference file display area.

For example, when the user scrolls the scroll bar of the reference file display area, the client calculates in real time the ratio of the currently scrolled length of that scroll bar to the total length of its scroll area (for example, a currently scrolled length of 2 cm and a total length of 10 cm give a ratio of 0.2), and scrolls the scroll bar of the comparison file display area to the same ratio of 0.2 (for example, if that scroll bar has currently scrolled 3 cm and the total length of its scroll area is 12 cm, it is scrolled to the 2.4 cm position, since 0.2 × 12 cm = 2.4 cm).
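The proportional calculation in this example can be written as a short Python function. The function name and the guard for a zero-length scroll area are additions made for illustration.

```python
def synced_scroll_position(first_scrolled: float, first_total: float,
                           second_total: float) -> float:
    """Return the position to which the second scroll bar should be moved
    so that both scroll bars show the same scrolling ratio."""
    if first_total <= 0:
        # Assumed guard: nothing to scroll in the first display area.
        return 0.0
    ratio = first_scrolled / first_total
    return ratio * second_total


# The example from the text: 2 cm of 10 cm scrolled gives a ratio of 0.2,
# so a 12 cm scroll area is moved to approximately 2.4 cm.
target = synced_scroll_position(2.0, 10.0, 12.0)
```

Only the second scroll bar is repositioned; the first simply follows the user's drag, which avoids the two scroll bars triggering each other.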

Before the difference comparison is carried out, if the user triggers synchronous scrolling, the text after synchronous scrolling can be directly displayed. After the difference comparison is performed, if the user triggers synchronous scrolling, the difference text currently scrolled to the display area can be highlighted in the comparison file and/or the reference file according to the difference comparison result, and for other texts currently scrolled to the display area, the other texts are displayed conventionally without highlighting.

According to the file comparison method based on the RPA and the AI, provided by the embodiment of the invention, the reference file and the comparison file to be compared can be automatically uploaded to the client by the RPA robot, the reference file and the comparison file are transmitted to the server by the client for difference comparison, and finally, a difference text can be highlighted in the comparison file and/or the reference file according to a difference comparison result returned by the server. Therefore, compared with the prior art that files need to be compared manually, the embodiment of the invention can utilize the RPA robot to automatically trigger the client to send two files to be compared to the server for automatic comparison, thereby saving manpower, ensuring that the personnel who originally need to compare the files have time to do more valuable work, and improving the efficiency of file comparison; compared with the difference needing to be marked manually in the prior art, the embodiment of the invention can directly highlight the difference text in the reference file and/or the comparison file, thereby improving the readability of the difference text and further improving the efficiency of searching the difference between the two files for a user. 
When the client sends the reference file and the comparison file to the server, the reference file and the comparison file can be identified by using an Optical Character Recognition (OCR), the files containing a plurality of pages of texts in the two files are subjected to text splicing to obtain a single-page continuous-context reference text and a single-page continuous-context comparison text, and finally the reference text and the comparison text are sent to the server for difference comparison, so that the server can directly compare the two texts in combination with the context without performing other processing by the server, and the efficiency and the accuracy of file comparison by the server can be improved.

Based on the foregoing method embodiment, another embodiment of the present invention further provides a file comparison apparatus based on RPA and AI, where the apparatus is applied to a client, as shown in fig. 4, and the apparatus includes:

the receiving unit 20 is configured to receive a reference file and a comparison file uploaded by the robot process automation RPA robot;

a sending unit 22, configured to send the reference file and the comparison file to a server;

the receiving unit 20 is further configured to receive a difference comparison result of the comparison file sent by the server with respect to the reference file;

a display unit 24, configured to highlight a difference text in the comparison file and/or the reference file according to the difference comparison result, where the highlighted difference text in the comparison file is a text in which the comparison file is different from the reference file, and the highlighted difference text in the reference file is a text in which the reference file is different from the comparison file.

Optionally, the display unit 24 includes:

Optionally, the display unit 24 further includes:

Optionally, the display unit 24 is further configured to, after receiving a difference comparison result of the comparison file sent by the server with respect to the reference file, display a difference detail in a preset display area according to the difference comparison result, where the preset display area is an area other than a reference file display area and a comparison file display area, and the difference detail includes a difference type in each piece of difference information, a difference text in the reference file, and a difference text in the comparison file.

the display unit 24 further includes:

Optionally, the receiving unit 20 is further configured to receive a scroll instruction for a first scroll bar before highlighting a difference text in the comparison file and/or the reference file according to the difference comparison result, where the first scroll bar includes a scroll bar of a reference file display area or a scroll bar of a comparison file display area;

Optionally, the sending unit 22 includes:

Based on the above embodiment, another embodiment of the present invention further provides a computing device, including:

one or more processors;

a storage device for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any embodiment of the present invention. The processor is coupled to the storage device.

Based on the above method embodiments, another embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method according to any embodiment of the present invention.

Based on the above embodiments, the embodiment of the present invention further provides a file comparison system based on RPA and AI, where the system includes an RPA robot 30, a client 32, and a server 34. As shown in fig. 5, the RPA robot 30 may be independent of the client 32, and as shown in fig. 6, the RPA robot 30 may be part of the client 32.

The RPA robot 30 is configured to log in to the client 32, upload a reference file and a comparison file to the client 32, and trigger the client 32 to send the reference file and the comparison file to the server 34 for difference comparison;

the client 32 is configured to receive a reference file and a comparison file uploaded by the RPA robot, and send the reference file and the comparison file to a server;

the server 34 is configured to perform difference comparison on the reference file and the comparison file according to a preset comparison algorithm, so as to obtain a difference comparison result of the comparison file with respect to the reference file;

the client 32 is further configured to receive a difference comparison result sent by the server, and highlight a difference text in the comparison file and/or the reference file according to the difference comparison result, where the highlighted difference text in the comparison file is a text in which the comparison file is different from the reference file, and the highlighted difference text in the reference file is a text in which the reference file is different from the comparison file.
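The interaction among the RPA robot 30, the client 32 and the server 34 can be sketched end to end as below. The embodiment leaves the "preset comparison algorithm" open, so this sketch substitutes a word-level diff from Python's standard `difflib` module purely as a placeholder; the function names, the dictionary keys, and the use of direct function calls instead of login and network transfer are likewise assumptions.

```python
import difflib
from typing import Callable, Dict, List

Difference = Dict[str, str]


def server_compare(reference: str, comparison: str) -> List[Difference]:
    """Stand-in for server 34: word-level diff (difflib is an assumed choice)."""
    ref_words, cmp_words = reference.split(), comparison.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=cmp_words, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # only differing runs are reported
        result.append({
            "type": tag,  # "replace", "insert" or "delete"
            "reference_text": " ".join(ref_words[i1:i2]),
            "comparison_text": " ".join(cmp_words[j1:j2]),
        })
    return result


def client_handle_upload(
    reference: str,
    comparison: str,
    server: Callable[[str, str], List[Difference]] = server_compare,
) -> List[Difference]:
    """Stand-in for client 32: forward both files and return the result."""
    return server(reference, comparison)


def rpa_robot_run(reference: str, comparison: str) -> List[Difference]:
    """Stand-in for RPA robot 30: upload the files and trigger the comparison."""
    return client_handle_upload(reference, comparison)
```

Whether the robot runs as a separate process or as part of the client changes only where `rpa_robot_run` lives, not the order of the calls.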

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In various embodiments of the present invention, it should be understood that the sequence numbers of the above-mentioned processes do not imply an inevitable order of execution, and the execution order of the processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood, however, that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-accessible memory. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and specifically may be a processor in the computer device) to execute some or all of the steps of the methods of the embodiments of the present invention.

It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disk memory, tape memory, or any other computer-readable medium that can be used to carry or store data.

Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.

Those of ordinary skill in the art will understand that: the modules of an apparatus in an embodiment may be distributed within that apparatus as described in the embodiment, or may, with corresponding changes, be located in one or more apparatuses different from that of the embodiment. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A file comparison method based on RPA and AI, applied to a client, characterized in that the method comprises the following steps:

S2, sending the reference file and the comparison file to a server;

2. The method according to claim 1, wherein the difference comparison result includes at least one piece of difference information, each piece of difference information includes a difference type, a difference text in the reference file, a difference text in the comparison file, difference location information of the difference text in the reference file, and difference location information of the difference text in the comparison file, and the difference location information includes a page identifier of a page to which the difference text belongs, and coordinate information of the page to which the difference text belongs.

3. The method according to claim 2, wherein the S4 includes:

4. The method according to claim 3, wherein the S4 further comprises:

5. The method according to claim 4, wherein after the S3, the method further comprises:

6. The method according to claim 5, wherein after the S43, the method further comprises:

7. The method according to claim 1, wherein before the S4, the method further comprises:

8. The method according to claim 7, wherein the S4 includes:

9. The method according to any one of claims 1-8, wherein the S2 includes:

and S23, sending the reference text and the comparison text to the server.

10. A file comparison apparatus based on RPA and AI, applied to a client, characterized in that the apparatus comprises:

11. The apparatus of claim 10, wherein the difference comparison result includes at least one piece of difference information, each piece of difference information includes a difference type, a difference text in the reference file, a difference text in the comparison file, difference location information of the difference text in the reference file, and difference location information of the difference text in the comparison file, and the difference location information includes a page identifier of a page to which the difference text belongs, and coordinate information of the page to which the difference text belongs.

12. The apparatus of claim 11, wherein the display unit comprises:

13. The apparatus according to any one of claims 10-12, wherein the sending unit comprises:

14. A computing device, wherein the computing device comprises:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.

15. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-9.