WO2023045053A1 - File comparison method and apparatus based on RPA and AI, device, and storage medium


Info

Publication number
WO2023045053A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
text
difference
comparison
position information
Application number
PCT/CN2021/131627
Inventor
赵鹏
汪冠春
胡一川
褚瑞
李玮
Original Assignee
北京来也网络科技有限公司
来也科技(北京)有限公司
Application filed by 北京来也网络科技有限公司, 来也科技(北京)有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/168 Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Definitions

  • Embodiments of the present invention relate to the technical field of process automation, and in particular to a file comparison method, apparatus, device, and storage medium based on RPA and AI.
  • RPA (Robotic Process Automation) simulates human operations on a computer through specific "robot software" and automatically executes process tasks according to rules.
  • AI stands for Artificial Intelligence.
  • RPA has unique advantages: low-code, non-intrusive.
  • Low-code means that RPA can be operated without a high IT level, and business personnel who do not understand programming can also develop processes; non-intrusive means that RPA can simulate human operations without opening interfaces with software systems.
  • Traditional RPA has certain limitations: it can only follow fixed rules, so its application scenarios are limited. With the continuous development of AI technology, the deep integration of RPA and AI overcomes these limitations.
  • RPA+AI, that is, "hand work" plus "head work", is greatly changing the value of labor.
  • Embodiments of the present invention provide a file comparison method, apparatus, device, and storage medium based on RPA and AI, which can not only automate file comparison but also highlight the differences between two files, thereby improving the efficiency with which a user finds file differences.
  • An embodiment of the present invention provides a file comparison method based on RPA and AI. The method is applied to a client and includes:
  • the difference comparison result includes at least one piece of difference information. Each piece of difference information includes the difference type, the difference text in the reference file, the difference text in the comparison file, the difference position information of the difference text in the reference file, and the difference position information of the difference text in the comparison file. The difference position information includes the page identification of the page to which the difference text belongs and the coordinate information of the difference text on that page.
  • the S4 includes:
  • the S4 also includes:
  • for the same piece of difference information, generating an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type, and binding the ID with the DIV element position information of the difference text in the reference file and with the DIV element position information of the difference text in the comparison file, respectively;
  • the method further includes:
  • the preset display area is an area other than the reference file display area and the comparison file display area, and the difference details include, for each piece of difference information, the difference type, the difference text in the reference file, and the difference text in the comparison file.
  • the method further includes:
  • the method further includes:
  • the S4 includes:
  • the difference text currently scrolled to the display area is highlighted in the comparison file and/or the reference file.
  • the S2 includes:
  • OCR optical character recognition
  • the target file is a file containing multiple pages of text
  • the target file is a file containing a single page of text
  • the embodiment of the present invention provides a file comparison device based on RPA and AI, the device is applied to the client, and the device includes:
  • a receiving unit configured to receive reference files and comparison files uploaded by robotic process automation RPA robots;
  • a sending unit configured to send the reference file and the comparison file to a server
  • the receiving unit is further configured to receive a difference comparison result of the comparison file relative to the reference file sent by the server;
  • a display unit configured to highlight the difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the difference text highlighted in the comparison file is the text in the comparison file that differs from the reference file, and the difference text highlighted in the reference file is the text in the reference file that differs from the comparison file.
  • the difference comparison result includes at least one piece of difference information. Each piece of difference information includes the difference type, the difference text in the reference file, the difference text in the comparison file, the difference position information of the difference text in the reference file, and the difference position information of the difference text in the comparison file. The difference position information includes the page identification of the page to which the difference text belongs and the coordinate information of the difference text on that page.
  • the display unit includes:
  • a conversion module configured to convert the coordinate information into position information of divided DIV elements
  • a display module configured to, when the position information of a DIV element enters the display area of the file to which it belongs, highlight the difference text at that DIV element position in the page indicated by the corresponding page identification, according to the difference type and the page identification corresponding to the position information of the DIV element.
  • the display unit also includes:
  • a generation module configured to, for the same piece of difference information, generate an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type;
  • a binding module used to respectively bind the ID with the DIV element position information of the difference text in the reference file, and the DIV element position information of the difference text in the comparison file;
  • a first synchronization module configured to, upon receiving a first synchronous positioning instruction triggered based on the reference file or the comparison file, synchronously highlight the difference text at the position information of all DIV elements bound to the ID corresponding to the first synchronous positioning instruction.
  • the display unit is further configured to, after receiving the difference comparison result of the comparison file relative to the reference file sent by the server, display the difference details in a preset display area according to the difference comparison result.
  • the preset display area is an area other than the reference file display area and the comparison file display area.
  • the difference details include the difference type in each piece of difference information, the difference text in the reference file, and the difference text in the comparison file.
  • the binding module is further configured to, after binding the ID with the DIV element position information of the difference text in the reference file and with the DIV element position information of the difference text in the comparison file, bind the ID with the corresponding difference information in the difference details;
  • the display unit also includes:
  • An acquiring module configured to acquire an ID bound to difference information in the difference details corresponding to the second synchronous positioning instruction when receiving a second synchronous positioning instruction triggered based on the difference details;
  • the second synchronization module is used for synchronously highlighting the difference texts at the position information of all DIV elements bound to the obtained ID.
  • the receiving unit is further configured to receive a scroll instruction for a first scroll bar before the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, where the first scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area;
  • a determining unit configured to determine, according to the scroll instruction, the ratio of the currently scrolled length of the first scroll bar to the total length of the scroll area
  • a synchronous scrolling unit configured to scroll a second scroll bar according to the ratio, so that the first scroll bar and the second scroll bar scroll synchronously, where the second scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area, whichever is not the first scroll bar.
  • the display unit is configured to highlight the difference text currently scrolled to the display area in the comparison file and/or the reference file according to the difference comparison result.
  • the sending unit includes:
  • a recognition module configured to use optical character recognition (OCR) to identify the reference file and the comparison file, and obtain at least one page of text of the reference file and at least one page of text of the comparison file;
  • the splicing module is used to splice the multiple pages of text of the target file into one page of text with continuous context when the target file is a file containing multiple pages of text to obtain the target text.
  • the target file is a file containing a single page of text file
  • a sending module configured to send the reference text and the comparison text to the server.
  • an embodiment of the present invention provides a computing device, and the computing device includes:
  • one or more processors
  • When the one or more programs are executed by the one or more processors, the one or more processors implement the method described in the first aspect.
  • an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in the first aspect is implemented.
  • With the RPA and AI-based file comparison method, apparatus, device, and storage medium provided by the embodiments of the present invention, the RPA robot can automatically upload the reference file and the comparison file to the client, the client can transmit the two files to the server for difference comparison, and the difference text can finally be highlighted in the comparison file and/or the reference file according to the difference comparison result returned by the server.
  • The embodiment of the present invention can use the RPA robot to automatically trigger the client to send the two files to be compared to the server for automatic comparison. This saves manpower, frees the people who would otherwise perform file comparison to do more valuable work, and improves the efficiency of file comparison. Compared with the prior art, which requires differences to be marked manually, the embodiment of the present invention can highlight the difference text directly in the reference file and/or the comparison file, which improves the readability of the difference text and thus the efficiency with which the user finds the differences between the two files.
  • When the client sends the reference file and the comparison file to the server, it can first use OCR (Optical Character Recognition) to recognize the two files and, when they contain multiple pages of text, splice the pages to obtain a single-page reference text with continuous context and a single-page comparison text with continuous context. The reference text and the comparison text are then sent to the server for difference comparison, so that the server can compare the two texts directly in context without further processing, thereby improving the efficiency and accuracy of the server's file comparison.
  • the user can trigger a synchronous positioning command through the reference file display area, the comparison file display area or the difference detail display area, so that the client can simultaneously highlight the same difference information, thereby improving the efficiency of users viewing the difference text.
  • the user can drag the scroll bar of the reference file display area or the comparison file display area to make the client scroll synchronously for these two display areas, thereby improving the efficiency of users viewing text.
  • Fig. 1 is the flowchart of a kind of file comparison method based on RPA and AI provided by the embodiment of the present invention
  • Fig. 2 is an example diagram showing a difference comparison result provided by an embodiment of the present invention.
  • Fig. 3 is another example diagram showing difference comparison results provided by the embodiment of the present invention.
  • FIG. 4 is a block diagram of a file comparison device based on RPA and AI provided by an embodiment of the present invention
  • Fig. 5 is a kind of file comparison system architecture diagram based on RPA and AI provided by the embodiment of the present invention.
  • Fig. 6 is an architecture diagram of another file comparison system based on RPA and AI provided by an embodiment of the present invention.
  • the embodiment of the present invention combines RPA and AI technologies to compare files automatically, which not only saves manpower and improves the efficiency of file comparison, but also highlights the differences between the two files, improving the efficiency with which users find file differences.
  • reference file refers to a file that is used as a reference when performing a difference comparison
  • comparison file refers to the file, of the two files being compared, that is compared against the reference file.
  • the version of the reference file is often earlier than that of the comparison file.
  • the reference document and comparison document can be documents in any field, such as contract documents, financial documents, program documents, etc.
  • multi-page document refers to a document with text content greater than or equal to two pages
  • multi-page text refers to text of two or more pages
  • OCR refers to Optical Character Recognition, the process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses a character recognition method to translate the shapes into computer text
  • the RPA robot can use OCR technology to convert the text in a paper document into a black-and-white dot-matrix image file, and the client can then use OCR technology to recognize the text content contained in that image file. Alternatively, the RPA robot can use OCR technology to obtain the text content from the paper document and generate a text file (that is, an editable file) containing that content, from which the client directly extracts the text.
  • client refers to the front-end of the business system with file comparison requirements
  • server refers to the back-end of the business system with file comparison requirements
  • client can be the application software corresponding to the business system, or it can be a browser, so that the RPA robot can access the website of the business system through the browser.
  • RPA robot can be integrated in the client, can also be embedded in the client in the form of a plug-in, or can be independent of the client, as long as the RPA robot can automatically access the client. The specific form is not limited.
  • NLP refers to Natural Language Processing, which takes language as its object and uses computer technology to analyze, understand, and process natural language; that is, the computer serves as a powerful tool for language research. With the support of computers, NLP studies language information quantitatively and provides language descriptions that can be used jointly by humans and computers.
  • splicing refers to connecting the content to be spliced together without changing the original content.
  • the term "preset comparison algorithm" refers to a specific comparison method for determining the differences between the comparison text and the reference text; the reference text and the comparison text can be compared in batches, one preset comparison unit at a time, until the comparison is completed.
  • the term “preset comparison unit” refers to the size of the text to be compared each time, which may be determined according to the actual situation, and may be a phrase, a sentence, or a paragraph.
  • the term “difference comparison” refers to comparing the differences between the reference text and the comparison text.
  • the term “difference comparison result” refers to a result, obtained after a difference comparison between the reference text and the comparison text, that includes at least one piece of difference information. Each piece of difference information includes the difference type, the difference text in the reference file, the difference text in the comparison file, the difference position information of the difference text in the reference file, and the difference position information of the difference text in the comparison file. The difference position information includes the page identification of the page to which the difference text belongs and the coordinate information of the difference text on that page.
  • the term “difference type” is used to characterize the category of differences, mainly including content deletion, content addition and content modification.
  • page identification is used to indicate which page the current page is located in the entire file.
  • coordinate information: a coordinate system can be established for each page, with the position of the first character of the page as the origin and with the horizontal and vertical axes pointing horizontally right and vertically down respectively, so that corresponding coordinates are generated for each character on the page.
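  • The definitions above suggest a simple record layout for a difference comparison result. The following is a minimal sketch only: the class and field names are hypothetical, and making the position fields optional (for example, for added text that has no counterpart in the reference file) is an assumption rather than something the source states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DifferenceType(Enum):
    """The three difference types named in the text."""
    DELETION = "content deletion"
    ADDITION = "content addition"
    MODIFICATION = "content modification"


@dataclass(frozen=True)
class DifferencePosition:
    """Page identification plus coordinates on that page (names are hypothetical)."""
    page: int  # which page of the file the difference text is on
    x: float   # horizontal coordinate, increasing to the right
    y: float   # vertical coordinate, increasing downward


@dataclass(frozen=True)
class DifferenceInfo:
    """One piece of difference information."""
    diff_type: DifferenceType
    reference_text: str
    comparison_text: str
    # Optional is an assumption: one side may have no text for this difference.
    reference_position: Optional[DifferencePosition]
    comparison_position: Optional[DifferencePosition]


@dataclass
class DifferenceComparisonResult:
    """A result holding at least one piece of difference information."""
    differences: List[DifferenceInfo]


example = DifferenceComparisonResult(
    differences=[
        DifferenceInfo(
            diff_type=DifferenceType.MODIFICATION,
            reference_text="30 days",
            comparison_text="45 days",
            reference_position=DifferencePosition(page=2, x=12.0, y=5.0),
            comparison_position=DifferencePosition(page=2, x=12.0, y=5.0),
        )
    ]
)
```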
  • difference text refers to the text content in the current file that is different from another file.
  • highlighting is a display method that can clearly distinguish the difference text from other texts.
  • the highlighting method includes but is not limited to one or a combination of the following: bold font, changed font color, added font background color, highlighted font, enlarged font, italics, underline, strikethrough, etc.
  • authentication refers to verifying whether the client that sends the reference file and the comparison file has the authority to perform file comparison; specifically, it can be implemented by verifying whether the user information of the client meets the authority requirement.
  • DIV element position information refers to the position information of the DIV (DIVision, division) element in the network interface
  • DIV element is used for HTML (Hyper Text Markup Language, hypertext markup language)
  • binding refers to establishing a mapping relationship of at least two parameters to be bound, so that one parameter can be used to find another parameter.
  • difference details are specific descriptive information for each difference, and include the difference type in each piece of difference information, the difference text in the reference file, and the difference text in the comparison file.
  • synchronous positioning instruction is an instruction indicating to display the difference text involved in the same piece of difference information synchronously.
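  • Binding and synchronous positioning, as defined above, can be sketched as a pair of lookup tables. This is an illustration under stated assumptions: the text does not say how the ID is derived, so hashing the two positions and the difference type with SHA-1 is a stand-in, and `DifferenceBindings`, the `(page, x, y)` position tuples, and the pane names are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Position = Tuple[int, float, float]  # (page, x, y); a hypothetical layout
Target = Tuple[str, Position]        # (pane name, position)


def make_difference_id(ref_pos: Position, cmp_pos: Position, diff_type: str) -> str:
    """Derive an ID from both positions and the difference type (hashing is an assumption)."""
    key = f"{ref_pos}|{cmp_pos}|{diff_type}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


class DifferenceBindings:
    """Maps an ID to every position bound to it, and each position back to its ID."""

    def __init__(self) -> None:
        self._by_id: Dict[str, List[Target]] = {}
        self._by_target: Dict[Target, str] = {}

    def bind(self, ref_pos: Position, cmp_pos: Position, diff_type: str) -> str:
        diff_id = make_difference_id(ref_pos, cmp_pos, diff_type)
        targets = [("reference", ref_pos), ("comparison", cmp_pos)]
        self._by_id[diff_id] = targets
        for target in targets:
            self._by_target[target] = diff_id
        return diff_id

    def targets_for(self, pane: str, pos: Position) -> List[Target]:
        """Resolve a synchronous positioning instruction triggered at one position."""
        return self._by_id[self._by_target[(pane, pos)]]


bindings = DifferenceBindings()
diff_id = bindings.bind((1, 4.0, 2.0), (1, 4.0, 3.0), "content modification")
to_highlight = bindings.targets_for("reference", (1, 4.0, 2.0))
```

  • With such a mapping, a click on a difference in either display area resolves to the same ID, and every position bound to that ID can be highlighted together.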
  • synchronous scrolling refers to a scrolling manner that keeps the scrolling progress of at least two scroll bars consistent.
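  • Ratio-based synchronous scrolling can be sketched in a few lines. The function names and the clamping of the ratio to the range 0 to 1 are assumptions made for illustration; the source only states that the second scroll bar is scrolled according to the ratio of the first scroll bar's scrolled length to the total length of its scroll area.

```python
def scroll_ratio(scrolled_length: float, total_length: float) -> float:
    """Ratio of the currently scrolled length to the total scroll length, clamped to [0, 1]."""
    if total_length <= 0:
        return 0.0  # nothing to scroll
    return min(max(scrolled_length / total_length, 0.0), 1.0)


def synced_scroll_offset(first_scrolled: float, first_total: float, second_total: float) -> float:
    """Offset to apply to the second scroll bar so both bars show the same progress."""
    return scroll_ratio(first_scrolled, first_total) * second_total


# The reference pane is scrolled 250 of 1000 units; the comparison pane is 400 units long.
offset = synced_scroll_offset(250, 1000, 400)
```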
  • Fig. 1 shows a file comparison method based on RPA and AI provided by an embodiment of the present invention. The method is mainly applied to the client and specifically includes:
  • In the embodiment of the present invention, an RPA program can be configured in an electronic device that can log in to the client (the program can be integrated or embedded in the client, or it can be independent of the client). Following the rules set in the RPA program, the electronic device simulates the user's mouse and keyboard operations to log in to the client automatically and, by accessing the client, triggers the client to generate a file comparison request including the reference file and the comparison file and to send the request to the server, so that the server can perform a difference comparison on the two files.
  • the client when logging in to the client, the client can pop up a login interface containing a verification code image.
  • the RPA robot can perform OCR recognition on the verification code image, obtain the verification code content in the image, and enter that content into the corresponding edit box to log in to the client successfully.
  • the reference file and comparison file can be stored in the client, or in other storage space of the electronic device, or can be a paper file.
  • the RPA robot can search for the reference file and the comparison file in the other storage space and upload them to the client, for example by clicking the upload button, by dragging and dropping the two files into a designated area, or by another upload method.
  • the RPA robot can use OCR technology to convert a paper file into an image file or a text file (that is, an editable file composed of the text content of the paper file), and then upload it to the client using the above method.
  • After the client receives the reference file and the comparison file uploaded by the RPA robot, it can render them to show the uploaded files to the user.
  • the reference file and/or comparison file is a word file
  • when a Word file is to be converted into a PDF file, the client may send the conversion operation to the server, and the server may then feed the PDF file back to the client for rendering.
  • the client After receiving the reference file and comparison file uploaded by the RPA robot, the client can receive the file comparison command triggered by the RPA robot, and then directly generate a file comparison request including the reference file and the comparison file according to the file comparison command, and send it to the server Send the file comparison request so that the server can compare the differences between the reference file and the comparison file.
  • After the server receives the reference file and the comparison file, it often needs to recognize the text in the two files before performing a difference comparison. If many clients send file comparison requests to the server, the efficiency of the server's file comparison is reduced.
  • the client can first use OCR to identify the reference document and the comparison document, and obtain at least one page of text of the reference document and at least one page of the comparison document. text, and then send the recognized text to the server for difference comparison.
  • For example, suppose the reference file includes two pages of text, and the comparison file adds a page of text between the first and second pages of the reference file, forming three pages of text. If a single-page comparison method is used, the result is that the second page of the reference file differs from the second page of the comparison file and, because the reference file has no third page, the third page of the comparison file is reported as absent from the reference file. In other words, single-page comparison leads to the overall result that the two files differ everywhere except on the first page.
  • After the client uses OCR to recognize the reference file and the comparison file and obtains at least one page of text of the reference file and at least one page of text of the comparison file, it first performs text splicing and then sends the spliced text to the server.
  • when the target file is a file containing multiple pages of text, the multiple pages of text of the target file are spliced into one page of text with continuous context to obtain the target text; when the target file is a file containing a single page of text, the single page of text is obtained from the target file as the target text. The target file is the reference file or the comparison file: when the target file is the reference file, the target text is the reference text, and when the target file is the comparison file, the target text is the comparison text. The reference text and the comparison text are then sent to the server.
  • the continuous context refers to maintaining the sequence of the original text.
  • the specific method of splicing multiple pages of text in a reference file or a comparison file into one page of text with continuous context can be to splice multiple pages of text in sequence according to the page order of the reference file or comparison file, so as to obtain a page with continuous context text.
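  • The splicing step and the page-insertion example can be sketched as follows. Joining the pages with an empty separator is an assumption chosen so that the original content is unchanged, and `pagewise_mismatches` is a hypothetical helper that only illustrates why comparing page by page misreports an inserted page.

```python
from typing import List


def splice_pages(pages: List[str]) -> str:
    """Concatenate page texts in page order into one text with continuous context."""
    return "".join(pages)  # empty separator: the original content is left unchanged


def pagewise_mismatches(ref_pages: List[str], cmp_pages: List[str]) -> List[int]:
    """1-based numbers of the pages that a naive page-by-page comparison reports as different."""
    mismatched = []
    for index in range(max(len(ref_pages), len(cmp_pages))):
        ref_page = ref_pages[index] if index < len(ref_pages) else None
        cmp_page = cmp_pages[index] if index < len(cmp_pages) else None
        if ref_page != cmp_page:
            mismatched.append(index + 1)
    return mismatched


reference_pages = ["Clause A. ", "Clause C. "]
comparison_pages = ["Clause A. ", "Clause B. ", "Clause C. "]  # one page inserted

naive_report = pagewise_mismatches(reference_pages, comparison_pages)
reference_text = splice_pages(reference_pages)
comparison_text = splice_pages(comparison_pages)
```

  • Here the page-by-page report flags pages 2 and 3, while the spliced texts differ only by the inserted "Clause B. " passage.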
  • the server can authenticate the user information of the client to verify whether the user has the file comparison authority.
  • When the client sends the reference file and the comparison file to the server, it can also carry the user information of the client, so that the server can first authenticate the client according to the user information and then perform the difference comparison on the reference file and the comparison file only when the authentication passes.
  • the user information may be a client account, may be a mobile phone number bound to the client account, may also be a user level or other information, and the implementation of the present invention does not limit the specific content of the user information, which may be determined according to specific circumstances.
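  • Authentication before comparison might look like the sketch below. The request layout, the `account` field, and the allow-list check are all hypothetical; the source leaves the content of the user information and the authority requirement open.

```python
from typing import Any, Callable, Dict, Set


def is_authorized(user_info: Dict[str, Any], allowed_accounts: Set[str]) -> bool:
    """Hypothetical rule: the client account must be on an allow-list."""
    return user_info.get("account") in allowed_accounts


def handle_comparison_request(
    request: Dict[str, Any],
    allowed_accounts: Set[str],
    compare: Callable[[str, str], Any],
) -> Dict[str, Any]:
    """Authenticate first; run the difference comparison only when authentication passes."""
    if not is_authorized(request.get("user_info", {}), allowed_accounts):
        return {"status": "rejected", "result": None}
    result = compare(request["reference_text"], request["comparison_text"])
    return {"status": "ok", "result": result}


request = {
    "user_info": {"account": "alice"},
    "reference_text": "old text",
    "comparison_text": "new text",
}
accepted = handle_comparison_request(request, {"alice"}, lambda ref, cmp: ref != cmp)
rejected = handle_comparison_request(request, {"bob"}, lambda ref, cmp: ref != cmp)
```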
  • the server can compare the difference between the reference file and the comparison file according to a preset comparison algorithm. Specifically, the reference text and the comparison text may be compared according to a preset comparison unit, and comparison sub-results for each preset comparison unit are obtained.
  • the corresponding comparison sub-result is determined as content deletion; if it is determined that the comparison sub-text being compared does not exist in the reference text, the corresponding comparison sub-result is determined as content addition.
  • the differences between two texts should include not only the same content, content deletion and content addition, but also content modification.
  • the size of the preset comparison unit can be determined according to the actual situation, and can be a phrase, a sentence, a paragraph, and the like.
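  • A unit-by-unit comparison that classifies differences as content deletion, content addition, or content modification can be sketched with Python's standard `difflib.SequenceMatcher`. This is a swapped-in technique, not the preset comparison algorithm of the source, which is not specified; using the sentence as the comparison unit and the simple regular-expression sentence splitter are also assumptions.

```python
import difflib
import re
from typing import List, Tuple

TAG_TO_TYPE = {
    "delete": "content deletion",
    "insert": "content addition",
    "replace": "content modification",
}


def split_units(text: str) -> List[str]:
    """Split text into sentence-sized comparison units (a deliberately simple splitter)."""
    return [unit for unit in re.split(r"(?<=[.!?])\s+", text.strip()) if unit]


def compare_texts(reference_text: str, comparison_text: str) -> List[Tuple[str, str, str]]:
    """Return (difference type, reference sub-text, comparison sub-text) for each difference."""
    ref_units = split_units(reference_text)
    cmp_units = split_units(comparison_text)
    matcher = difflib.SequenceMatcher(a=ref_units, b=cmp_units, autojunk=False)
    results = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical units produce no difference information
        results.append(
            (TAG_TO_TYPE[tag], " ".join(ref_units[i1:i2]), " ".join(cmp_units[j1:j2]))
        )
    return results


deleted = compare_texts("One. Two. Three.", "One. Three.")
added = compare_texts("One. Three.", "One. Two. Three.")
modified = compare_texts("One. Two.", "One. Too.")
```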
  • NLP technology can also be used to perform semantic analysis on the reference subtext and the comparison subtext.
  • the embodiment of the present invention can also support user-defined filtering rules that ignore meaningless differences; that is, when the differences between the reference sub-text and the comparison sub-text include a difference that satisfies a preset filtering rule, that difference is ignored. For example, it can be set that the presence or absence of the particle "of" in a sentence does not affect the comparison result.
  • When the server sends the difference comparison result to the client, it can also send the ignored differences, so that the client can display them to the user.
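  • User-defined filtering rules can be modeled as predicates over a difference. In this sketch the `(type, reference text, comparison text)` tuple shape, the rule factory `ignore_particle`, and the word-level matching are hypothetical; the ignored differences are returned alongside the kept ones so that they can still be sent to the client.

```python
from typing import Callable, List, Tuple

Difference = Tuple[str, str, str]  # (difference type, reference text, comparison text)
FilterRule = Callable[[Difference], bool]


def ignore_particle(particle: str) -> FilterRule:
    """Build a rule that ignores a difference consisting only of the given word."""

    def without_particle(text: str) -> str:
        return " ".join(word for word in text.split() if word.lower() != particle)

    def rule(difference: Difference) -> bool:
        _, ref_text, cmp_text = difference
        return without_particle(ref_text) == without_particle(cmp_text)

    return rule


def apply_filters(
    differences: List[Difference], rules: List[FilterRule]
) -> Tuple[List[Difference], List[Difference]]:
    """Split differences into (kept, ignored) according to the filtering rules."""
    kept, ignored = [], []
    for difference in differences:
        if any(rule(difference) for rule in rules):
            ignored.append(difference)
        else:
            kept.append(difference)
    return kept, ignored


differences = [
    ("content modification", "the terms of the contract", "the terms the contract"),
    ("content modification", "pay 100", "pay 200"),
]
kept, ignored = apply_filters(differences, [ignore_particle("of")])
```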
  • each piece of difference information includes the difference type, the difference text in the reference file, the difference text in the comparison file, the difference position information of the difference text in the reference file, and the difference text in the comparison file.
  • the difference position information includes the page identification of the page to which the difference text belongs and the coordinate information of the difference text on the page to which it belongs.
  • a comparison sub-result corresponds to a piece of difference information, and the difference types include content addition, content deletion, and content modification.
  • the page identification is used to indicate which page of the entire file the current page is.
  • a coordinate system can be established for each page, with the position of the first character of the page as the origin and with the horizontal and vertical axes pointing horizontally right and vertically down respectively, so that corresponding coordinates are generated for each character on the page.
  • the server can add the task of comparing the reference file and the comparison file to the task queue and store the task state of the comparison task in the task database.
  • when the task state of the comparison task changes, the task state in the task database is updated promptly.
  • the client can receive the comparison task status query command triggered by the RPA robot and send it to the server, so that the server can look up the corresponding comparison task in the task database and feed back the queried task status to the client.
  • the comparison task when the comparison task is not being executed, the task status may be unprocessed, when the comparison task is being executed, the task status may be processing, and when the comparison task is completed, the task status may be completed.
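The task queue, task database, and three task states described above can be sketched in a few lines. Everything here is an in-memory stand-in: the class name `ComparisonServer`, its method names, and the use of a dictionary as the "task database" are assumptions, not the server design from the text.

```python
from collections import deque
from enum import Enum


class TaskStatus(Enum):
    UNPROCESSED = "unprocessed"
    PROCESSING = "processing"
    COMPLETED = "completed"


class ComparisonServer:
    """In-memory stand-in; a real server would use a persistent task database."""

    def __init__(self):
        self.task_queue = deque()
        self.task_db = {}  # task_id -> TaskStatus

    def submit(self, task_id, reference_file, comparison_file):
        """Queue a comparison task and record it as unprocessed."""
        self.task_queue.append((task_id, reference_file, comparison_file))
        self.task_db[task_id] = TaskStatus.UNPROCESSED

    def run_next(self, compare):
        """Execute the oldest queued task, updating its status as it changes."""
        task_id, reference_file, comparison_file = self.task_queue.popleft()
        self.task_db[task_id] = TaskStatus.PROCESSING
        result = compare(reference_file, comparison_file)
        self.task_db[task_id] = TaskStatus.COMPLETED
        return result

    def query_status(self, task_id):
        """Answer a comparison task status query from the client."""
        return self.task_db[task_id]


server = ComparisonServer()
server.submit("task-1", "reference text", "comparison text")
print(server.query_status("task-1"))
server.run_next(lambda ref, cmp: ref == cmp)
print(server.query_status("task-1"))
```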
  • the server can actively feed back the difference comparison result to the client, and can also passively feed back the difference comparison result to the client.
  • passive feedback of the difference comparison result to the client can be implemented as follows: the client receives the comparison result query command triggered by the RPA robot and sends it to the server, and the server sends the corresponding difference comparison result to the client according to the comparison result query command.
  • the ways in which the RPA robot can trigger the client to send a comparison result query command or a comparison task status query command include, but are not limited to, the RPA robot clicking the comparison result query button or the comparison task status query button on the client, which triggers the client to generate and send the corresponding command.
  • the highlighted difference text in the comparison file is the text of the difference between the comparison file and the reference file
  • the highlighted difference text in the reference file is the text in which the reference file differs from the comparison file.
  • the ways of highlighting include, but are not limited to, one or a combination of the following: bold font, a changed font color, an added font background color, a highlighted font, an enlarged font, italics, underline, strikethrough, etc.
  • different difference types may be highlighted in the same or different ways.
  • the specific implementation of this step may be: converting the coordinate information into DIV element position information; when the DIV element position information enters the display area of the file to which it belongs, highlighting, according to the difference type and the page identification corresponding to the DIV element position information, the difference text at the DIV element position information in the page indicated by the page identification.
  • DIV is a positioning technology in cascading style sheets
  • the DIV element is an element used to provide structure and background for block-level content in an HTML document.
  • when the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, the handling depends on the difference type contained in the current difference information.
  • if the difference type is content deletion, either only the deleted text is highlighted in the reference file, or the text before the deletion and the text retained after the deletion are highlighted separately, that is, the difference text in the reference file and the difference text in the comparison file contained in the difference information are both highlighted.
  • if the difference type is content addition, either only the added content is highlighted in the comparison file, or the text before the addition and the text after the addition are highlighted separately, that is, the difference text in the reference file and the difference text in the comparison file contained in the difference information are both highlighted.
  • if the difference type is content modification, the text before the modification and the text after the modification can both be highlighted, that is, the difference text in the reference file and the difference text in the comparison file contained in the difference information are both highlighted.
  • FIG. 2 is part of the text content of the reference file and the comparison file, and the difference text can be directly highlighted in the reference file and the comparison file, and the user can browse by dragging the scroll bar of the reference file and the comparison file.
  • the bold and underlined text refers to the modified text
  • the italic and enlarged text refers to the text added in the comparison file
  • the strikethrough text refers to the deleted text in the comparison file.
  • for the same piece of difference information, the embodiment of the present invention can generate an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type, and bind the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file. When a first synchronous positioning instruction triggered based on the reference file or the comparison file is received, the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction is highlighted synchronously.
  • the specific implementation of generating the ID includes but is not limited to: according to the preset order, the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file and the difference type Perform concatenation to obtain a string.
  • different difference types may be represented by different characters, for example, "content deletion", “content addition” and “content modification” may be represented by "1", "2" and "3” in sequence.
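The concatenation-based ID and its binding to both positions could look like the sketch below. The `(page, left, top)` tuple layout, the `-` separator, the field order, and the function names are assumptions; the text only says the values are concatenated in a preset order and that types can be coded as "1", "2", and "3".

```python
# Type codes taken from the example in the text.
DIFF_TYPE_CODES = {
    "content deletion": "1",
    "content addition": "2",
    "content modification": "3",
}


def make_diff_id(ref_position, cmp_position, diff_type):
    """Concatenate both DIV positions and the type code into one string ID.

    Positions are assumed to be (page, left, top) tuples.
    """
    parts = [str(value) for value in ref_position]
    parts += [str(value) for value in cmp_position]
    parts.append(DIFF_TYPE_CODES[diff_type])
    return "-".join(parts)


def bind(bindings, diff_id, ref_position, cmp_position):
    """Bind one ID to the DIV positions in both files."""
    bindings[diff_id] = {"reference": ref_position, "comparison": cmp_position}


def positions_to_highlight(bindings, activated_id):
    """Return every bound position for an activated ID, or an empty dict."""
    return bindings.get(activated_id, {})


bindings = {}
diff_id = make_diff_id((1, 40, 120), (1, 40, 136), "content modification")
bind(bindings, diff_id, (1, 40, 120), (1, 40, 136))
print(diff_id, positions_to_highlight(bindings, diff_id))
```

Because both positions share one ID, a click in either file can look up the counterpart position in the other file and highlight both at once.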
  • the first synchronous positioning instruction is an instruction generated when the user clicks on the display area of the reference file or the comparison file. When the client receives the first synchronous positioning instruction, it activates the corresponding ID.
  • the component corresponding to the reference file or the comparison file judges whether the activated ID is the same as an ID it contains. If it contains the same ID, the difference text at the DIV element position information corresponding to the ID is highlighted, and when that DIV element position information is not in the display area, it is scrolled into the display area for display.
  • the embodiment of the present invention can also display the difference details in a preset display area according to the difference comparison result.
  • the preset display area is an area other than the reference file display area and the comparison file display area, and the difference details include, for each piece of difference information, the difference type, the difference text in the reference file, and the difference text in the comparison file.
  • the difference information can also be summarized, and the summary result can be displayed in a display area other than the reference file display area, the comparison file display area, and the preset display area.
  • the summary result includes the total number of difference information and the page identification where the difference information is located. For example, the summary result is "After comparison, it is found that there are differences on pages 1, 3, 5, and 8 of the reference document, and there are 20 differences in total between the two documents".
  • the client when displaying the difference, not only highlights the difference text in the reference file and/or the comparison file, but also displays the comparison result on the right.
  • the upper part of the comparison result is the overall comparison result (ie, the summary result), and the lower part is the detailed comparison result (ie, the difference details). Users can browse the difference details by dragging the scroll bar in the detailed comparison result display area.
  • the preset display area is independent from the reference file display area and the comparison file display area, when the user views the preset display area, the contents displayed in the reference file display area and the comparison file display area will not change. In this case, if the user wants to view the specific content in the reference file and the comparison file in combination with the difference details, the user needs to drag the scroll bars of the reference file display area and the comparison file display area respectively, and the operation is cumbersome .
  • the embodiment of the present invention can bind the ID with the corresponding difference information in the difference details; when a second synchronous positioning instruction triggered based on the difference details is received, obtain the ID bound to the difference information in the difference details corresponding to the second synchronous positioning instruction; and synchronously highlight the difference text at all DIV element position information bound to the obtained ID.
  • the second synchronous positioning instruction is an instruction generated when the user clicks on the preset display area.
  • when the client receives the second synchronous positioning instruction, it activates the ID bound to the difference information in the difference details corresponding to that instruction.
  • the component corresponding to the reference file or the comparison file judges whether the activated ID is the same as an ID it contains. If it contains the same ID, the difference text at the DIV element position information corresponding to the ID is highlighted, and when that DIV element position information is not in the display area, it is scrolled into the display area for display.
  • the embodiment of the present invention may receive a scrolling command for the first scroll bar; determine, according to the scrolling command, the ratio of the currently scrolled length of the first scroll bar to the total length of its scrolling area; and scroll the second scroll bar according to the ratio, so that the first scroll bar and the second scroll bar scroll synchronously. That is to say, the first scroll bar only follows the user's dragging and is not itself scrolled synchronously, while the second scroll bar follows the scrolling of the first scroll bar.
  • the first scroll bar includes the scroll bar of the reference file display area or the scroll bar of the comparison file display area
  • the second scroll bar includes the scroll bar of the reference file display area or the scroll bar of the comparison file display area, but is different from the first scroll bar. That is to say, when the first scroll bar is the scroll bar of the reference file display area, the second scroll bar is the scroll bar of the comparison file display area; when the first scroll bar is the scroll bar of the comparison file display area, the second scroll bar is the scroll bar of the reference file display area.
  • when the user scrolls the scroll bar of the reference file display area, the client calculates in real time the ratio of the currently scrolled length of that scroll bar to the total length of its scrolling area (for example, a scrolled length of 2 cm over a total length of 10 cm gives a ratio of 0.2) and scrolls the scroll bar of the comparison file display area to the same ratio (for example, if that scroll bar is currently at 3 cm and the total length of its scrolling area is 12 cm, it scrolls to 0.2 × 12 cm = 2.4 cm).
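The proportional scrolling rule can be checked with a small calculation. The function name `synced_scroll_offset` and its parameter names are hypothetical; the arithmetic is the ratio rule stated in the text, applied to its own 2 cm / 10 cm / 12 cm example.

```python
import math


def synced_scroll_offset(first_scrolled, first_total, second_total):
    """Return the offset the second scroll bar should move to.

    The ratio of the first scroll bar's scrolled length to its total
    scrolling length is applied to the second scroll bar's total length.
    """
    if first_total <= 0:
        return 0.0  # nothing to scroll in the first area
    ratio = first_scrolled / first_total
    return ratio * second_total


# Example from the text: 2 cm of 10 cm scrolled, second area is 12 cm long.
offset = synced_scroll_offset(2.0, 10.0, 12.0)
print(math.isclose(offset, 2.4))
```

The second scroll bar's current offset (3 cm in the example) does not enter the calculation; only the ratio and the second area's total length determine the target.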
  • the text after synchronous scrolling can be directly displayed.
  • the difference text currently scrolled into the display area can be highlighted in the comparison file and/or the reference file according to the difference comparison result, while other text currently scrolled into the display area can be displayed normally without highlighting.
  • in the file comparison method based on RPA and AI provided by the embodiment of the present invention, the RPA robot can automatically upload the reference file and the comparison file to be compared to the client, the client can transmit the reference file and the comparison file to the server for difference comparison, and the client finally highlights the difference text in the comparison file and/or the reference file according to the difference comparison result returned by the server.
  • the embodiment of the present invention can use the RPA robot to automatically trigger the client to send the two files to be compared to the server for automatic comparison, which not only saves manpower and frees the people who would otherwise perform file comparison for more valuable work, but also improves the efficiency of file comparison; compared with the prior art, which requires manual marking of differences, the embodiment of the present invention can directly highlight the difference text in the reference file and/or the comparison file, which improves the readability of the difference text and thus the efficiency with which the user finds the differences between the two files.
  • when the client sends the reference file and the comparison file to the server, it can first use OCR (Optical Character Recognition) to recognize the reference file and the comparison file and then, when the two files contain multiple pages of text, splice the pages to obtain a single-page reference text with continuous context and a single-page comparison text with continuous context.
  • the reference text and the comparison text are sent to the server for difference comparison, so that the server can directly compare the two texts in combination with the context without other processing, thereby improving the efficiency and accuracy of the file comparison by the server.
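The page-splicing step can be sketched as follows, taking OCR output as a list of per-page strings. The function name `splice_pages` and the `joiner` parameter are assumptions; the text does not say how page boundaries are joined, and a real implementation might need language-specific handling of words or sentences split across pages.

```python
def splice_pages(page_texts, joiner=""):
    """Join per-page OCR text into one continuous text.

    A single-page file is returned unchanged. For multiple pages, leading
    and trailing whitespace at page boundaries is removed before joining,
    so a sentence running over a page break reads continuously.
    """
    if len(page_texts) == 1:
        return page_texts[0]
    return joiner.join(text.strip() for text in page_texts)


pages = ["The term of this contract is \n", "\n one year."]
print(splice_pages(pages, joiner=" "))
```

An empty `joiner` suits text without inter-word spaces, while a single space suits English text.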
  • another embodiment of the present invention also provides a file comparison device based on RPA and AI, the device is applied to the client, as shown in Figure 4, the device includes:
  • the receiving unit 20 is used to receive the reference file and comparison file uploaded by the robotic process automation RPA robot;
  • a sending unit 22 configured to send the reference file and the comparison file to a server
  • the receiving unit 20 is further configured to receive a difference comparison result of the comparison file relative to the reference file sent by the server;
  • the display unit 24 is configured to highlight the difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file.
  • the difference comparison result includes at least one piece of difference information; each piece of difference information includes the difference type, the difference text in the reference file, the difference text in the comparison file, the difference position information of the difference text in the reference file, and the difference position information of the difference text in the comparison file.
  • the difference position information includes the page identification of the page to which the difference text belongs and the coordinate information of the difference text on that page.
  • the display unit 24 includes:
  • a conversion module configured to convert the coordinate information into DIV element position information
  • a display module configured to, when the DIV element position information enters the display area of the file to which it belongs, highlight, according to the difference type and the page identification corresponding to the DIV element position information, the difference text at the DIV element position information in the page indicated by the page identification.
  • the display unit 24 also includes:
  • a generation module configured to, for the same piece of difference information, generate an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type;
  • a binding module used to respectively bind the ID with the DIV element position information of the difference text in the reference file, and the DIV element position information of the difference text in the comparison file;
  • the first synchronization module is configured to, when a first synchronous positioning instruction triggered based on the reference file or the comparison file is received, synchronously highlight the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction.
  • the display unit 24 is further configured to, after the difference comparison result of the comparison file relative to the reference file sent by the server is received, display the difference details in a preset display area according to the difference comparison result.
  • the preset display area is an area other than the reference file display area and the comparison file display area.
  • the difference details include, for each piece of difference information, the difference type, the difference text in the reference file, and the difference text in the comparison file.
  • the binding module is further used to bind the ID with the corresponding difference information in the difference details after binding the ID with the DIV element position information of the difference text in the reference file and with the DIV element position information of the difference text in the comparison file;
  • the display unit 24 also includes:
  • An acquiring module configured to acquire an ID bound to difference information in the difference details corresponding to the second synchronous positioning instruction when receiving a second synchronous positioning instruction triggered based on the difference details;
  • the second synchronization module is used for synchronously highlighting the difference texts at the position information of all DIV elements bound to the obtained ID.
  • the receiving unit 20 is further configured to, before the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, receive a scrolling instruction for the first scroll bar, the first scroll bar including a scroll bar in the reference file display area or a scroll bar in the comparison file display area;
  • a determining unit configured to determine the ratio of the currently scrolled length of the first scroll bar to the total length of the scroll area according to the scroll instruction
  • a synchronous scrolling unit configured to scroll the second scroll bar according to the ratio, so that the first scroll bar and the second scroll bar scroll synchronously, the second scroll bar including a scroll bar in the reference file display area or a scroll bar in the comparison file display area, but being different from the first scroll bar.
  • the display unit is configured to highlight the difference text currently scrolled to the display area in the comparison file and/or the reference file according to the difference comparison result.
  • the sending unit 22 includes:
  • a recognition module configured to use optical character recognition (OCR) to identify the reference file and the comparison file, and obtain at least one page of text of the reference file and at least one page of text of the comparison file;
  • the splicing module is used to splice the multiple pages of text of the target file into one page of text with continuous context when the target file is a file containing multiple pages of text to obtain the target text.
  • the target file is a file containing a single page of text file
  • a sending module configured to send the reference text and the comparison text to the server.
  • another embodiment of the present invention also provides a computing device, the computing device includes:
  • one or more processors
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any embodiment of the present invention.
  • the processor is coupled with the storage device.
  • an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any embodiment of the present invention is implemented.
  • the embodiment of the present invention also provides a file comparison system based on RPA and AI, and the system includes an RPA robot 30 , a client 32 and a server 34 .
  • the RPA robot 30 may be independent from the client 32
  • the RPA robot 30 may be a part of the client 32 .
  • the RPA robot 30 is configured to log into the client 32, upload the reference file and the comparison file to the client 32, and trigger the client 32 to send the reference file and the comparison file to The server 34 performs a difference comparison;
  • the client 32 is configured to receive the reference file and the comparison file uploaded by the RPA robot, and send the reference file and the comparison file to the server;
  • the server 34 is configured to compare the difference between the reference file and the comparison file according to a preset comparison algorithm, and obtain a difference comparison result of the comparison file relative to the reference file;
  • the client 32 is also configured to receive the difference comparison result sent by the server and, according to the difference comparison result, highlight the difference text in the comparison file and/or the reference file, wherein the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file.
  • the sequence numbers of the above-mentioned processes do not necessarily indicate the order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
  • B corresponding to A means that B is associated with A, and B can be determined according to A.
  • determining B based on A does not mean determining B only based on A, and B can also be determined based on A and/or other information.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if the above-mentioned integrated units are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-accessible memory.
  • the technical solution of the present invention, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, server, or network device, etc., specifically, a processor in the computer device) to execute some or all of the steps of the above-mentioned methods in the various embodiments of the present invention.
  • ROM read-only Memory
  • RAM random access memory
  • PROM programmable read-only memory
  • EPROM Erasable Programmable Read Only Memory
  • OTPROM One-time Programmable Read-Only Memory
  • EEPROM Electrically Erasable Programmable Read-Only Memory
  • CD-ROM Compact Disc Read-Only Memory
  • the modules in the device in the embodiment may be distributed in the device in the embodiment according to the description in the embodiment, or may be changed and located in one or more devices different from the embodiment.
  • the modules in the above embodiments can be combined into one module, and can also be further split into multiple sub-modules.

Abstract

A file comparison method and apparatus based on RPA and AI, a device, and a storage medium. The method comprises: receiving a reference file and a comparison file uploaded by an RPA robot (S100); sending the reference file and the comparison file to a server (S110); receiving a difference comparison result of the comparison file sent by the server relative to the reference file (S120); and highlighting a difference text in the comparison file and/or the reference file according to the difference comparison result (S130), wherein the difference text highlighted in the comparison file is the text of the comparison file different from the reference file, and the difference text highlighted in the reference file is the text of the reference file different from the comparison file. Automation of file comparison can be achieved, and the difference between two files can also be highlighted.

Description

File comparison method, apparatus, device, and storage medium based on RPA and AI
Technical field
Embodiments of the present invention relate to the technical field of process automation, and in particular to a file comparison method, apparatus, device, and storage medium based on RPA and AI.
Background
RPA (Robotic Process Automation) uses specific "robot software" to simulate human operations on a computer and automatically execute process tasks according to rules.
AI (Artificial Intelligence) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
RPA has distinctive advantages: it is low-code and non-intrusive. Low-code means that RPA can be operated without a high level of IT skill, so business personnel who do not know programming can also develop processes; non-intrusive means that RPA can simulate human operations without requiring the software system to open interfaces. However, traditional RPA has certain limitations: it can only work from fixed rules, and its application scenarios are limited. With the continuous development of AI technology, the deep integration of RPA and AI overcomes the limitations of traditional RPA; RPA+AI=Hand work+Head work, and it is greatly changing the value of labor.
In daily work, it is often necessary to compare two versions of documents such as contracts and legal provisions to determine what has changed in the newly produced document relative to the original. At present, however, file comparison requires manually obtaining the two files to be compared, then manually comparing them and manually marking the differences. When there are many files to compare, or the files have many pages, staff must perform repetitive, low-value comparison work, which takes up a large amount of working time and lowers work efficiency.
Summary of the invention
Embodiments of the present invention provide a file comparison method, apparatus, device, and storage medium based on RPA and AI, which can not only automate file comparison but also highlight the differences between the two files, thereby improving the efficiency with which users find file differences.
In a first aspect, an embodiment of the present invention provides a file comparison method based on RPA and AI, the method being applied to a client and including:
S1. Receiving a reference file and a comparison file uploaded by a robotic process automation (RPA) robot;
S2. Sending the reference file and the comparison file to a server;
S3. Receiving a difference comparison result of the comparison file relative to the reference file sent by the server;
S4. Highlighting difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file.
Optionally, the difference comparison result includes at least one piece of difference information. Each piece of difference information includes a difference type, the difference text in the reference file, the difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file. The difference position information includes the page identification of the page to which the difference text belongs and the coordinate information of the difference text on that page.
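The structure of one piece of difference information can be written down as a pair of data classes. This is a sketch only: the class and field names are assumptions, and the coordinate information is reduced to a single `(x, y)` point even though an implementation might carry a bounding box.

```python
from dataclasses import dataclass


@dataclass
class DiffPosition:
    """Difference position information: page identification plus coordinates."""
    page_id: int
    x: float
    y: float


@dataclass
class DiffInfo:
    """One piece of difference information in the difference comparison result."""
    diff_type: str  # "content addition", "content deletion", or "content modification"
    reference_text: str
    comparison_text: str
    reference_position: DiffPosition
    comparison_position: DiffPosition


# A difference comparison result is then a list with at least one DiffInfo.
result = [
    DiffInfo(
        diff_type="content modification",
        reference_text="one year",
        comparison_text="two years",
        reference_position=DiffPosition(page_id=3, x=40.0, y=120.0),
        comparison_position=DiffPosition(page_id=3, x=40.0, y=136.0),
    )
]
print(result[0])
```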
Optionally, S4 includes:
S41. Converting the coordinate information into division (DIV) element position information;
S42. When the DIV element position information enters the display area of the file to which it belongs, highlighting, according to the difference type and the page identifier corresponding to the DIV element position information, the difference text located at the DIV element position information on the page indicated by the page identifier.
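Steps S41 and S42 can be sketched as a coordinate mapping followed by a visibility test. The sketch below assumes that pages are stacked vertically with a uniform height, that a single scale factor maps page coordinates to screen pixels, and that the coordinate information is an (x, y, width, height) box. None of these details, nor the function names, come from the embodiment.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DivPosition:
    page_id: int
    left: float
    top: float
    width: float
    height: float


def to_div_position(page_id: int, box: Tuple[float, float, float, float],
                    scale: float, page_height: float,
                    page_gap: float = 0.0) -> DivPosition:
    """S41 (sketch): map a page-local box to a position in the scrolled view."""
    x, y, w, h = box
    # Vertical offset of the page, assuming equal-height pages stacked top-down.
    page_top = (page_id - 1) * (page_height * scale + page_gap)
    return DivPosition(page_id, x * scale, page_top + y * scale,
                       w * scale, h * scale)


def is_visible(pos: DivPosition, scroll_top: float,
               viewport_height: float) -> bool:
    """S42 (sketch): True when the box overlaps the current display area."""
    return (pos.top < scroll_top + viewport_height
            and pos.top + pos.height > scroll_top)
```

Under these assumptions, a client would call is_visible on each scroll event and apply the highlight style for the difference type only to the boxes that return True.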
Optionally, S4 further includes:
S43. For a same piece of difference information, generating an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type, and binding the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, respectively;
S44. When a first synchronous positioning instruction triggered based on the reference file or the comparison file is received, synchronously highlighting the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction.
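One way to realize S43 and S44 is to derive the ID deterministically from the two positions and the difference type, and to keep a map from the ID to every bound position. The embodiment does not say how the ID is generated; the SHA-1 hash below, the tuple form of a position, and all names are assumptions chosen for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

# Assumed position shape: (page_id, left, top, width, height).
Position = Tuple[int, float, float, float, float]


def make_diff_id(ref_pos: Position, cmp_pos: Position, diff_type: str) -> str:
    """S43 (sketch): derive a stable ID from both positions and the type."""
    key = f"{diff_type}|{ref_pos}|{cmp_pos}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


class DiffBindings:
    """Keeps the ID-to-position bindings for both display areas."""

    def __init__(self) -> None:
        self._by_id: Dict[str, List[Tuple[str, Position]]] = {}

    def bind(self, diff_id: str, pane: str, pos: Position) -> None:
        self._by_id.setdefault(diff_id, []).append((pane, pos))

    def positions_for(self, diff_id: str) -> List[Tuple[str, Position]]:
        """S44 (sketch): every position to highlight for one ID."""
        return list(self._by_id.get(diff_id, []))


ref = (2, 20.0, 1040.0, 200.0, 24.0)
cmp_ = (2, 20.0, 1052.0, 200.0, 24.0)
diff_id = make_diff_id(ref, cmp_, "content modification")
bindings = DiffBindings()
bindings.bind(diff_id, "reference", ref)
bindings.bind(diff_id, "comparison", cmp_)
```

A click on either highlighted region then only needs to carry the ID; positions_for returns the boxes in both files so they can be highlighted together.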
Optionally, after S3, the method further includes:
S5. Displaying difference details in a preset display area according to the difference comparison result, wherein the preset display area is an area other than the reference file display area and the comparison file display area, and the difference details include, for each piece of difference information, the difference type, the difference text in the reference file, and the difference text in the comparison file.
Optionally, after S43, the method further includes:
S45. Binding the ID to the corresponding difference information in the difference details;
S46. When a second synchronous positioning instruction triggered based on the difference details is received, acquiring the ID bound to the difference information in the difference details that corresponds to the second synchronous positioning instruction;
S47. Synchronously highlighting the difference text at all DIV element position information bound to the acquired ID.
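Steps S45 to S47 amount to two lookups: from the selected entry in the difference details to its ID, and from the ID to the bound positions. The sketch below uses plain dictionaries; the row index as the handle for a detail entry, the tuple form of a position, and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed position shape: (pane, page_id, left, top).
Position = Tuple[str, int, float, float]


def resolve_detail_selection(row: int,
                             row_to_id: Dict[int, str],
                             id_to_positions: Dict[str, List[Position]]
                             ) -> List[Position]:
    """Return every position to highlight for the selected detail row."""
    diff_id = row_to_id[row]                  # S46: ID bound to the detail entry
    return id_to_positions.get(diff_id, [])   # S47: positions bound to that ID


row_to_id = {0: "a1b2", 1: "c3d4"}
id_to_positions = {
    "a1b2": [("reference", 1, 40.0, 120.0), ("comparison", 1, 40.0, 132.0)],
    "c3d4": [("comparison", 3, 10.0, 300.0)],
}
targets = resolve_detail_selection(0, row_to_id, id_to_positions)
```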
Optionally, before S4, the method further includes:
S6. Receiving a scroll instruction for a first scroll bar, wherein the first scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area;
S7. Determining, according to the scroll instruction, the ratio of the length currently scrolled by the first scroll bar to the total length of its scroll area;
S8. Scrolling a second scroll bar according to the ratio so that the first scroll bar and the second scroll bar scroll synchronously, wherein the second scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area, and is different from the first scroll bar.
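The ratio in S7 and its application in S8 can be written as two small functions. The sketch assumes that the "total length of the scroll area" is the content height minus the visible height, which is how browser scroll offsets usually behave; the embodiment does not define it more precisely, and the parameter names are illustrative.

```python
def scroll_ratio(scroll_top: float, scroll_height: float,
                 client_height: float) -> float:
    """S7 (sketch): fraction of the scrollable range already scrolled."""
    scrollable = scroll_height - client_height
    if scrollable <= 0:
        return 0.0  # content fits in the viewport, nothing to scroll
    return min(max(scroll_top / scrollable, 0.0), 1.0)


def synced_scroll_top(ratio: float, scroll_height: float,
                      client_height: float) -> float:
    """S8 (sketch): offset for the second scroll bar at the same ratio."""
    scrollable = max(scroll_height - client_height, 0.0)
    return ratio * scrollable


# The first pane is 1200 px tall, the second 2200 px; both show 200 px.
ratio = scroll_ratio(250, 1200, 200)
second_top = synced_scroll_top(ratio, 2200, 200)
```

Using a ratio rather than a pixel offset keeps the two display areas aligned even when the files differ in length.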
Optionally, S4 includes:
Highlighting, according to the difference comparison result, the difference text currently scrolled into the display area in the comparison file and/or the reference file.
Optionally, S2 includes:
S21. Recognizing the reference file and the comparison file by optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file;
S22. When a target file contains multiple pages of text, splicing the pages of the target file into a single page of text with continuous context to obtain a target text; when the target file contains a single page of text, taking that page as the target text; wherein, when the target file is the reference file, the target text is a reference text, and when the target file is the comparison file, the target text is a comparison text;
S23. Sending the reference text and the comparison text to the server.
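The splicing in S22 can be sketched as a plain concatenation that also records where each page starts, so that a position in the spliced text can later be mapped back to a page identifier. Keeping the page start offsets is an assumption added here for illustration; the embodiment only requires that the pages be joined without changing their content or order.

```python
from bisect import bisect_right
from typing import List, Tuple


def splice_pages(pages: List[str]) -> Tuple[str, List[int]]:
    """Join page texts in order and return (spliced_text, page_start_offsets)."""
    starts: List[int] = []
    offset = 0
    for page in pages:
        starts.append(offset)
        offset += len(page)
    return "".join(pages), starts


def locate(offset: int, starts: List[int]) -> Tuple[int, int]:
    """Map an offset in the spliced text to (1-based page, offset in page)."""
    index = bisect_right(starts, offset) - 1
    return index + 1, offset - starts[index]


pages = ["The quick brown ", "fox jumps."]
text, starts = splice_pages(pages)
```

A sentence that runs across a page break, such as the one in this example, becomes contiguous in the spliced text, which is what allows the server to compare it with context.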
In a second aspect, an embodiment of the present invention provides a file comparison apparatus based on RPA and AI. The apparatus is applied to a client and includes:
a receiving unit, configured to receive a reference file and a comparison file uploaded by a robotic process automation (RPA) robot;
a sending unit, configured to send the reference file and the comparison file to a server;
the receiving unit being further configured to receive a difference comparison result, sent by the server, of the comparison file relative to the reference file;
a display unit, configured to highlight difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file.
Optionally, the difference comparison result includes at least one piece of difference information. Each piece of difference information includes a difference type, the difference text in the reference file, the difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file. The difference position information includes a page identifier of the page to which the difference text belongs and coordinate information of the difference text on that page.
Optionally, the display unit includes:
a conversion module, configured to convert the coordinate information into division (DIV) element position information;
a display module, configured to, when the DIV element position information enters the display area of the file to which it belongs, highlight, according to the difference type and the page identifier corresponding to the DIV element position information, the difference text located at the DIV element position information on the page indicated by the page identifier.
Optionally, the display unit further includes:
a generating module, configured to, for a same piece of difference information, generate an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type;
a binding module, configured to bind the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, respectively;
a first synchronization module, configured to, when a first synchronous positioning instruction triggered based on the reference file or the comparison file is received, synchronously highlight the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction.
Optionally, the display unit is further configured to, after the difference comparison result of the comparison file relative to the reference file sent by the server is received, display difference details in a preset display area according to the difference comparison result, wherein the preset display area is an area other than the reference file display area and the comparison file display area, and the difference details include, for each piece of difference information, the difference type, the difference text in the reference file, and the difference text in the comparison file.
Optionally, the binding module is further configured to, after binding the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, bind the ID to the corresponding difference information in the difference details;
the display unit further includes:
an acquiring module, configured to, when a second synchronous positioning instruction triggered based on the difference details is received, acquire the ID bound to the difference information in the difference details that corresponds to the second synchronous positioning instruction;
a second synchronization module, configured to synchronously highlight the difference text at all DIV element position information bound to the acquired ID.
Optionally, the receiving unit is further configured to, before the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, receive a scroll instruction for a first scroll bar, wherein the first scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area;
a determining unit, configured to determine, according to the scroll instruction, the ratio of the length currently scrolled by the first scroll bar to the total length of its scroll area;
a synchronous scrolling unit, configured to scroll a second scroll bar according to the ratio so that the first scroll bar and the second scroll bar scroll synchronously, wherein the second scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area, and is different from the first scroll bar.
Optionally, the display unit is configured to highlight, according to the difference comparison result, the difference text currently scrolled into the display area in the comparison file and/or the reference file.
Optionally, the sending unit includes:
a recognition module, configured to recognize the reference file and the comparison file by optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file;
a splicing module, configured to, when a target file contains multiple pages of text, splice the pages of the target file into a single page of text with continuous context to obtain a target text, and, when the target file contains a single page of text, take that page as the target text, wherein, when the target file is the reference file, the target text is a reference text, and when the target file is the comparison file, the target text is a comparison text;
a sending module, configured to send the reference text and the comparison text to the server.
In a third aspect, an embodiment of the present invention provides a computing device, the computing device including:
one or more processors;
a storage apparatus, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in the first aspect.
With the file comparison method, apparatus, device, and storage medium based on RPA and AI provided by the embodiments of the present invention, the RPA robot can automatically upload the reference file and the comparison file to the client, the client transmits the two files to the server for difference comparison, and the difference text can finally be highlighted in the comparison file and/or the reference file according to the difference comparison result returned by the server. Compared with the prior art, in which files must be compared manually, the embodiments of the present invention use the RPA robot to automatically trigger the client to send the two files to the server for automatic comparison. This saves manpower, frees the personnel who would otherwise compare the files for more valuable work, and improves the efficiency of file comparison. Compared with the prior art, in which differences must be marked manually, the embodiments of the present invention highlight the difference text directly in the reference file and/or the comparison file, which improves the readability of the difference text and therefore the efficiency with which a user finds the differences between the two files. When the client sends the reference file and the comparison file to the server, it may first recognize both files by OCR (Optical Character Recognition), then splice the text of any file that contains multiple pages to obtain a single-page, context-continuous reference text and a single-page, context-continuous comparison text, and finally send the reference text and the comparison text to the server for difference comparison. The server can then compare the two texts directly in context without any further processing, which improves the efficiency and accuracy of the file comparison performed by the server.
In addition, the embodiments of the present invention can achieve the following technical effects:
1. The user can trigger a synchronous positioning instruction from the reference file display area, the comparison file display area, or the difference details display area, so that the client synchronously highlights the text belonging to the same piece of difference information, which improves the efficiency with which the user views difference text.
2. The user can drag the scroll bar of the reference file display area or of the comparison file display area, so that the client scrolls the two display areas synchronously, which improves the efficiency with which the user views text.
Description of drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a file comparison method based on RPA and AI provided by an embodiment of the present invention;
Fig. 2 is an example diagram of displaying a difference comparison result provided by an embodiment of the present invention;
Fig. 3 is another example diagram of displaying a difference comparison result provided by an embodiment of the present invention;
Fig. 4 is a block diagram of a file comparison apparatus based on RPA and AI provided by an embodiment of the present invention;
Fig. 5 is an architecture diagram of a file comparison system based on RPA and AI provided by an embodiment of the present invention;
Fig. 6 is an architecture diagram of another file comparison system based on RPA and AI provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that the terms "include" and "have" and any variations thereof in the embodiments of the present invention and the drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
In daily work, different versions of a file often have to be compared manually for differences. This work is highly repetitive, of low difficulty, and very time-consuming, so companies have an increasingly urgent need for automated file comparison. RPA (Robotic Process Automation) technology can, through the user interface, intelligently understand the existing applications on an electronic device and automate repetitive, rule-based, high-volume routine operations, such as repeatedly reading emails, reading Office components, and operating databases, web pages, and client software, collecting data, performing tedious calculations, and generating the required files and reports in batches. RPA technology can therefore greatly reduce labor costs and effectively improve office efficiency. AI (Artificial Intelligence) technology can go beyond fixed rules and simulate human thinking and awareness to automate more complex application scenarios. On this basis, the embodiments of the present invention provide a solution that combines RPA and AI to compare files automatically, which not only saves manpower and improves the efficiency of file comparison, but also highlights the differences between the two files and improves the efficiency with which a user locates file differences.
The embodiments of the present invention are described in detail below.
In the description of the embodiments of the present invention, the term "reference file" refers to the file used as the basis of reference in a difference comparison, and the term "comparison file" refers to the other of the two compared files. In practice, the version of the reference file is usually older than that of the comparison file. The reference file and the comparison file may be files from any field, for example contract files, financial files, or program files.
In the description of the embodiments of the present invention, the term "multi-page file" refers to a file containing two or more pages of text content, and the term "multi-page text" refers to text of two or more pages.
In the description of the embodiments of the present invention, the term "OCR" refers to optical character recognition (Optical Character Recognition), which is the process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text by a character recognition method. That is, for printed characters, the text in a paper document is optically converted into a black-and-white bitmap image file, and recognition software converts the text in the image into a text format that word processing software can further edit. In the embodiments of the present invention, the RPA robot may use OCR technology to convert the text in a paper document into a black-and-white bitmap image file, after which the client uses OCR technology to recognize the text content contained in the image file. Alternatively, the RPA robot may use OCR technology to obtain the text content from the paper document and generate a text file (that is, an editable file) containing the text content, after which the client extracts the text content directly from the text file.
In the description of the embodiments of the present invention, the term "client" refers to the front end of a business system that requires file comparison, and the term "server" refers to the back end of that business system. The client may be application software corresponding to the business system, or it may be a browser through which the RPA robot accesses the website of the business system. The "RPA robot" may be integrated into the client, embedded in the client in the form of a plug-in or the like, or independent of the client, as long as the RPA robot can access the client automatically. The embodiments of the present invention do not limit the specific form of the RPA robot.
In the description of the embodiments of the present invention, the term "NLP" refers to natural language processing (Natural Language Processing), a discipline that takes language as its object and uses computer technology to analyze, understand, and process natural language. In other words, it uses the computer as a powerful tool for language research, studies language information quantitatively with computer support, and provides language descriptions that humans and computers can both use.
In the description of the embodiments of the present invention, the term "splicing" refers to joining the pieces of content to be spliced without changing the original content. Splicing multiple pages of text connects the text content of the pages seamlessly while preserving the original order of the text content.
In the description of the embodiments of the present invention, the term "preset comparison algorithm" refers to the specific comparison method used to determine how the comparison text differs from the reference text. The reference text and the comparison text may be compared in batches, one preset comparison unit at a time, until the comparison is complete. For the specific comparison process, see the detailed explanation of S120. The term "preset comparison unit" refers to the amount of text compared each time, which may be chosen according to the actual situation and may be, for example, a phrase, a sentence, or a paragraph.
In the description of the embodiments of the present invention, the term "difference comparison" refers to determining what differences exist between the reference text and the comparison text. The term "difference comparison result" refers to the result obtained after the difference comparison of the reference text and the comparison text, which includes at least one piece of difference information. Each piece of difference information includes a difference type, the difference text in the reference file, the difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file. The difference position information includes a page identifier of the page to which the difference text belongs and coordinate information of the difference text on that page. The term "difference type" characterizes the category of a difference and mainly includes content deletion, content addition, and content modification. The term "page identifier" indicates which page of the whole file the current page is. As for the term "coordinate information", a coordinate system may be established for each page, with the position of the first character of the page as the origin, the horizontal axis pointing right, and the vertical axis pointing down, so that corresponding coordinates can be generated for each character on the page. The term "difference text" refers to the text content in the current file that differs from the other file.
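The three difference types can be illustrated with a generic sequence comparison. The sketch below uses Python's standard difflib.SequenceMatcher as a stand-in; it is not the preset comparison algorithm of the embodiment, and treating whitespace-separated words as the comparison unit is an assumption made only to keep the example short.

```python
import difflib
from typing import List, Tuple

# Map difflib's opcode tags onto the difference types named in the description.
_LABELS = {
    "delete": "content deletion",
    "insert": "content addition",
    "replace": "content modification",
}


def classify_differences(reference: List[str],
                         comparison: List[str]) -> List[Tuple[str, str, str]]:
    """Return (difference type, reference text, comparison text) triples."""
    matcher = difflib.SequenceMatcher(None, reference, comparison,
                                      autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical runs carry no difference information
        result.append((_LABELS[tag],
                       " ".join(reference[i1:i2]),
                       " ".join(comparison[j1:j2])))
    return result


reference_words = "pay within 30 days of delivery".split()
comparison_words = "pay within 45 days of final delivery".split()
differences = classify_differences(reference_words, comparison_words)
```

Here the change from "30" to "45" is reported as a content modification and the new word "final" as a content addition; the empty reference text in the second triple reflects that an addition has no counterpart in the reference file.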
In the description of the embodiments of the present invention, the term "highlighting" refers to a display mode that clearly distinguishes the difference text from other text. Highlighting includes, but is not limited to, one or a combination of the following: bolding the font, changing the font color, adding a background color, brightening the font, enlarging the font, changing to italics, adding an underline, and adding a strikethrough.
在本发明实施的描述中,术语“鉴权”是指验证发送参考文件和比对文件的客户端 是否具有进行文件比对的权限,具体可以通过验证客户端的用户信息是否满足该权限要求来实现鉴权。In the description of the implementation of the present invention, the term "authentication" refers to verifying whether the client that sends the reference file and the comparison file has the authority to perform file comparison, specifically, it can be realized by verifying whether the user information of the client meets the authority requirement authentication.
在本发明实施的描述中,术语“DIV元素位置信息”是指DIV(DIVision,划分)元素在网络界面的位置信息,术语“DIV元素”用来为HTML(Hyper Text Markup Language,超文本标记语言)文档内块级(block-level)内容提供结构和背景的元素。In the description that the present invention implements, the term "DIV element position information" refers to the position information of the DIV (DIVision, division) element in the network interface, and the term "DIV element" is used for HTML (Hyper Text Markup Language, hypertext markup language) ) An element that provides structure and context to block-level content within a document.
In the description of the embodiments of the present invention, the term "binding" refers to establishing a mapping relationship between at least two parameters to be bound, so that one parameter can be looked up by means of another.
In the description of the embodiments of the present invention, the term "difference details" refers to the specific descriptive information for each difference. The difference details include, for each piece of difference information, the difference type, the difference text in the reference file, and the difference text in the comparison file.
In the description of the embodiments of the present invention, the term "synchronous positioning instruction" refers to an instruction to display synchronously the difference texts involved in the same piece of difference information.
In the description of the embodiments of the present invention, the term "synchronous scrolling" refers to a scrolling mode in which the scrolling progress of at least two scroll bars is kept consistent.
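As a small illustration of synchronous scrolling, one possible way to keep two scroll bars at the same progress is to map the fraction of the scrollable range covered in one display area onto the other. The function name and the interpretation of "progress" as a fraction of the scrollable range are assumptions made for this sketch.

```python
def synced_scroll_top(src_top: float, src_max: float, dst_max: float) -> float:
    """Return the scroll offset for the second area that matches the first area's progress.

    src_top: current scroll offset of the area the user is scrolling
    src_max: maximum scroll offset of that area
    dst_max: maximum scroll offset of the area to be kept in sync
    """
    if src_max <= 0:
        # The source area cannot scroll, so the other area stays at the top.
        return 0.0
    progress = src_top / src_max
    return progress * dst_max
```

For example, an area scrolled halfway through its range would place the other area halfway through its own range, regardless of their different lengths.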
FIG. 1 shows an RPA- and AI-based file comparison method provided by an embodiment of the present invention. The method is mainly applied to a client and specifically includes:
S100. Receive the reference file and the comparison file uploaded by the RPA robot.
Specifically, in this embodiment of the present invention, an RPA program may be configured in an electronic device capable of logging in to the client (the program may be integrated or embedded in the client, or may be independent of it). The electronic device can then, according to the rules set in the RPA program, simulate a user's mouse and keyboard operations to log in to the client automatically and, by accessing the client, trigger the client to generate a file comparison request that includes the reference file and the comparison file and to send the request to the server, so that the server performs a difference comparison on the two files. When logging in, the client may pop up a login interface containing a verification code image. In this case, the RPA robot can perform OCR on the verification code image to obtain the verification code it contains and enter the code into the corresponding edit box, thereby logging in to the client successfully.
The reference file and the comparison file may be stored in the client, may be stored in another storage space of the electronic device, or may be paper documents. When they are stored in another storage space of the electronic device, the RPA robot can locate the reference file and the comparison file in that storage space and upload them to the client, for example by clicking an upload button, by dragging the two files to a designated area, or by another upload method. When the reference file and/or the comparison file is a paper document, the RPA robot can first use OCR technology to convert the paper document into an image file or into a text file (that is, an editable file composed of the text content of the paper document), and then upload it to the client by the methods described above.
After the client receives the reference file and the comparison file uploaded by the RPA robot, it can render the two files to show the uploaded files to the user. Specifically, when the reference file and/or the comparison file is a Word file, the Word file can first be converted into a PDF file and then rendered with the client's built-in rendering library, with multi-page rendering performed when the PDF file has multiple pages. When the reference file and/or the comparison file is an image file in a format other than TIFF, it can be rendered with the client's built-in rendering library. When the reference file and/or the comparison file is an image file in TIFF format, it can be rendered with a rendering library dedicated to the TIFF format. When a Word file is converted into a PDF file, the client may send the Word file to the server, which performs the conversion and returns the PDF file to the client for rendering.
S110. Send the reference file and the comparison file to the server.
After receiving the reference file and the comparison file uploaded by the RPA robot, the client may receive a file comparison instruction triggered by the RPA robot, generate directly from that instruction a file comparison request that includes the reference file and the comparison file, and send the request to the server so that the server performs a difference comparison on the two files. However, after receiving the two files, the server usually has to recognize the text in them before it can perform the difference comparison. If many clients send file comparison requests to the server, the server's file comparison efficiency decreases. To reduce the burden on the server and thereby improve file comparison efficiency, in this embodiment the client may first use OCR to recognize the reference file and the comparison file, obtaining at least one page of text of the reference file and at least one page of text of the comparison file, and then send the recognized text to the server for difference comparison.
In practice, if the pages of the reference file are compared directly with the pages of the comparison file one page at a time, that is, page N of the reference file is compared with page N of the comparison file without regard to the relationship between pages, the comparison result can easily be inaccurate. For example, suppose the reference file contains two pages of text, and the comparison file adds one page of text between the first and second pages of the reference file, giving three pages. With single-page comparison, the second page of the reference file is reported as differing from the second page of the comparison file, and, because the reference file has no third page, the third page of the comparison file is reported as not existing in the reference file. In other words, single-page comparison yields the overall result that the two files differ everywhere except on the first page.
To avoid inaccurate comparison results, in this embodiment, after the client uses OCR to recognize the reference file and the comparison file and obtains at least one page of text of the reference file and at least one page of text of the comparison file, it first splices the text and then sends the spliced text to the server. Specifically, when a target file contains multiple pages of text, the pages are spliced into a single page of contextually continuous text to obtain the target text; when the target file contains a single page of text, that page is taken as the target text. Here, the target file is the reference file or the comparison file: when the target file is the reference file, the target text is the reference text, and when the target file is the comparison file, the target text is the comparison text. The reference text and the comparison text are then sent to the server.
Here, contextual continuity means that the original order of the text is preserved. Specifically, the multiple pages of text of the reference file or the comparison file can be spliced into a single page of contextually continuous text by concatenating the pages one after another in the page order of that file.
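The splicing step can be sketched as follows. The description states only that pages are concatenated in page order; recording each page's starting offset, so that a position in the spliced text can later be mapped back to a page identifier, is an assumption added for this sketch, as are the function names splice_pages and page_of.

```python
from bisect import bisect_right
from typing import List, Tuple


def splice_pages(pages: List[str]) -> Tuple[str, List[int]]:
    """Concatenate page texts in page order.

    Returns the spliced text and, for each page, the offset at which it starts.
    """
    starts: List[int] = []
    offset = 0
    for text in pages:
        starts.append(offset)
        offset += len(text)
    return "".join(pages), starts


def page_of(offset: int, starts: List[int]) -> int:
    """Map a character offset in the spliced text to a 1-based page number."""
    return bisect_right(starts, offset)


# The three-page comparison file from the example in the description.
comparison_pages = ["Clause one. ", "Inserted clause. ", "Clause two. "]
cmp_text, cmp_starts = splice_pages(comparison_pages)
inserted_at = cmp_text.index("Inserted")
inserted_page = page_of(inserted_at, cmp_starts)
```

Because comparison then runs on the spliced text, an inserted page appears as one added span rather than shifting every later page out of alignment.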
It should be added that, to improve the security of communication between the client and the server, the server can authenticate the client's user information to verify whether the user has permission to perform file comparison. Specifically, when the client sends the reference file and the comparison file to the server, it can also include the client's user information, so that the server first authenticates the client according to the user information and performs the difference comparison on the reference file and the comparison file only after the authentication passes. The user information may be a client account, a mobile phone number bound to the client account, a user level, or other information. The embodiments of the present invention do not limit the specific content of the user information, which can be determined according to the specific situation. The user information can be authenticated in various ways, including but not limited to the following two: (1) match the user information against a list of users who have permission; if the match succeeds, the user corresponding to the user information has permission and the authentication passes, and if the match fails, the user has no permission and the authentication fails; (2) determine whether the user level in the user information exceeds a preset level; if it exceeds the preset level, the authentication passes, and if it does not, the authentication fails.
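The two authentication methods above can be sketched as simple checks. The function names, the use of a set for the permitted-user list, and integer user levels are assumptions for illustration; the description does not specify data types.

```python
from typing import Set


def authenticate_by_list(user_info: str, permitted_users: Set[str]) -> bool:
    """Method (1): authentication passes if the user information matches the permitted list."""
    return user_info in permitted_users


def authenticate_by_level(user_level: int, preset_level: int) -> bool:
    """Method (2): authentication passes only if the user level exceeds the preset level."""
    return user_level > preset_level
```

Note that method (2) uses a strict comparison, since the description requires the level to exceed the preset level rather than merely equal it.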
S120. Receive, from the server, the difference comparison result of the comparison file relative to the reference file.
After receiving the reference file and the comparison file, the server can perform a difference comparison on them according to a preset comparison algorithm. Specifically, the reference text and the comparison text can be compared by preset comparison unit, yielding a comparison sub-result for each preset comparison unit. During this comparison, if the reference sub-text being compared (the reference text of one preset comparison unit) and the comparison sub-text (the comparison text of one preset comparison unit) have the same content, the corresponding comparison sub-result is determined to be identical content; if the reference sub-text being compared does not exist in the comparison text, the corresponding comparison sub-result is determined to be content deletion; and if the comparison sub-text being compared does not exist in the reference text, the corresponding comparison sub-result is determined to be content addition. In practice, the comparison between two texts should cover content modification in addition to identical content, content deletion, and content addition. Therefore, so that the user can see more intuitively how the comparison text differs from the reference text, the following rule can be applied to a non-adjacent first comparison sub-result and second comparison sub-result: if both are identical content, and the comparison sub-results between them include content deletion and content addition but no identical content, then the comparison sub-results between them are merged into a single comparison sub-result whose type is content modification. The size of the preset comparison unit can be chosen according to the actual situation and may be, for example, a phrase, a sentence, or a paragraph.
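The merging rule above can be sketched as follows, assuming each comparison sub-result is represented as a tuple of (type, reference sub-text, comparison sub-text). That representation and the function name merge_modifications are hypothetical; the description does not prescribe a data format or a particular comparison algorithm.

```python
from typing import List, Tuple

SubResult = Tuple[str, str, str]  # (type, reference sub-text, comparison sub-text)


def merge_modifications(sub_results: List[SubResult]) -> List[SubResult]:
    """Merge deletion/addition runs enclosed by two 'same' results into one 'modification'."""
    merged: List[SubResult] = []
    i, n = 0, len(sub_results)
    while i < n:
        if sub_results[i][0] == "same":
            merged.append(sub_results[i])
            i += 1
            continue
        # Collect the run of consecutive sub-results that are not 'same'.
        j = i
        while j < n and sub_results[j][0] != "same":
            j += 1
        run = sub_results[i:j]
        kinds = {r[0] for r in run}
        # The rule applies only between a first and a second 'same' sub-result.
        enclosed = i > 0 and j < n
        if enclosed and kinds == {"deletion", "addition"}:
            ref_text = "".join(r[1] for r in run)
            cmp_text = "".join(r[2] for r in run)
            merged.append(("modification", ref_text, cmp_text))
        else:
            merged.extend(run)
        i = j
    return merged


sub_results = [
    ("same", "The device ", "The device "),
    ("deletion", "shall ", ""),
    ("addition", "", "may "),
    ("same", "store files.", "store files."),
    ("deletion", " End.", ""),
]
merged = merge_modifications(sub_results)
```

In this sketch a run at the very start or end of the text is left unmerged, because the rule as described requires an identical-content sub-result on both sides.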
It should be added that, when two texts are compared, besides simply judging whether the characters or words used in the text content are the same, NLP technology can also be used to perform semantic analysis on the reference sub-text and the comparison sub-text. When the reference sub-text and the comparison sub-text have the same meaning but use different characters or words, the corresponding comparison sub-result can be determined to be identical content. In addition, the embodiments of the present invention can support user-defined filter rules that ignore meaningless differences: when the differences between the reference sub-text and the comparison sub-text include a difference that satisfies a preset filter rule, that difference is ignored. For example, a rule can specify that the presence or absence of the auxiliary particle "的" in a sentence does not affect the comparison result. When the server sends the difference comparison result to the client, it can also send the ignored differences, so that the client can show them to the user.
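A filter rule of the kind described, such as ignoring the particle "的", might be sketched as below. The function name and the approach of stripping ignored tokens before comparing are assumptions; the semantic analysis by NLP mentioned above is not covered by this sketch.

```python
from typing import Iterable


def is_ignorable_difference(ref_sub: str, cmp_sub: str,
                            ignored_tokens: Iterable[str] = ("的",)) -> bool:
    """Return True if the two sub-texts differ only by tokens that a filter rule ignores."""
    tokens = tuple(ignored_tokens)

    def strip_ignored(text: str) -> str:
        for token in tokens:
            text = text.replace(token, "")
        return text

    # Identical sub-texts are not a difference at all, so nothing is ignored.
    return ref_sub != cmp_sub and strip_ignored(ref_sub) == strip_ignored(cmp_sub)
```

A server following the description could collect the differences for which this check returns True and send them separately, so that the client can still show the user what was ignored.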
When a comparison sub-result is content addition, content deletion, or content modification, a piece of difference information can be generated for that comparison sub-result, so that once all difference information has been obtained, it can all be fed back to the client. Each piece of difference information includes the difference type, the difference text in the reference file, the difference text in the comparison file, the difference position information of the difference text in the reference file, and the difference position information of the difference text in the comparison file. The difference position information includes the page identifier of the page to which the difference text belongs and the coordinate information of the difference text within that page. One comparison sub-result corresponds to one piece of difference information, and the difference types include content addition, content deletion, and content modification. The page identifier indicates which page of the whole file the current page is. As for the coordinate information, a coordinate system can be established for each page, with the position of the first character of the page as the origin and with the horizontal-rightward and vertical-downward directions as the horizontal and vertical axes respectively, so that corresponding coordinates can be generated for each character in the page.
In one implementation, after the client sends the reference file and the comparison file to the server, the server can add the task of comparing the two files to a task queue, store the comparison task and its task status in a task database, and update the task status in the task database promptly whenever the status of the comparison task changes. The client can receive a comparison task status query instruction triggered by the RPA robot and send it to the server, so that the server queries the task database for the status of the comparison task corresponding to the instruction and feeds the queried status back to the client. When the comparison task has not yet been executed, the task status may be "unprocessed"; when the comparison task is being executed, the task status may be "processing"; and when the comparison task has finished executing, the task status may be "completed".
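The task queue and status tracking described above can be sketched in memory as follows. A real deployment would presumably use a persistent task database; the class ComparisonTaskStore, its method names, and the in-memory dictionary standing in for the database are all assumptions for illustration.

```python
from collections import deque
from enum import Enum
from typing import Callable, Deque, Dict, Tuple


class TaskStatus(Enum):
    UNPROCESSED = "unprocessed"
    PROCESSING = "processing"
    COMPLETED = "completed"


class ComparisonTaskStore:
    """In-memory stand-in for the task queue and the task database."""

    def __init__(self) -> None:
        self._queue: Deque[Tuple[int, str, str]] = deque()
        self._status: Dict[int, TaskStatus] = {}
        self._next_id = 1

    def submit(self, reference: str, comparison: str) -> int:
        """Enqueue a comparison task and record it as unprocessed."""
        task_id = self._next_id
        self._next_id += 1
        self._queue.append((task_id, reference, comparison))
        self._status[task_id] = TaskStatus.UNPROCESSED
        return task_id

    def run_next(self, compare: Callable[[str, str], list]) -> Tuple[int, list]:
        """Execute the oldest queued task, updating its status as it changes."""
        task_id, reference, comparison = self._queue.popleft()
        self._status[task_id] = TaskStatus.PROCESSING
        result = compare(reference, comparison)
        self._status[task_id] = TaskStatus.COMPLETED
        return task_id, result

    def status(self, task_id: int) -> TaskStatus:
        """Answer a comparison task status query."""
        return self._status[task_id]
```

A status query from the client would map onto the status method, returning one of the three states named in the description.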
In addition, the server can feed back the difference comparison result to the client either actively or passively. Passive feedback can be implemented as follows: the client receives a comparison result query instruction triggered by the RPA robot and sends it to the server, and the server sends the corresponding difference comparison result to the client in response to the instruction.
The RPA robot can trigger the client to send a comparison result query instruction or a comparison task status query instruction in ways that include, but are not limited to, clicking the comparison result query button or the comparison task status query button on the client, which causes the client to generate and send the corresponding instruction.
S130. According to the difference comparison result, highlight the difference text in the comparison file and/or the reference file.
The difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file. Highlighting includes, but is not limited to, one or a combination of the following: bolding the font, changing the font color, adding a background color, brightening the font, enlarging the font, switching to italics, adding an underline, and adding a strikethrough. When the difference comparison result includes multiple difference types, the different types may be highlighted in the same way or in different ways.
This step can be implemented as follows: convert the coordinate information into DIV element position information; when the DIV element position information enters the display area of the file to which it belongs, highlight the difference text at that DIV element position in the page indicated by the corresponding page identifier, according to the difference type and the page identifier that correspond to the DIV element position information. Here, DIV is a positioning technique used with cascading style sheets, and a DIV element is an element that provides structure and background for block-level content within an HTML document. A separate display area is encapsulated for each of the reference file and the comparison file, and each display area is one component. For example, two display areas arranged from left to right can be encapsulated on the interface to display the reference file and the comparison file respectively, and when the reference file and/or the comparison file contains too much text to be displayed all at once, a scroll bar can be added to provide scrolling display.
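The description does not specify how coordinate information is converted into DIV element position information. The following is one possible sketch, assuming the conversion is a scale-and-offset mapping from page coordinates to pixel offsets within the display area, and assuming that "entering the display area" means the element's top offset falls within the currently visible scroll window. All names and parameters here are hypothetical.

```python
from typing import Dict


def to_div_position(x: int, y: int, scale: int,
                    origin_left: int, origin_top: int) -> Dict[str, int]:
    """Convert page coordinates to pixel offsets for an absolutely positioned DIV.

    origin_left / origin_top: rendered pixel position of the page's first character,
    which the description uses as the origin of the page coordinate system.
    scale: pixels per coordinate unit at the current zoom level.
    """
    return {"left": origin_left + x * scale, "top": origin_top + y * scale}


def in_display_area(div_top: int, scroll_top: int, viewport_height: int) -> bool:
    """Return True if the DIV's top edge lies inside the visible part of the display area."""
    return scroll_top <= div_top < scroll_top + viewport_height
```

Highlighting only when the position is inside the display area avoids styling elements the user cannot currently see.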
It should be noted that, when the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, the handling depends on the difference type in the current piece of difference information. If the difference type is content deletion, either only the deleted text is highlighted in the reference file, or the text before the deletion and the text retained after the deletion are both highlighted, that is, both the difference text in the reference file and the difference text in the comparison file contained in the difference information are highlighted. If the difference type is content addition, either only the added content is highlighted in the comparison file, or the text before the addition and the text after the addition are both highlighted, that is, both the difference text in the reference file and the difference text in the comparison file contained in the difference information are highlighted. If the difference type is content modification, the text before the modification and the text after the modification can both be highlighted, that is, both the difference text in the reference file and the difference text in the comparison file contained in the difference information are highlighted.
As an example, FIG. 2 shows part of the text content of the reference file and the comparison file. The difference text can be highlighted directly in the reference file and the comparison file, and the user can browse by dragging the scroll bars of the two files. Bold underlined text is text that has been modified, enlarged italic text is text added in the comparison file, and struck-through text is text deleted in the comparison file.
In one implementation, where independent display areas are encapsulated for the reference file and the comparison file, a user who wants to compare and view a particular difference has to drag the scroll bars of the two files separately, which is cumbersome. To make viewing differences more efficient, in this embodiment, for each piece of difference information, an ID (Identity Document, that is, an identifier) can be generated from the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type. The ID is then bound to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file. When a first synchronous positioning instruction triggered on the basis of the reference file or the comparison file is received, the difference texts at all DIV element positions bound to the ID corresponding to the first synchronous positioning instruction are highlighted synchronously.
The ID can be generated in ways that include, but are not limited to, concatenating, in a preset order, the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type, to obtain a character string. Different difference types can be represented by different characters; for example, "content deletion", "content addition", and "content modification" can be represented by "1", "2", and "3" respectively. The first synchronous positioning instruction is generated when the user clicks in the display area of the reference file or the comparison file. When the client receives the first synchronous positioning instruction, it activates the corresponding ID. The component corresponding to the reference file or the comparison file determines whether the activated ID is the same as an ID it contains. If it contains the same ID, it highlights the difference text at the DIV element position corresponding to that ID, and if that DIV element position is not within the display area, it scrolls the position into the display area for display. For example, for the same piece of difference information, suppose the difference text "electronic device" is on page 2 of the reference file and the difference text "computing terminal" is on page 3 of the comparison file. When the user clicks on the text "electronic device" on page 2 of the reference file, the client synchronizes automatically, so that the comparison file scrolls to page 3 and the text "computing terminal" is highlighted.
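ID generation and binding as described can be sketched as follows. The type codes "1", "2", and "3" come from the description; the string format used for DIV element positions (such as "p2:40,120"), the "|" separator, and the function names are hypothetical choices for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Type codes as given in the description.
TYPE_CODES = {"content deletion": "1", "content addition": "2", "content modification": "3"}


def make_diff_id(ref_div_pos: str, cmp_div_pos: str, diff_type: str) -> str:
    """Concatenate both DIV positions and the type code, in a fixed order, into an ID."""
    return "|".join([ref_div_pos, cmp_div_pos, TYPE_CODES[diff_type]])


def bind(diffs: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, str]]:
    """Bind each ID to the DIV positions in the reference and comparison display areas."""
    bindings: Dict[str, Dict[str, str]] = {}
    for ref_pos, cmp_pos, diff_type in diffs:
        diff_id = make_diff_id(ref_pos, cmp_pos, diff_type)
        bindings[diff_id] = {"reference": ref_pos, "comparison": cmp_pos}
    return bindings


def locate(bindings: Dict[str, Dict[str, str]], area: str,
           clicked_pos: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Find the ID activated by a click and all positions to highlight synchronously."""
    for diff_id, positions in bindings.items():
        if positions[area] == clicked_pos:
            return diff_id, positions
    return None


# "electronic device" on page 2 of the reference, "computing terminal" on page 3 of the comparison.
bindings = bind([("p2:40,120", "p3:40,64", "content modification")])
hit = locate(bindings, "reference", "p2:40,120")
```

In this sketch, the positions returned by locate tell each display-area component which element to highlight and, if needed, scroll into view.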
In one implementation, to give the user more ways to view differences and to let the user view them according to personal habit, this embodiment can also display difference details in a preset display area according to the difference comparison result. The preset display area is an area other than the reference file display area and the comparison file display area, and the difference details include, for each piece of difference information, the difference type, the difference text in the reference file, and the difference text in the comparison file. In addition, the difference comparison result can be summarized, and the summary can be displayed in a display area other than the reference file display area, the comparison file display area, and the preset display area. The summary includes the total number of pieces of difference information and the page identifiers of the pages where the differences are located. For example, the summary may read: "After comparison, differences were found on pages 1, 3, 5, and 8 of the reference file, and there are 20 differences in total between the two files."
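A summary of the kind quoted above could be derived from the difference information as sketched below. The dictionary key "ref_page" and the function name are hypothetical; only the two summary contents (total count and page identifiers) are taken from the description.

```python
from typing import Dict, List


def summarize(diffs: List[Dict[str, int]]) -> Dict[str, object]:
    """Summarize a difference comparison result: total count and affected reference pages."""
    pages = sorted({d["ref_page"] for d in diffs})
    return {"total": len(diffs), "pages": pages}


def summary_sentence(summary: Dict[str, object]) -> str:
    """Render the summary as a sentence for the summary display area."""
    page_list = ", ".join(str(p) for p in summary["pages"])
    return (f"Differences were found on pages {page_list} of the reference file; "
            f"the two files contain {summary['total']} differences in total.")


summary = summarize([{"ref_page": 3}, {"ref_page": 1}, {"ref_page": 3}])
```

Here three pieces of difference information on two distinct pages would yield a total of 3 and the page list 1, 3.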
As an example, as shown in FIG. 3, when displaying differences the client not only highlights the difference text in the reference file and/or the comparison file but also displays the comparison result on the right. The upper part of the comparison result is the overall comparison result (that is, the summary), and the lower part is the detailed comparison result (that is, the difference details). The user can browse the difference details by dragging the scroll bar of the detailed comparison result display area.
Because the preset display area is independent of the reference file display area and the comparison file display area, the content shown in those two areas does not change while the user views the preset display area. In this case, a user who wants to check the specific content in the reference file and the comparison file against the difference details has to drag the scroll bars of the reference file display area and the comparison file display area separately, which is cumbersome. To make viewing differences from the difference details more efficient, the embodiment of the present invention may bind the ID to the corresponding difference information in the difference details; when a second synchronous positioning instruction triggered from the difference details is received, obtain the ID bound to the difference information in the difference details that corresponds to the second synchronous positioning instruction; and synchronously highlight the difference text at all DIV element position information bound to the obtained ID.
The second synchronous positioning instruction is an instruction generated when the user clicks in the preset display area. When the client receives the second synchronous positioning instruction, it activates the ID bound to the difference information in the difference details that corresponds to that instruction. The component corresponding to the reference file or the comparison file then determines whether the activated ID matches an ID that it contains. If it contains the same ID, it highlights the difference text at the DIV element position information corresponding to that ID, and if that DIV element position is not currently within the display area, it scrolls the position into the display area.
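The activation flow just described can be modeled with a small sketch. It is not the client's actual implementation: the `Pane` class, its fields, and `on_detail_clicked` are hypothetical names, and each DIV position is reduced to a single vertical offset for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Pane:
    """Hypothetical model of one file display area (reference or comparison)."""
    name: str
    viewport_height: float
    scroll_top: float = 0.0
    div_tops: dict = field(default_factory=dict)    # diff ID -> top offset of its DIV
    highlighted: set = field(default_factory=set)   # IDs currently highlighted

    def activate(self, diff_id):
        # A pane reacts only if it contains the activated ID.
        if diff_id not in self.div_tops:
            return False
        top = self.div_tops[diff_id]
        visible = self.scroll_top <= top < self.scroll_top + self.viewport_height
        if not visible:
            # Scroll the DIV position into the display area.
            self.scroll_top = top
        self.highlighted.add(diff_id)
        return True


def on_detail_clicked(diff_id, panes):
    """Activate the ID bound to the clicked detail row in every pane."""
    return [pane.name for pane in panes if pane.activate(diff_id)]
```

A pane whose DIV is already visible keeps its scroll position and only adds the highlight, which mirrors the "scroll only when not in the display area" condition.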
In one implementation, whether before or after the two files are compared, a user viewing the two files has to drag the scroll bars of the two file display areas separately to view them in step, which is cumbersome. To make consulting the two files more efficient, the embodiment of the present invention may receive a scroll instruction for a first scroll bar; determine, according to the scroll instruction, the ratio of the length the first scroll bar has currently scrolled to the total length of its scroll area; and scroll a second scroll bar according to that ratio, so that the first scroll bar and the second scroll bar scroll synchronously. In other words, the first scroll bar only follows the user's drag and is not itself scrolled synchronously, whereas the second scroll bar scrolls as the first scroll bar scrolls.
The first scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area, and the second scroll bar is likewise one of these two scroll bars, but not the same one as the first scroll bar. That is, when the first scroll bar is the scroll bar of the reference file display area, the second scroll bar is the scroll bar of the comparison file display area; when the first scroll bar is the scroll bar of the comparison file display area, the second scroll bar is the scroll bar of the reference file display area.
For example, when the user scrolls the scroll bar of the reference file display area, the client calculates in real time the ratio of the length that scroll bar has currently scrolled to the total length of its scroll area (for example, if the scrolled length is 2 cm and the total length of the scroll area is 10 cm, the ratio is 0.2), and scrolls the scroll bar of the comparison file display area to the same ratio of 0.2 (for example, if that scroll bar has currently scrolled 3 cm and the total length of its scroll area is 12 cm, it is scrolled to the 2.4 cm position).
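The proportional scrolling in this example amounts to a one-line calculation. The sketch below reproduces it; the function name and the clamping of the ratio to the range 0 to 1 are assumptions added for illustration.

```python
def synced_scroll_offset(first_scrolled, first_total, second_total):
    """Return the offset the second scroll bar should be scrolled to.

    first_scrolled / first_total is the ratio already scrolled in the first
    scroll area; the second scroll bar is moved to the same ratio.
    """
    if first_total <= 0:
        return 0.0  # assumption: nothing to scroll in a degenerate area
    ratio = first_scrolled / first_total
    ratio = min(max(ratio, 0.0), 1.0)  # assumption: keep the ratio within bounds
    return ratio * second_total
```

With the figures from the example, `synced_scroll_offset(2, 10, 12)` yields approximately 2.4, regardless of where the second scroll bar was before.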
If the user triggers synchronous scrolling before the difference comparison is performed, the text reached by synchronous scrolling is simply displayed. If the user triggers synchronous scrolling after the difference comparison is performed, the difference text currently scrolled into the display area is highlighted in the comparison file and/or the reference file according to the difference comparison result; any other text currently scrolled into the display area is displayed normally, without highlighting.
With the RPA- and AI-based file comparison method provided by the embodiment of the present invention, the RPA robot automatically uploads the reference file and the comparison file to be compared to the client, the client transmits the reference file and the comparison file to the server for difference comparison, and the difference text is finally highlighted in the comparison file and/or the reference file according to the difference comparison result returned by the server. Compared with the prior art, which requires files to be compared manually, the embodiment of the present invention uses the RPA robot to automatically trigger the client to send the two files to the server for automatic comparison. This saves manpower, freeing the people who would otherwise compare files to do more valuable work, and improves the efficiency of file comparison. Compared with the prior art, which requires differences to be marked manually, the embodiment of the present invention highlights the difference text directly in the reference file and/or the comparison file, which improves the readability of the difference text and therefore the efficiency with which the user finds differences between the two files. When the client sends the reference file and the comparison file to the server, it may first recognize both files using OCR (Optical Character Recognition), then splice the text of any file that contains multiple pages of text, obtaining a single-page, context-continuous reference text and a single-page, context-continuous comparison text, and finally send the reference text and the comparison text to the server for difference comparison. The server can then compare the two texts directly in context, without further processing, which improves the efficiency and accuracy of file comparison by the server.
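The page splicing mentioned above can be sketched as follows. The embodiment does not say how a position in the spliced text is traced back to a page, so the start-offset table and the `page_of` lookup are assumptions made for illustration, as is joining pages without a separator.

```python
from bisect import bisect_right


def stitch_pages(pages):
    """Join per-page OCR text into one context-continuous string.

    Returns the spliced text and the start offset of each page within it.
    """
    starts = []
    offset = 0
    for page in pages:
        starts.append(offset)
        offset += len(page)
    return "".join(pages), starts


def page_of(offset, starts):
    """Return the zero-based index of the page containing a character offset."""
    return bisect_right(starts, offset) - 1
```

Keeping the start offsets alongside the spliced text is one way a client could recover the page identifier for a difference that the server reports against the single-page text.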
基于上述方法实施例,本发明的另一个实施例还提供了一种基于RPA和AI的文件比对装置,所述装置应用于客户端,如图4所示,所述装置包括:Based on the above method embodiment, another embodiment of the present invention also provides a file comparison device based on RPA and AI, the device is applied to the client, as shown in Figure 4, the device includes:
接收单元20,用于接收机器人流程自动化RPA机器人上传的参考文件和比对文件;The receiving unit 20 is used to receive the reference file and comparison file uploaded by the robotic process automation RPA robot;
发送单元22,用于将所述参考文件和所述比对文件发送给服务器;A sending unit 22, configured to send the reference file and the comparison file to a server;
所述接收单元20,还用于接收所述服务器发送的所述比对文件相对于所述参考文件的差异性比对结果;The receiving unit 20 is further configured to receive a difference comparison result of the comparison file relative to the reference file sent by the server;
The display unit 24 is configured to highlight difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file.
Optionally, the difference comparison result includes at least one piece of difference information. Each piece of difference information includes a difference type, the difference text in the reference file, the difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file. The difference position information includes the page identifier of the page to which the difference text belongs and the coordinate information of the difference text on that page.
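One possible in-memory shape for a piece of difference information is sketched below. The field names are hypothetical, and representing the coordinate information as a bounding box (`x`, `y`, `width`, `height`) is an assumption; the embodiment only states that coordinate information on the page is included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffPosition:
    """Difference position information: page identifier plus coordinates."""
    page_id: int
    x: float        # assumed: left edge of the difference text on the page
    y: float        # assumed: top edge of the difference text on the page
    width: float
    height: float


@dataclass(frozen=True)
class DiffInfo:
    """One piece of difference information in the comparison result."""
    diff_type: str                  # e.g. "insert", "delete", "replace" (assumed labels)
    reference_text: str             # difference text in the reference file
    comparison_text: str            # difference text in the comparison file
    reference_position: DiffPosition
    comparison_position: DiffPosition
```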
可选的,所述显示单元24,包括:Optionally, the display unit 24 includes:
a conversion module, configured to convert the coordinate information into DIV (division) element position information;
a display module, configured to, when the DIV element position information enters the display area of the file to which it belongs, highlight the difference text at the DIV element position information in the page indicated by the page identifier, according to the difference type corresponding to the DIV element position information and the page identifier corresponding to the DIV element position information.
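The two modules above can be illustrated with a short sketch. The conversion rule is an assumption: it supposes page coordinates with a top-left origin, pages of equal rendered height stacked vertically, and a uniform `scale` from page units to pixels. None of these details, nor the function names, are specified by the embodiment.

```python
def to_div_position(x, y, width, height, page_index, page_height_px, scale):
    """Convert page coordinates into an absolute DIV position in pixels."""
    return {
        "left": x * scale,
        # Offset by the rendered height of all preceding pages.
        "top": page_index * page_height_px + y * scale,
        "width": width * scale,
        "height": height * scale,
    }


def is_in_display_area(div, scroll_top, viewport_height):
    """Return True when the DIV overlaps the currently visible region."""
    bottom = div["top"] + div["height"]
    return div["top"] < scroll_top + viewport_height and bottom > scroll_top
```

A display module along these lines would apply the highlight style for the difference type only to DIVs for which `is_in_display_area` returns true.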
可选的,所述显示单元24还包括:Optionally, the display unit 24 also includes:
a generation module, configured to generate, for a given piece of difference information, an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type;
绑定模块,用于分别将所述ID与在所述参考文件中差异文本的DIV元素位置信息、在所述比对文件中差异文本的DIV元素位置信息进行绑定;A binding module, used to respectively bind the ID with the DIV element position information of the difference text in the reference file, and the DIV element position information of the difference text in the comparison file;
a first synchronization module, configured to, when a first synchronous positioning instruction triggered from the reference file or the comparison file is received, synchronously highlight the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction.
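The generation and binding steps can be sketched as below. The embodiment does not state how the ID is derived, so hashing the two DIV positions together with the difference type is an assumption, as are the names `make_diff_id` and `bind`.

```python
import hashlib
import json


def make_diff_id(reference_div, comparison_div, diff_type):
    """Derive a stable ID from both DIV positions and the difference type."""
    payload = json.dumps([reference_div, comparison_div, diff_type], sort_keys=True)
    # Assumption: a truncated SHA-1 digest is unique enough for one document pair.
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]


def bind(bindings, reference_div, comparison_div, diff_type):
    """Bind one ID to the DIV positions in both files and return the ID."""
    diff_id = make_diff_id(reference_div, comparison_div, diff_type)
    bindings[diff_id] = {"reference": reference_div, "comparison": comparison_div}
    return diff_id
```

Because both files share one ID per piece of difference information, a click on a difference in either file can look up the ID and highlight every DIV position bound to it.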
Optionally, the display unit 24 is further configured to, after the difference comparison result of the comparison file relative to the reference file sent by the server is received, display difference details in a preset display area according to the difference comparison result, the preset display area being an area other than the reference file display area and the comparison file display area, and the difference details including, for each piece of difference information, the difference type, the difference text in the reference file, and the difference text in the comparison file.
Optionally, the binding module is further configured to, after binding the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, bind the ID to the corresponding difference information in the difference details;
所述显示单元24还包括:The display unit 24 also includes:
获取模块,用于当接收到基于所述差异明细触发的第二同步定位指令时,获取与所述第二同步定位指令对应的所述差异明细中的差异信息绑定的ID;An acquiring module, configured to acquire an ID bound to difference information in the difference details corresponding to the second synchronous positioning instruction when receiving a second synchronous positioning instruction triggered based on the difference details;
第二同步模块,用于将与获取的ID绑定的所有DIV元素位置信息处的差异文本同步进行突出显示。The second synchronization module is used for synchronously highlighting the difference texts at the position information of all DIV elements bound to the obtained ID.
Optionally, the receiving unit 20 is further configured to, before the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, receive a scroll instruction for a first scroll bar, the first scroll bar being the scroll bar of the reference file display area or the scroll bar of the comparison file display area;
确定单元,用于根据所述滚动指令确定所述第一滚动条当前已滚动的长度占滚动区域总长度的比例;A determining unit, configured to determine the ratio of the currently scrolled length of the first scroll bar to the total length of the scroll area according to the scroll instruction;
a synchronous scrolling unit, configured to scroll a second scroll bar according to the ratio, so that the first scroll bar and the second scroll bar scroll synchronously, the second scroll bar being the scroll bar of the reference file display area or the scroll bar of the comparison file display area, but different from the first scroll bar.
可选的,所述显示单元,用于根据所述差异性比对结果,在所述比对文件和/或所述参考文件中突出显示当前滚动到显示区域的差异文本。Optionally, the display unit is configured to highlight the difference text currently scrolled to the display area in the comparison file and/or the reference file according to the difference comparison result.
可选的,所述发送单元22,包括:Optionally, the sending unit 22 includes:
识别模块,用于利用光学字符识别OCR对所述参考文件和所述比对文件进行识别,获得所述参考文件的至少一页文本以及所述比对文件的至少一页文本;A recognition module, configured to use optical character recognition (OCR) to identify the reference file and the comparison file, and obtain at least one page of text of the reference file and at least one page of text of the comparison file;
拼接模块,用于当目标文件为包含多页文本的文件时,将所述目标文件的多页文本拼接为上下文连续的一页文本,获得目标文本,当所述目标文件为包含单页文本的文件时,从所述目标文件中获取单页文本作为目标文本,其中,当所述目标文件为所述参考文件时,所述目标文本为参考文本,当所述目标文件为所述比对文件时,所述目标文本为比对文本;The splicing module is used to splice the multiple pages of text of the target file into one page of text with continuous context when the target file is a file containing multiple pages of text to obtain the target text. When the target file is a file containing a single page of text file, obtain a single-page text from the target file as the target text, wherein, when the target file is the reference file, the target text is the reference text, and when the target file is the comparison file , the target text is the comparison text;
发送模块,用于将所述参考文本和所述比对文本发送给所述服务器。A sending module, configured to send the reference text and the comparison text to the server.
基于上述实施例,本发明的另一个实施例还提供了一种计算设备,所述计算设备包括:Based on the above embodiments, another embodiment of the present invention also provides a computing device, the computing device includes:
一个或多个处理器;one or more processors;
存储装置,用于存储一个或多个程序,storage means for storing one or more programs,
当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如本发明任一实施例所述的方法。其中,处理器与存储装置相耦合。When the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any embodiment of the present invention. Wherein, the processor is coupled with the storage device.
基于上述方法实施例,本发明实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现本发明任一实施例所述的方法。Based on the above method embodiments, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any embodiment of the present invention is implemented.
基于上述实施例,本发明实施例还提供了一种基于RPA和AI的文件比对系统,所述系统包括RPA机器人30、客户端32和服务器34。如图5所示,RPA机器人30可以与客户端32相互独立,如图6所示,RPA机器人30可以是客户端32的一部分。Based on the above embodiments, the embodiment of the present invention also provides a file comparison system based on RPA and AI, and the system includes an RPA robot 30 , a client 32 and a server 34 . As shown in FIG. 5 , the RPA robot 30 may be independent from the client 32 , and as shown in FIG. 6 , the RPA robot 30 may be a part of the client 32 .
所述RPA机器人30,用于登录所述客户端32,并将参考文件和比对文件上传至所述客户端32,触发所述客户端32将所述参考文件和所述比对文件发送给服务器34进行差异性比对;The RPA robot 30 is configured to log into the client 32, upload the reference file and the comparison file to the client 32, and trigger the client 32 to send the reference file and the comparison file to The server 34 performs a difference comparison;
所述客户端32,用于接收RPA机器人上传的参考文件和比对文件,将所述参考文件和所述比对文件发送给服务器;The client 32 is configured to receive the reference file and the comparison file uploaded by the RPA robot, and send the reference file and the comparison file to the server;
所述服务器34,用于根据预设比对算法对所述参考文件和所述比对文件进行差异性比对,获得所述比对文件相对于所述参考文件的差异性比对结果;The server 34 is configured to compare the difference between the reference file and the comparison file according to a preset comparison algorithm, and obtain a difference comparison result of the comparison file relative to the reference file;
The client 32 is further configured to receive the difference comparison result sent by the server and, according to the difference comparison result, highlight difference text in the comparison file and/or the reference file, wherein the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file.
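The server-side comparison can be illustrated with a stand-in. The embodiment refers only to a "preset comparison algorithm" and does not name one; the sketch below substitutes Python's standard `difflib.SequenceMatcher` purely for illustration, and the record keys are hypothetical.

```python
from difflib import SequenceMatcher


def compare_texts(reference_text, comparison_text):
    """Return difference records of the comparison text relative to the reference."""
    matcher = SequenceMatcher(None, reference_text, comparison_text, autojunk=False)
    results = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical runs are not differences
        results.append({
            "type": tag,                               # "replace", "delete" or "insert"
            "reference_text": reference_text[i1:i2],
            "comparison_text": comparison_text[j1:j2],
            "reference_span": (i1, i2),                # character offsets in the reference text
            "comparison_span": (j1, j2),               # character offsets in the comparison text
        })
    return results
```

The character spans refer to the spliced single-page texts, so a client would still need to map them back to page identifiers and coordinates before highlighting.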
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。In the foregoing embodiments, the descriptions of each embodiment have their own emphases, and for parts not described in detail in a certain embodiment, reference may be made to relevant descriptions of other embodiments.
在本发明的各种实施例中,应理解,上述各过程的序号的大小并不意味着执行顺序的必然先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本发明实施例的实施过程构成任何限定。In various embodiments of the present invention, it should be understood that the sequence numbers of the above-mentioned processes do not necessarily mean the order of execution, and the execution order of each process should be determined by its functions and internal logic, and should not be used in the implementation of the present invention. The implementation of the examples constitutes no limitation.
在本发明所提供的实施例中,应理解,“与A相应的B”表示B与A相关联,根据A可以确定B。但还应理解,根据A确定B并不意味着仅仅根据A确定B,还可以根据A和/或其他信息确定B。In the embodiments provided by the present invention, it should be understood that "B corresponding to A" means that B is associated with A, and B can be determined according to A. However, it should also be understood that determining B based on A does not mean determining B only based on A, and B can also be determined based on A and/or other information.
另外,在本发明各实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
上述集成的单元若以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可获取的存储器中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或者部分,可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干请求用以使得一台计算机设备(可以为个人计算机、服务器或者网络设备等,具体可以是计算机设备中的处理器)执行本发明的各个实施例上述方法的部分或全部步骤。If the above-mentioned integrated units are realized in the form of software function units and sold or used as independent products, they can be stored in a computer-accessible memory. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, and the computer software product is stored in a memory , including several requests to make a computer device (which may be a personal computer, server, or network device, etc., specifically, a processor in the computer device) execute some or all of the steps of the above-mentioned methods in various embodiments of the present invention.
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,存储介质包括只读存储器(Read-Only Memory,ROM)、随机存储器(Random Access Memory,RAM)、可编程只读存储器(Programmable Read-only Memory,PROM)、可擦除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、一次可编程只读存储器(One-time Programmable Read-Only Memory,OTPROM)、电子抹除式可复写只读存储器(Electrically-Erasable Programmable Read-Only Memory,EEPROM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)或其他光盘存储器、磁盘存储器、磁带存储器、或者能够用于携带或存储数据的计算机可读的任何其他介质。Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above-mentioned embodiments can be completed by instructing related hardware through a program, and the program can be stored in a computer-readable storage medium, and the storage medium includes read-only Memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), programmable read-only memory (Programmable Read-only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), One-time Programmable Read-Only Memory (OTPROM), Electronically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc (Compact Disc Read-Only Memory, CD-ROM) or other optical disk storage, magnetic disk storage, tape storage, or any other computer-readable medium that can be used to carry or store data.
本领域普通技术人员可以理解:附图只是一个实施例的示意图,附图中的模块或流程并不一定是实施本发明所必须的。Those skilled in the art can understand that the accompanying drawing is only a schematic diagram of an embodiment, and the modules or processes in the accompanying drawing are not necessarily necessary for implementing the present invention.
本领域普通技术人员可以理解:实施例中的装置中的模块可以按照实施例描述分布于实施例的装置中,也可以进行相应变化位于不同于本实施例的一个或多个装置中。上述实施例的模块可以合并为一个模块,也可以进一步拆分成多个子模块。Those of ordinary skill in the art can understand that: the modules in the device in the embodiment may be distributed in the device in the embodiment according to the description in the embodiment, or may be changed and located in one or more devices different from the embodiment. The modules in the above embodiments can be combined into one module, and can also be further split into multiple sub-modules.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (15)

  1. 一种基于RPA和AI的文件比对方法,所述方法应用于客户端,其特征在于,所述方法包括:A method for comparing files based on RPA and AI, said method being applied to a client, characterized in that said method comprises:
    S1、接收机器人流程自动化RPA机器人上传的参考文件和比对文件;S1. Receive the reference files and comparison files uploaded by the robot process automation RPA robot;
    S2、将所述参考文件和所述比对文件发送给服务器;S2. Send the reference file and the comparison file to a server;
    S3、接收所述服务器发送的所述比对文件相对于所述参考文件的差异性比对结果;S3. Receive the difference comparison result of the comparison file relative to the reference file sent by the server;
    S4、根据所述差异性比对结果,在所述比对文件和/或所述参考文件中突出显示差异文本,其中,在所述比对文件中突出显示的差异文本为所述比对文件相对于所述参考文件存在差异的文本,在所述参考文件中突出显示的差异文本为所述参考文件相对于所述比对文件存在差异的文本。S4. According to the difference comparison result, highlight the difference text in the comparison file and/or the reference file, wherein the difference text highlighted in the comparison file is the comparison file For texts that are different from the reference file, the text that is highlighted in the reference file is the text that is different between the reference file and the comparison file.
  2. The method according to claim 1, wherein the difference comparison result includes at least one piece of difference information, each piece of difference information including a difference type, the difference text in the reference file, the difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file, the difference position information including the page identifier of the page to which the difference text belongs and the coordinate information of the difference text on that page.
  3. 根据权利要求2所述的方法,其特征在于,所述S4包括:The method according to claim 2, wherein said S4 comprises:
    S41. converting the coordinate information into DIV (division) element position information;
    S42. when the DIV element position information enters the display area of the file to which it belongs, highlighting the difference text at the DIV element position information in the page indicated by the page identifier, according to the difference type corresponding to the DIV element position information and the page identifier corresponding to the DIV element position information.
  4. 根据权利要求3所述的方法,其特征在于,所述S4还包括:The method according to claim 3, wherein said S4 further comprises:
    S43. for a given piece of difference information, generating an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type, and binding the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, respectively;
    S44、当接收到基于所述参考文件或者所述比对文件触发的第一同步定位指令时,将与所述第一同步定位指令对应的ID绑定的所有DIV元素位置信息处的差异文本同步进行突出显示。S44. When the first synchronous positioning instruction triggered based on the reference file or the comparison file is received, synchronize the difference text at the position information of all DIV elements bound to the ID corresponding to the first synchronous positioning instruction to highlight.
  5. 根据权利要求4所述的方法,其特征在于,在所述S3之后,所述方法还包括:The method according to claim 4, characterized in that, after the S3, the method further comprises:
    S5. displaying difference details in a preset display area according to the difference comparison result, the preset display area being an area other than the reference file display area and the comparison file display area, and the difference details including, for each piece of difference information, the difference type, the difference text in the reference file, and the difference text in the comparison file.
  6. 根据权利要求5所述的方法,其特征在于,在所述S43之后,所述方法还包括:The method according to claim 5, characterized in that, after the S43, the method further comprises:
    S45、将所述ID与所述差异明细中对应的差异信息进行绑定;S45. Bind the ID with the corresponding difference information in the difference details;
    S46、当接收到基于所述差异明细触发的第二同步定位指令时,获取与所述第二同步定位指令对应的所述差异明细中的差异信息绑定的ID;S46. When receiving a second synchronous positioning instruction triggered based on the difference details, acquire an ID bound to the difference information in the difference details corresponding to the second synchronous positioning instruction;
    S47、将与获取的ID绑定的所有DIV元素位置信息处的差异文本同步进行突出显示。S47. Synchronously highlight the difference text at the position information of all DIV elements bound to the acquired ID.
  7. 根据权利要求1所述的方法,其特征在于,在所述S4之前,所述方法还包括:The method according to claim 1, wherein, before the S4, the method further comprises:
    S6、接收针对第一滚动条的滚动指令,所述第一滚动条包括参考文件显示区域的滚动条或者比对文件显示区域的滚动条;S6. Receive a scrolling instruction for the first scroll bar, where the first scroll bar includes a scroll bar in the reference file display area or a scroll bar in the comparison file display area;
    S7、根据所述滚动指令确定所述第一滚动条当前已滚动的长度占滚动区域总长度的比例;S7. Determine the ratio of the currently scrolled length of the first scroll bar to the total length of the scroll area according to the scroll instruction;
    S8、根据所述比例滚动第二滚动条,以使得所述第一滚动条与所述第二滚动条同步滚动,所述第二滚动条包括参考文件显示区域的滚动条或者比对文件显示区域的滚动条,但与所述第一滚动条不同。S8. Scroll the second scroll bar according to the ratio, so that the first scroll bar and the second scroll bar scroll synchronously, and the second scroll bar includes the scroll bar of the reference file display area or the comparison file display area , but not the same as the first scrollbar.
  8. The method according to claim 7, characterized in that S4 comprises:
    According to the difference comparison result, highlight in the comparison file and/or the reference file the difference text that is currently scrolled into the display area.
  9. The method according to any one of claims 1-8, characterized in that S2 comprises:
    S21. Recognize the reference file and the comparison file using optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file;
    S22. When a target file contains multiple pages of text, splice the multiple pages of text of the target file into a single page of contextually continuous text to obtain a target text; when the target file contains a single page of text, take that single page of text from the target file as the target text; wherein, when the target file is the reference file, the target text is a reference text, and when the target file is the comparison file, the target text is a comparison text;
    S23. Send the reference text and the comparison text to the server.
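The splicing step S22 can be sketched as follows. The claim only requires that the pages become one continuous text; recording each page's starting offset so that a character index can later be mapped back to its page is an added assumption, and `splice_pages` and `page_of` are hypothetical names.

```python
from bisect import bisect_right


def splice_pages(pages):
    """S22: join per-page OCR text into one continuous text.

    Returns the spliced text and the starting character offset of each page.
    """
    if not pages:
        raise ValueError("expected at least one page of text")
    offsets = []
    cursor = 0
    for page in pages:
        offsets.append(cursor)
        cursor += len(page)
    # No separator is inserted, so a sentence that spans a page break stays contiguous.
    return "".join(pages), offsets


def page_of(char_index, offsets):
    """Map a character index in the spliced text back to its zero-based page index."""
    return bisect_right(offsets, char_index) - 1
```

A single-page file passes through unchanged, which matches the second branch of S22.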
  10. A file comparison apparatus based on RPA and AI, the apparatus being applied to a client, characterized in that the apparatus comprises:
    a receiving unit, configured to receive a reference file and a comparison file uploaded by a robotic process automation (RPA) robot;
    a sending unit, configured to send the reference file and the comparison file to a server;
    the receiving unit being further configured to receive a difference comparison result, sent by the server, of the comparison file relative to the reference file;
    a display unit, configured to highlight difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file.
  11. The apparatus according to claim 10, characterized in that the difference comparison result includes at least one piece of difference information, and each piece of difference information includes a difference type, the difference text in the reference file, the difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file, wherein the difference position information includes a page identifier of the page to which the difference text belongs and coordinate information of the difference text on that page.
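The structure of one piece of difference information can be written down as a pair of record types. This is a sketch only: the field names, the `(x, y, width, height)` layout of the coordinate information, and the three `DiffType` values are assumptions, since this section does not enumerate the difference types or the coordinate format.

```python
from dataclasses import dataclass
from enum import Enum


class DiffType(Enum):
    # Assumed set of difference types; not enumerated in the claim.
    ADDED = "added"
    DELETED = "deleted"
    MODIFIED = "modified"


@dataclass(frozen=True)
class DiffPosition:
    page_id: int   # page identifier of the page holding the difference text
    x: float       # coordinate information on that page (assumed layout)
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class DiffInfo:
    diff_type: DiffType
    reference_text: str              # difference text in the reference file
    comparison_text: str             # difference text in the comparison file
    reference_position: DiffPosition
    comparison_position: DiffPosition


example = DiffInfo(
    diff_type=DiffType.MODIFIED,
    reference_text="30 days",
    comparison_text="60 days",
    reference_position=DiffPosition(page_id=2, x=72.0, y=310.0, width=48.0, height=12.0),
    comparison_position=DiffPosition(page_id=2, x=72.0, y=322.0, width=48.0, height=12.0),
)
```

A difference comparison result would then be a list holding at least one such `DiffInfo` record.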
  12. The apparatus according to claim 11, characterized in that the display unit comprises:
    a conversion module, configured to convert the coordinate information into division (DIV) element position information;
    a display module, configured to, when the DIV element position information enters the display area of the file to which it belongs, highlight the difference text at the DIV element position information in the page indicated by the page identifier, according to the difference type corresponding to the DIV element position information and the page identifier corresponding to the DIV element position information.
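The conversion module and the visibility check performed by the display module can be sketched in a few lines. The claim does not specify the conversion, so this sketch assumes page coordinates are scaled by a rendering factor and shifted by the top offset of the page inside the scrollable display area. `DivBox`, `to_div_box`, `is_in_view`, `scale`, and `page_tops` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DivBox:
    """Hypothetical DIV element position, in pixels within the scrollable display area."""
    page_id: int
    left: float
    top: float
    width: float
    height: float


def to_div_box(page_id, x, y, width, height, scale, page_tops):
    """Convert page coordinates into a DIV position (assumed scale-and-offset mapping)."""
    return DivBox(
        page_id=page_id,
        left=x * scale,
        top=page_tops[page_id] + y * scale,  # offset by where the page starts
        width=width * scale,
        height=height * scale,
    )


def is_in_view(box, scroll_top, client_height):
    """True when the box vertically overlaps the currently visible region."""
    return box.top < scroll_top + client_height and box.top + box.height > scroll_top
```

Only boxes for which `is_in_view` returns true would need to be highlighted, with the style chosen by difference type.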
  13. The apparatus according to any one of claims 10-12, characterized in that the sending unit comprises:
    a recognition module, configured to recognize the reference file and the comparison file using optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file;
    a splicing module, configured to, when a target file contains multiple pages of text, splice the multiple pages of text of the target file into a single page of contextually continuous text to obtain a target text, and, when the target file contains a single page of text, take that single page of text from the target file as the target text, wherein, when the target file is the reference file, the target text is a reference text, and when the target file is the comparison file, the target text is a comparison text;
    a sending module, configured to send the reference text and the comparison text to the server.
  14. A computing device, characterized in that the computing device comprises:
    one or more processors;
    a storage device, configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-9.
  15. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-9 is implemented.
PCT/CN2021/131627 2021-09-27 2021-11-19 File comparison method and apparatus based on rpa and ai, device, and storage medium WO2023045053A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111138084.1A CN113836092A (en) 2021-09-27 2021-09-27 File comparison method, device, equipment and storage medium based on RPA and AI
CN202111138084.1 2021-09-27

Publications (1)

Publication Number Publication Date
WO2023045053A1 (en) 2023-03-30

Family

ID=78970974

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131627 WO2023045053A1 (en) 2021-09-27 2021-11-19 File comparison method and apparatus based on rpa and ai, device, and storage medium

Country Status (2)

Country Link
CN (1) CN113836092A (en)
WO (1) WO2023045053A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115450B (en) * 2022-08-30 2022-11-29 平安银行股份有限公司 Method and device for establishing case for dispute of Unionpay

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543614A (en) * 2018-11-22 2019-03-29 厦门商集网络科技有限责任公司 A kind of this difference of full text comparison method and equipment
CN111753517A (en) * 2020-06-30 2020-10-09 北京来也网络科技有限公司 Document comparison method, device, equipment and medium based on RPA and AI
US20210109717A1 (en) * 2019-10-14 2021-04-15 UiPath Inc. Providing Image and Text Data for Automatic Target Selection in Robotic Process Automation
CN113407665A (en) * 2021-05-25 2021-09-17 北京有竹居网络技术有限公司 Text comparison method, device, medium and electronic equipment
CN113836096A (en) * 2021-09-27 2021-12-24 北京来也网络科技有限公司 File comparison method, device, equipment, medium and system based on RPA and AI

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101470572B (en) * 2007-12-29 2010-09-01 英业达股份有限公司 Optimum display system and method for context progress
IL235565B (en) * 2014-11-06 2019-06-30 Kolton Achiav Location based optical character recognition (ocr)
CN106528587B (en) * 2016-09-12 2020-12-08 腾讯科技(深圳)有限公司 Page display method and device in composite webpage system
CN110162509A (en) * 2019-04-26 2019-08-23 平安普惠企业管理有限公司 File comparison method, device, computer equipment and storage medium
CN111914597B (en) * 2019-05-09 2024-03-15 杭州睿琪软件有限公司 Document comparison identification method and device, electronic equipment and readable storage medium
CN111460763A (en) * 2020-03-02 2020-07-28 南京南瑞继保电气有限公司 Method, device and equipment for marking file differences and computer-readable storage medium
CN112084748A (en) * 2020-09-19 2020-12-15 神思电子技术股份有限公司 Text comparison method
CN112882947A (en) * 2021-03-15 2021-06-01 深圳市腾讯信息技术有限公司 Interface test method, device, equipment and storage medium
CN113031887A (en) * 2021-04-08 2021-06-25 成都微视联软件技术有限公司 Method for supporting various headers and subsection printing in html file printing

Also Published As

Publication number Publication date
CN113836092A (en) 2021-12-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21958166

Country of ref document: EP

Kind code of ref document: A1