WO2023045053A1 - File comparison method and apparatus based on RPA and AI, device, and storage medium - Google Patents

File comparison method and apparatus based on RPA and AI, device, and storage medium

Info

Publication number
WO2023045053A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
text
difference
comparison
position information
Prior art date
Application number
PCT/CN2021/131627
Other languages
English (en)
Chinese (zh)
Inventor
赵鹏
汪冠春
胡一川
褚瑞
李玮
Original Assignee
北京来也网络科技有限公司
来也科技(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京来也网络科技有限公司, 来也科技(北京)有限公司
Publication of WO2023045053A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/168 Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Definitions

  • Embodiments of the present invention relate to the technical field of process automation, and in particular to a file comparison method, apparatus, device, and storage medium based on RPA and AI.
  • RPA: Robotic Process Automation
  • Robotic Process Automation simulates human operation of a computer through specific "robot software" and automatically executes process tasks according to rules.
  • AI: Artificial Intelligence
  • RPA has unique advantages: it is low-code and non-intrusive.
  • Low-code means that RPA can be operated without a high level of IT skill, so business personnel who do not understand programming can also develop processes; non-intrusive means that RPA can simulate human operations without requiring software systems to open interfaces.
  • Traditional RPA has certain limitations: it can only follow fixed rules, so its application scenarios are limited. With the continuous development of AI technology, the deep integration of RPA and AI overcomes these limitations.
  • RPA + AI ("hand work" plus "head work") is greatly changing the value of labor.
  • Embodiments of the present invention provide a file comparison method, apparatus, device, and storage medium based on RPA and AI, which not only automate file comparison but also highlight the differences between two files, thereby improving the efficiency with which users find file differences.
  • An embodiment of the present invention provides a method for comparing files based on RPA and AI; the method is applied to a client and includes:
  • the difference comparison result includes at least one piece of difference information, and each piece of difference information includes the difference type, the difference text in the reference file, the difference text in the comparison file, the difference position information of the difference text in the reference file, and the difference position information of the difference text in the comparison file;
  • the difference position information includes the page identification of the page to which the difference text belongs and the coordinate information of the difference text on that page.
  • the S4 includes:
  • the S4 also includes:
  • for the same piece of difference information, an identifier (ID) is generated according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type, and the ID is bound to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, respectively;
  • the method further includes:
  • the preset display area is an area other than the reference file display area and the comparison file display area, and the difference details include, for each piece of difference information, the difference type, the difference text in the reference file, and the difference text in the comparison file.
  • the method further includes:
  • the method further includes:
  • the S4 includes:
  • the difference text currently scrolled to the display area is highlighted in the comparison file and/or the reference file.
  • the S2 includes:
  • OCR optical character recognition
  • the target file is a file containing multiple pages of text
  • the target file is a file containing a single page of text
  • the embodiment of the present invention provides a file comparison device based on RPA and AI, the device is applied to the client, and the device includes:
  • a receiving unit configured to receive reference files and comparison files uploaded by robotic process automation RPA robots;
  • a sending unit configured to send the reference file and the comparison file to a server
  • the receiving unit is further configured to receive a difference comparison result of the comparison file relative to the reference file sent by the server;
  • a display unit configured to highlight the difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the difference text highlighted in the comparison file is the text in the comparison file that differs from the reference file, and the difference text highlighted in the reference file is the text in the reference file that differs from the comparison file.
  • the difference comparison result includes at least one piece of difference information, and each piece of difference information includes the difference type, the difference text in the reference file, the difference text in the comparison file, the difference position information of the difference text in the reference file, and the difference position information of the difference text in the comparison file;
  • the difference position information includes the page identification of the page to which the difference text belongs and the coordinate information of the difference text on that page.
  • the display unit includes:
  • a conversion module configured to convert the coordinate information into position information of divided DIV elements
  • a display module configured to, when the DIV element position information enters the display area of the file to which it belongs, highlight the difference text at that DIV element position on the page indicated by the page identification, according to the difference type and the page identification corresponding to the DIV element position information.
  • the display unit also includes:
  • a generation module configured to, for the same piece of difference information, generate an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type;
  • a binding module configured to bind the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, respectively;
  • a first synchronization module configured to, upon receiving a first synchronous positioning instruction triggered from the reference file or the comparison file, synchronously highlight the difference text at all DIV element positions bound to the ID corresponding to the first synchronous positioning instruction.
  • the display unit is further configured to, after receiving the difference comparison result of the comparison file relative to the reference file sent by the server, display the difference details in a preset display area according to the difference comparison result.
  • the preset display area is an area other than the reference file display area and the comparison file display area.
  • the difference details include the difference type in each piece of difference information, the difference text in the reference file, and the difference text in the comparison file.
  • the binding module is further configured to, after binding the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, bind the ID to the corresponding difference information in the difference details;
  • the display unit also includes:
  • An acquiring module configured to acquire an ID bound to difference information in the difference details corresponding to the second synchronous positioning instruction when receiving a second synchronous positioning instruction triggered based on the difference details;
  • the second synchronization module is used for synchronously highlighting the difference texts at the position information of all DIV elements bound to the obtained ID.
  • the receiving unit is further configured to receive a scroll instruction for a first scroll bar before the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, the first scroll bar being the scroll bar of the reference file display area or the scroll bar of the comparison file display area;
  • a determining unit configured to determine, according to the scroll instruction, the ratio of the currently scrolled length of the first scroll bar to the total length of the scroll area;
  • a synchronous scrolling unit configured to scroll a second scroll bar according to the ratio, so that the first scroll bar and the second scroll bar scroll synchronously, the second scroll bar being whichever of the reference file display area scroll bar and the comparison file display area scroll bar is not the first scroll bar.
  • the display unit is configured to highlight the difference text currently scrolled to the display area in the comparison file and/or the reference file according to the difference comparison result.
  • the sending unit includes:
  • a recognition module configured to use optical character recognition (OCR) to identify the reference file and the comparison file, and obtain at least one page of text of the reference file and at least one page of text of the comparison file;
  • the splicing module is used to splice the multiple pages of text of the target file into one page of text with continuous context when the target file is a file containing multiple pages of text to obtain the target text.
  • the target file is a file containing a single page of text
  • a sending module configured to send the reference text and the comparison text to the server.
  • an embodiment of the present invention provides a computing device, and the computing device includes:
  • one or more processors
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in the first aspect.
  • an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in the first aspect is implemented.
  • With the RPA and AI-based file comparison method, apparatus, device, and storage medium provided by the embodiments of the present invention, the RPA robot can automatically upload the reference file and the comparison file to the client, the client can transmit the reference file and the comparison file to the server for difference comparison, and finally the difference text can be highlighted in the comparison file and/or the reference file according to the difference comparison result returned by the server.
  • The embodiment of the present invention can use the RPA robot to automatically trigger the client to send the two files to be compared to the server for automatic comparison. This saves manpower, frees the people who would otherwise compare files to do more valuable work, and improves the efficiency of file comparison. Compared with the prior art, which requires differences to be marked manually, the embodiment of the present invention can highlight the difference text directly in the reference file and/or the comparison file, which improves the readability of the difference text and therefore the efficiency with which the user finds the differences between the two files.
  • When the client sends the reference file and the comparison file to the server, it can first use OCR (Optical Character Recognition) to recognize the reference file and the comparison file and then splice any file that contains multiple pages of text, obtaining a single-page reference text with continuous context and a single-page comparison text with continuous context.
  • The reference text and the comparison text are then sent to the server for difference comparison, so that the server can compare the two texts directly in context without any other processing, thereby improving the efficiency and accuracy of the file comparison performed by the server.
  • the user can trigger a synchronous positioning command through the reference file display area, the comparison file display area or the difference detail display area, so that the client can simultaneously highlight the same difference information, thereby improving the efficiency of users viewing the difference text.
  • the user can drag the scroll bar of the reference file display area or the comparison file display area to make the client scroll synchronously for these two display areas, thereby improving the efficiency of users viewing text.
  • Fig. 1 is a flowchart of a file comparison method based on RPA and AI provided by an embodiment of the present invention.
  • Fig. 2 is an example diagram showing a difference comparison result provided by an embodiment of the present invention.
  • Fig. 3 is another example diagram showing a difference comparison result provided by an embodiment of the present invention.
  • Fig. 4 is a block diagram of a file comparison apparatus based on RPA and AI provided by an embodiment of the present invention.
  • Fig. 5 is an architecture diagram of a file comparison system based on RPA and AI provided by an embodiment of the present invention.
  • Fig. 6 is an architecture diagram of another file comparison system based on RPA and AI provided by an embodiment of the present invention.
  • The embodiments of the present invention combine RPA and AI technologies to compare files automatically, which not only saves manpower and improves the efficiency of file comparison, but also highlights the differences between the two files, improving the efficiency with which users find differences between files.
  • reference file refers to the file that serves as the baseline when performing a difference comparison
  • comparison file refers to the file, of the two files being compared, that is compared against the reference file.
  • The version of the reference file is often lower than that of the comparison file.
  • The reference file and the comparison file can be documents in any field, such as contract documents, financial documents, program documents, etc.
  • multi-page document refers to a document whose text content spans two or more pages
  • multi-page text refers to text that spans two or more pages
  • OCR refers to Optical Character Recognition: an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then applies a character recognition method
  • The RPA robot can use OCR technology to convert the text in a paper document into a black-and-white dot-matrix image file, from which the client can then use OCR technology to recognize the text content. Alternatively, the RPA robot can use OCR technology to obtain the text content from the paper document and generate a text file (that is, an editable file) containing it, from which the client directly extracts the text content.
  • client refers to the front-end of the business system with file comparison requirements
  • server refers to the back-end of the business system with file comparison requirements
  • client can be the application software corresponding to the business system, or it can be a browser, so that the RPA robot can access the website of the business system through the browser.
  • RPA robot can be integrated in the client, can also be embedded in the client in the form of a plug-in, or can be independent of the client, as long as the RPA robot can automatically access the client. The specific form is not limited.
  • NLP refers to natural language processing (Natural Language Processing), which takes language as its object and uses computer technology to analyze, understand, and process natural language; that is, the computer serves as a powerful tool for language research. With the support of computers, it conducts quantitative research on language information and provides language descriptions that can be used jointly by humans and computers.
  • splicing refers to connecting the content to be spliced together without changing the original content.
  • the term “preset comparison algorithm” refers to a specific comparison method for determining the differences between the comparison text and the reference text; the reference text and the comparison text can be compared in batches, one preset comparison unit at a time, until the comparison is complete.
  • the term “preset comparison unit” refers to the size of the text to be compared each time, which may be determined according to the actual situation, and may be a phrase, a sentence, or a paragraph.
  • the term “difference comparison” refers to comparing the differences between the reference text and the comparison text.
  • the term “difference comparison result” refers to a result, obtained by performing a difference comparison between the reference text and the comparison text, that includes at least one piece of difference information; each piece of difference information includes the difference type, the difference text in the reference file, the difference text in the comparison file, the difference position information of the difference text in the reference file, and the difference position information of the difference text in the comparison file; the difference position information includes the page identification of the page to which the difference text belongs and the coordinate information of the difference text on that page.
  • the term “difference type” is used to characterize the category of differences, mainly including content deletion, content addition and content modification.
  • page identification is used to indicate which page the current page is located in the entire file.
  • coordinate information: a coordinate system can be established for each page, with the position of the first character on the page as the origin and the horizontal and vertical axes pointing horizontally right and vertically down respectively, so that corresponding coordinates are generated for each character on the page.
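  • The definitions above (difference comparison result, difference type, page identification, coordinate information) can be pictured as a small data structure. The following is a minimal sketch only: the class names, field names, and the helper `differences_on_comparison_page` are hypothetical and do not come from the patent, which does not specify a concrete representation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DifferenceType(Enum):
    ADDITION = "content addition"
    DELETION = "content deletion"
    MODIFICATION = "content modification"


@dataclass(frozen=True)
class DifferencePosition:
    page_id: int  # page identification: which page of the file
    x: float      # horizontal coordinate, increasing to the right
    y: float      # vertical coordinate, increasing downward


@dataclass(frozen=True)
class DifferenceInfo:
    diff_type: DifferenceType
    reference_text: str                      # difference text in the reference file
    comparison_text: str                     # difference text in the comparison file
    reference_position: DifferencePosition   # position in the reference file
    comparison_position: DifferencePosition  # position in the comparison file


def differences_on_comparison_page(result: List[DifferenceInfo],
                                   page_id: int) -> List[DifferenceInfo]:
    """Return the difference information located on one page of the comparison file."""
    return [d for d in result if d.comparison_position.page_id == page_id]


# Hypothetical difference comparison result with two pieces of difference information.
example_result = [
    DifferenceInfo(DifferenceType.MODIFICATION, "100", "120",
                   DifferencePosition(1, 40.0, 12.0), DifferencePosition(1, 40.0, 12.0)),
    DifferenceInfo(DifferenceType.ADDITION, "", "new clause",
                   DifferencePosition(2, 0.0, 30.0), DifferencePosition(3, 0.0, 0.0)),
]
```

  • Keeping the page identification next to the coordinates lets a client select only the differences that belong to the page currently being rendered.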
  • difference text refers to the text content in the current file that is different from another file.
  • highlighting is a display method that can clearly distinguish the difference text from other texts.
  • the highlighting method includes but is not limited to one or a combination of the following: bold font, changed font color, added font background color, highlighted font, enlarged font, italics, underline, strikethrough, etc.
  • authentication refers to verifying whether the client that sends the reference file and the comparison file has the authority to perform file comparison, specifically, it can be realized by verifying whether the user information of the client meets the authority requirement authentication.
  • DIV element position information refers to the position information of the DIV (division) element in the web interface
  • DIV element is used for HTML (Hyper Text Markup Language)
  • binding refers to establishing a mapping relationship of at least two parameters to be bound, so that one parameter can be used to find another parameter.
  • difference details are specific descriptive information for each difference; they include the difference type in each piece of difference information, the difference text in the reference file, and the difference text in the comparison file.
  • synchronous positioning instruction is an instruction indicating to display the difference text involved in the same piece of difference information synchronously.
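  • Binding and synchronous positioning, as defined above, amount to a lookup table keyed by an identifier. The sketch below is illustrative only: the patent says the ID is generated from the two DIV element positions and the difference type but does not say how, so hashing them with SHA-1 is an assumption, and the names `make_diff_id`, `DiffBindings`, and the DIV position strings are hypothetical.

```python
import hashlib
from typing import Dict, Tuple


def make_diff_id(ref_div: str, cmp_div: str, diff_type: str) -> str:
    """Derive a stable ID from both DIV positions and the difference type (assumed scheme)."""
    key = "|".join((ref_div, cmp_div, diff_type))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


class DiffBindings:
    """Maps each ID to the DIV positions it is bound to in both files."""

    def __init__(self) -> None:
        self._positions: Dict[str, Tuple[str, str]] = {}

    def bind(self, ref_div: str, cmp_div: str, diff_type: str) -> str:
        diff_id = make_diff_id(ref_div, cmp_div, diff_type)
        self._positions[diff_id] = (ref_div, cmp_div)
        return diff_id

    def on_sync_positioning(self, diff_id: str) -> Tuple[str, str]:
        """Return every DIV position bound to the ID so both can be highlighted together."""
        return self._positions[diff_id]


bindings = DiffBindings()
bound_id = bindings.bind("ref-page1-div3", "cmp-page1-div4", "content modification")
to_highlight = bindings.on_sync_positioning(bound_id)
```

  • Because the same ID can also be attached to an entry in the difference details, a click in any of the three display areas can resolve to the same pair of positions.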
  • synchronous scrolling refers to a scrolling manner that keeps the scrolling progress of at least two scroll bars consistent.
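  • The ratio-based synchronous scrolling described for the determining unit and the synchronous scrolling unit can be sketched as follows. This is a minimal illustration with hypothetical function names; clamping the ratio to the range 0 to 1 and treating lengths as plain numbers are assumptions not stated in the patent.

```python
def scroll_ratio(scrolled_length: float, total_length: float) -> float:
    """Ratio of the currently scrolled length to the total length of the scroll area."""
    if total_length <= 0:
        return 0.0
    # Clamp so that overscroll never pushes the second scroll bar out of range.
    return min(max(scrolled_length / total_length, 0.0), 1.0)


def synced_offset(first_scrolled: float, first_total: float, second_total: float) -> float:
    """Offset to apply to the second scroll bar so both bars show the same progress."""
    return scroll_ratio(first_scrolled, first_total) * second_total


# The first display area is scrolled 250 of 1000 units; the second area is 400 units long.
second_bar_offset = synced_offset(250, 1000, 400)
```

  • Using a ratio rather than an absolute offset keeps the two display areas aligned even when the reference file and the comparison file have different rendered lengths.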
  • Fig. 1 shows a file comparison method based on RPA and AI provided by an embodiment of the present invention; the method is mainly applied to the client and specifically includes:
  • In the embodiment of the present invention, an RPA program can be configured on an electronic device that can log in to the client (the program can be integrated or embedded in the client, or be independent of it), so that the electronic device, following the rules set in the RPA program, simulates the user's mouse and keyboard operations to log in to the client automatically, triggers the client, by accessing it, to generate a file comparison request including the reference file and the comparison file, and has the file comparison request sent to the server so that the server can compare the differences between the reference file and the comparison file.
  • When the RPA robot logs in to the client, the client may pop up a login interface containing a verification code image.
  • The RPA robot can perform OCR on the verification code image, obtain the verification code content, and enter it into the corresponding edit box to log in to the client successfully.
  • The reference file and the comparison file can be stored in the client or in other storage space of the electronic device, or can be paper files.
  • When the files are stored in other storage space, the RPA robot can search for the reference file and the comparison file there and upload them to the client, for example by clicking the upload button; the two files can also be dragged and dropped onto a designated area, or other upload methods can be used.
  • When a file is a paper file, the RPA robot can use OCR technology to convert it into an image file or a text file (that is, an editable file composed of the text content of the paper file) and then upload it to the client using the above methods.
  • After the client receives the reference file and the comparison file uploaded by the RPA robot, it can render them to show the uploaded files to the user.
  • the reference file and/or comparison file is a Word file
  • When converting a Word file into a PDF file, the client may hand the conversion operation to the server, and the server may then return the PDF file to the client for rendering.
  • After receiving the reference file and the comparison file uploaded by the RPA robot, the client can receive the file comparison command triggered by the RPA robot, generate a file comparison request including the reference file and the comparison file according to that command, and send the request to the server so that the server can compare the differences between the reference file and the comparison file.
  • After the server receives the reference file and the comparison file, it often needs to recognize the text in the two files before performing a difference comparison. If many clients send file comparison requests to the server, the server's file comparison efficiency is reduced.
  • The client can therefore first use OCR to recognize the reference file and the comparison file, obtain at least one page of text of the reference file and at least one page of text of the comparison file, and then send the recognized text to the server for difference comparison.
  • For example, the reference file includes two pages of text, and the comparison file adds a page of text between the first and second pages of the reference file, thereby forming three pages of text.
  • If the single-page comparison method is used to compare the two, the result is that the second page of the reference file differs from the second page of the comparison file and, because the reference file has no third page, the third page of the comparison file is reported as not existing in the reference file. In other words, the single-page comparison method leads to the overall result that the two files differ everywhere except for the identical first page.
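  • The two-page versus three-page example above can be reproduced with a few lines of code. This is only an illustration of the problem with page-by-page comparison; the function name, the status strings, and the page contents are hypothetical.

```python
from itertools import zip_longest
from typing import List, Tuple


def compare_page_by_page(reference_pages: List[str],
                         comparison_pages: List[str]) -> List[Tuple[int, str]]:
    """Compare page N of one file only with page N of the other file."""
    results = []
    pairs = zip_longest(reference_pages, comparison_pages)
    for number, (ref_page, cmp_page) in enumerate(pairs, start=1):
        if ref_page is None:
            results.append((number, "missing in reference file"))
        elif cmp_page is None:
            results.append((number, "missing in comparison file"))
        elif ref_page == cmp_page:
            results.append((number, "same"))
        else:
            results.append((number, "different"))
    return results


reference_pages = ["page one", "page two"]
comparison_pages = ["page one", "inserted page", "page two"]
page_results = compare_page_by_page(reference_pages, comparison_pages)
```

  • Here `page_results` reports page 1 as the same, page 2 as different, and page 3 as missing in the reference file, even though the only real change is one inserted page.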
  • After the client uses OCR to recognize the reference file and the comparison file and obtains at least one page of text of each, it can first perform text splicing and then send the spliced text to the server.
  • When the target file is a file containing multiple pages of text, the multiple pages of text of the target file are spliced into one page of text with continuous context to obtain the target text; when the target file is a file containing a single page of text, the single page of text is obtained from the target file as the target text. The target file is the reference file or the comparison file: when the target file is the reference file, the target text is the reference text; when the target file is the comparison file, the target text is the comparison text. The reference text and the comparison text are then sent to the server.
  • the continuous context refers to maintaining the sequence of the original text.
  • the specific method of splicing multiple pages of text in a reference file or a comparison file into one page of text with continuous context can be to splice multiple pages of text in sequence according to the page order of the reference file or comparison file, so as to obtain a page with continuous context text.
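  • Splicing in page order can be sketched as a simple join. In the sketch below, `splice_pages` is a hypothetical helper, and Python's standard `difflib.SequenceMatcher` merely stands in for the server's preset comparison algorithm, which the patent does not specify; comparing line by line is likewise an assumption.

```python
import difflib
from typing import List, Tuple


def splice_pages(pages: List[str]) -> str:
    """Join pages in their original order without changing their content."""
    return "".join(pages)


def non_equal_opcodes(reference_text: str, comparison_text: str) -> List[Tuple]:
    """Line-level differences between two spliced texts (difflib used as a stand-in)."""
    matcher = difflib.SequenceMatcher(
        a=reference_text.splitlines(), b=comparison_text.splitlines(), autojunk=False
    )
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


reference_text = splice_pages(["Clause A\nClause B\n", "Clause C\n"])
comparison_text = splice_pages(["Clause A\nClause B\n", "Clause X\n", "Clause C\n"])
spliced_differences = non_equal_opcodes(reference_text, comparison_text)
```

  • With the pages spliced, the inserted page shows up as a single insertion between lines that otherwise match, rather than as a mismatch on every later page.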
  • the server can authenticate the user information of the client to verify whether the user has the file comparison authority.
  • When the client sends the reference file and the comparison file to the server, it can also carry the user information of the client, so that the server can first authenticate the client according to the user information and then perform the difference comparison on the reference file and the comparison file once authentication passes.
  • the user information may be a client account, may be a mobile phone number bound to the client account, may also be a user level or other information, and the implementation of the present invention does not limit the specific content of the user information, which may be determined according to specific circumstances.
  • The server can compare the differences between the reference file and the comparison file according to a preset comparison algorithm. Specifically, the reference text and the comparison text may be compared one preset comparison unit at a time, yielding a comparison sub-result for each preset comparison unit.
  • the corresponding comparison sub-result is determined as content deletion; if it is determined that the comparison sub-text being compared does not exist in the reference text, the corresponding comparison sub-result is determined as content addition.
  • The possible outcomes of comparing two texts include not only identical content, content deletion, and content addition, but also content modification.
  • the size of the preset comparison unit can be determined according to the actual situation, and can be a phrase, a sentence, a paragraph, and the like.
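  • A unit-by-unit comparison that yields the three difference types can be sketched as follows. This is not the patent's algorithm: it assumes the sentence as the preset comparison unit, splits on sentence-ending punctuation, and uses Python's standard `difflib.SequenceMatcher` as a stand-in comparison method; all function names and sample texts are hypothetical.

```python
import difflib
import re
from typing import List, Tuple

# Map difflib opcode tags onto the difference types named in the text.
DIFF_TYPES = {
    "delete": "content deletion",
    "insert": "content addition",
    "replace": "content modification",
}


def split_units(text: str) -> List[str]:
    """Split text into sentence units after '.', '!' or '?' (assumed unit size)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def compare_units(reference_text: str,
                  comparison_text: str) -> List[Tuple[str, List[str], List[str]]]:
    """Return (difference type, reference units, comparison units) for each difference."""
    ref_units = split_units(reference_text)
    cmp_units = split_units(comparison_text)
    matcher = difflib.SequenceMatcher(a=ref_units, b=cmp_units, autojunk=False)
    sub_results = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            sub_results.append((DIFF_TYPES[tag], ref_units[i1:i2], cmp_units[j1:j2]))
    return sub_results


reference_text = "The fee is 100. Payment is due monthly. Late fees apply. The term is one year."
comparison_text = "The fee is 120. Payment is due monthly. The term is one year. Either party may terminate."
sub_results = compare_units(reference_text, comparison_text)
```

  • For these sample texts the sketch reports one content modification (the fee), one content deletion (the late-fee sentence), and one content addition (the termination sentence).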
  • NLP technology can also be used to perform semantic analysis on the reference subtext and the comparison subtext.
  • The embodiment of the present invention can also support user-defined filtering rules that ignore meaningless differences: when the differences between the reference subtext and the comparison subtext include one that satisfies a preset filtering rule, that difference is ignored. For example, it can be set that the presence or absence of the particle "of" in a sentence does not affect the comparison result.
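  • One simple way to picture such a filtering rule is to drop the ignored words from both units before comparing them. The sketch below is an assumption about how a rule might be applied, not the patent's method; the function names are hypothetical, and matching whole whitespace-separated words is a simplification.

```python
from typing import Iterable, List


def apply_filter_rules(unit: str, ignored_words: Iterable[str]) -> List[str]:
    """Drop every word that a filtering rule marks as meaningless."""
    ignored = set(ignored_words)
    return [word for word in unit.split() if word not in ignored]


def differs_after_filtering(reference_unit: str, comparison_unit: str,
                            ignored_words: Iterable[str]) -> bool:
    """True only if the two units still differ once ignored words are removed."""
    ignored = list(ignored_words)
    return apply_filter_rules(reference_unit, ignored) != apply_filter_rules(comparison_unit, ignored)


# The rule from the text: the presence or absence of "of" does not count as a difference.
ignored_only = differs_after_filtering("terms of the contract", "terms the contract", ["of"])
real_change = differs_after_filtering("terms of the contract", "terms of the agreement", ["of"])
```

  • A client could still list the ignored differences separately, since the server may send them along with the difference comparison result.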
  • the server sends the difference comparison result to the client, it can also send the ignored difference, so that the client can display the ignored difference to the user.
  • each piece of difference information includes the difference type, the difference text in the reference file, the difference text in the comparison file, the difference position information of the difference text in the reference file, and the difference text in the comparison file.
  • the difference position information includes the page identification of the page to which the difference text belongs and the coordinate information of the difference text on the page to which it belongs.
  • a comparison sub-result corresponds to a piece of difference information, and the difference types include content addition, content deletion, and content modification.
  • the page identification indicates which page of the entire file the current page is.
  • a coordinate system can be established for each page, with the position of the first character on the page as the origin and with a horizontal axis pointing to the right and a vertical axis, so that each character has corresponding coordinates.
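  • The structure of one piece of difference information, and the per-page character coordinates, could be modeled as below. This is a sketch under assumptions: the class and field names are hypothetical, and coordinates are counted in character cells (column, line) with the first character of the page at (0, 0), whereas the text leaves the unit open.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DifferencePosition:
    page_id: int  # page identification of the page the text belongs to
    x: int        # horizontal coordinate on that page
    y: int        # vertical coordinate on that page


@dataclass(frozen=True)
class DifferenceInfo:
    diff_type: str  # content addition, content deletion or content modification
    reference_text: str
    comparison_text: str
    reference_position: DifferencePosition
    comparison_position: DifferencePosition


def character_coordinates(page_lines: List[str]) -> List[Tuple[str, int, int]]:
    """Assign (x, y) to every character, origin at the page's first character."""
    coords = []
    for y, line in enumerate(page_lines):
        for x, ch in enumerate(line):
            coords.append((ch, x, y))
    return coords
```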
  • the server can add the task of comparing the reference file and the comparison file to a task queue and store the task state of the comparison task in the task database; when the task state of the comparison task changes, the task state in the task database is updated promptly.
  • the client can receive the comparison task status query command triggered by the RPA robot and send it to the server, so that the server can query the task database for the comparison task corresponding to the command and feed back the queried task status to the client.
  • when the comparison task has not yet been executed, the task status may be unprocessed; when the comparison task is being executed, the task status may be processing; and when the comparison task is completed, the task status may be completed.
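  • The task queue and the task status life cycle can be sketched as follows. The class `ComparisonServer` is hypothetical, and a plain dictionary stands in for the task database, which the text does not describe further.

```python
from collections import deque
from enum import Enum


class TaskStatus(Enum):
    UNPROCESSED = "unprocessed"
    PROCESSING = "processing"
    COMPLETED = "completed"


class ComparisonServer:
    """Hypothetical in-memory stand-in for the server's queue and task database."""

    def __init__(self):
        self.queue = deque()
        self.task_db = {}  # task id -> TaskStatus
        self._next_id = 1

    def submit(self, reference, comparison):
        """Queue a comparison task and record it as unprocessed."""
        task_id = self._next_id
        self._next_id += 1
        self.queue.append((task_id, reference, comparison))
        self.task_db[task_id] = TaskStatus.UNPROCESSED
        return task_id

    def run_next(self, compare):
        """Execute the oldest queued task, updating its status as it changes."""
        task_id, reference, comparison = self.queue.popleft()
        self.task_db[task_id] = TaskStatus.PROCESSING
        result = compare(reference, comparison)
        self.task_db[task_id] = TaskStatus.COMPLETED
        return result

    def query_status(self, task_id):
        """Answer a comparison task status query from the client."""
        return self.task_db[task_id]
```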
  • the server can actively feed back the difference comparison result to the client, and can also passively feed back the difference comparison result to the client.
  • passive feedback of the difference comparison result can be implemented as follows: the client receives the comparison result query command triggered by the RPA robot and sends it to the server, and the server sends the corresponding difference comparison result to the client according to the comparison result query command.
  • the RPA robot can trigger the client to send a comparison result query command or a comparison task status query command in ways that include, but are not limited to, clicking the comparison result query button or the comparison task status query button on the client, which triggers the client to generate and send the corresponding command.
  • the highlighted difference text in the comparison file is the text in which the comparison file differs from the reference file
  • the highlighted difference text in the reference file is the text in which the reference file differs from the comparison file.
  • the ways of highlighting include, but are not limited to, one or a combination of the following: bolding the font, changing the font color, adding a font background color, highlighting the font, enlarging the font, changing to italic, adding an underline, adding a strikethrough, etc.
  • different difference types may be highlighted in the same or different ways.
  • the specific implementation of this step may be: converting the coordinate information into DIV element position information; when the DIV element position information enters the display area of the file to which it belongs, highlighting, according to the difference type and the page identification corresponding to the DIV element position information, the difference text at the DIV element position information in the page indicated by the page identification.
  • DIV elements are commonly combined with cascading style sheets (CSS) as a layout and positioning technique
  • the DIV element is an element used to provide structure and background for block-level content in an HTML document.
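  • A minimal sketch of the conversion and visibility check, written in Python for consistency even though a real client would do this in the browser. It assumes pages of equal height stacked vertically and pixel units. `to_div_position`, `in_display_area`, `div_style` and the style table are hypothetical; the styles loosely follow the examples given for FIG. 2.

```python
from dataclasses import dataclass

# Hypothetical CSS declarations per difference type.
HIGHLIGHT_STYLES = {
    "content deletion": "text-decoration: line-through;",
    "content addition": "font-style: italic;",
    "content modification": "font-weight: bold; text-decoration: underline;",
}


@dataclass(frozen=True)
class DivPosition:
    page_id: int
    left: float
    top: float
    width: float
    height: float


def to_div_position(page_id, x, y, width, height, page_height):
    """Convert page-local coordinates to an offset in the stacked document."""
    top = (page_id - 1) * page_height + y
    return DivPosition(page_id, x, top, width, height)


def in_display_area(div, scroll_top, viewport_height):
    """True when any part of the DIV lies inside the visible window."""
    return div.top < scroll_top + viewport_height and div.top + div.height > scroll_top


def div_style(div, diff_type):
    """Build the inline style of an absolutely positioned highlight DIV."""
    return (
        f"position: absolute; left: {div.left}px; top: {div.top}px; "
        f"width: {div.width}px; height: {div.height}px; "
        + HIGHLIGHT_STYLES[diff_type]
    )
```

  • Highlighting only the DIVs that have entered the display area keeps the amount of rendering work proportional to what the user can currently see.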
  • when the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, the handling depends on the difference type contained in the current difference information.
  • if the difference type is content deletion, either only the deleted text is highlighted in the reference file, or the text before the deletion and the text retained after the deletion are highlighted separately, that is, the difference text contained in the difference information is highlighted in both the reference file and the comparison file.
  • if the difference type is content addition, either only the added content is highlighted in the comparison file, or the text before the addition and the text after the addition are highlighted separately, that is, the difference text contained in the difference information is highlighted in both the reference file and the comparison file.
  • if the difference type is content modification, the text before the modification and the text after the modification can both be highlighted, that is, the difference text contained in the difference information is highlighted in both the reference file and the comparison file.
  • FIG. 2 shows part of the text content of the reference file and the comparison file; the difference text can be highlighted directly in the reference file and the comparison file, and the user can browse by dragging the scroll bars of the reference file and the comparison file.
  • the bold and underlined text refers to the modified text
  • the italic and enlarged text refers to the text added in the comparison file
  • the strikethrough text refers to the deleted text in the comparison file.
  • for the same piece of difference information, the embodiment of the present invention can generate an ID (identifier) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type, and bind the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file respectively. When a first synchronous positioning instruction triggered based on the reference file or the comparison file is received, the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction is highlighted synchronously.
  • the specific implementation of generating the ID includes but is not limited to: according to the preset order, the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file and the difference type Perform concatenation to obtain a string.
  • different difference types may be represented by different characters, for example, "content deletion", “content addition” and “content modification” may be represented by "1", "2" and "3” in sequence.
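  • The concatenation scheme can be sketched as follows. The preset order (reference position, comparison position, type code), the `|` separator, and the string form of the position information are assumptions made for illustration; only the type codes "1", "2" and "3" come from the text.

```python
# Type codes as given in the text.
TYPE_CODES = {
    "content deletion": "1",
    "content addition": "2",
    "content modification": "3",
}


def make_diff_id(reference_div: str, comparison_div: str, diff_type: str, sep: str = "|") -> str:
    """Concatenate both DIV positions and the type code in a fixed order."""
    return sep.join([reference_div, comparison_div, TYPE_CODES[diff_type]])


# Example with hypothetical position strings of the form "page:left,top".
example_id = make_diff_id("p1:10,20", "p1:10,24", "content modification")
```

  • Because the ID is derived from both positions and the type, the same piece of difference information always yields the same ID in the reference file view and in the comparison file view.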
  • the first synchronous positioning instruction is an instruction generated when the user clicks on the display area of the reference file or the comparison file. When the client receives the first synchronous positioning instruction, it activates the corresponding ID.
  • the component corresponding to the reference file or the comparison file judges whether the activated ID is the same as an ID it contains. If it is, the difference text at the DIV element position information corresponding to the ID is highlighted, and when that DIV element position information is not in the display area, it is scrolled into the display area for display.
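  • The activation step can be sketched as below. `FileView` is a hypothetical stand-in for the component of the reference file or the comparison file; positions are reduced to a vertical offset, and "highlighting" is reduced to remembering the active ID.

```python
class FileView:
    """Hypothetical display component for one file."""

    def __init__(self, viewport_height):
        self.bindings = {}          # diff ID -> top offset of the bound DIV
        self.scroll_top = 0
        self.viewport_height = viewport_height
        self.highlighted = None

    def bind(self, diff_id, top):
        self.bindings[diff_id] = top

    def on_activate(self, diff_id):
        """Highlight the bound DIV if this view contains the activated ID."""
        if diff_id not in self.bindings:
            return False
        top = self.bindings[diff_id]
        visible = self.scroll_top <= top < self.scroll_top + self.viewport_height
        if not visible:
            self.scroll_top = top   # scroll the DIV into the display area
        self.highlighted = diff_id
        return True


def activate(diff_id, views):
    """Broadcast an activated ID to every view; each decides for itself."""
    return [view.on_activate(diff_id) for view in views]
```

  • The same broadcast works whether the ID was activated by a click in a file display area or by a click on an entry in the difference details.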
  • the embodiment of the present invention can also display the difference details in a preset display area according to the difference comparison result.
  • the preset display area is an area other than the reference file display area and the comparison file display area, and the difference details include the difference type in each piece of difference information, the difference text in the reference file, and the difference text in the comparison file.
  • the difference information can also be summarized, and the summary result can be displayed in a display area other than the reference file display area, the comparison file display area, and the preset display area.
  • the summary result includes the total number of difference information and the page identification where the difference information is located. For example, the summary result is "After comparison, it is found that there are differences on pages 1, 3, 5, and 8 of the reference document, and there are 20 differences in total between the two documents".
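  • A short sketch of building such a summary from the page identifications of the differences in the reference file. The function name `summarize` and the exact wording of the message are assumptions.

```python
def summarize(page_ids):
    """Summarize differences given the reference-file page of each one."""
    pages = sorted(set(page_ids))
    page_list = ", ".join(str(p) for p in pages)
    total = len(page_ids)
    message = (
        f"After comparison, differences were found on pages {page_list} "
        f"of the reference document; there are {total} differences in total."
    )
    return {"total": total, "pages": pages, "message": message}
```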
  • the client when displaying the difference, not only highlights the difference text in the reference file and/or the comparison file, but also displays the comparison result on the right.
  • the upper part of the comparison result is the overall comparison result (ie, the summary result), and the lower part is the detailed comparison result (ie, the difference details). Users can browse the difference details by dragging the scroll bar in the detailed comparison result display area.
  • because the preset display area is independent of the reference file display area and the comparison file display area, the contents displayed in the reference file display area and the comparison file display area do not change when the user views the preset display area. In this case, a user who wants to view the specific content in the reference file and the comparison file alongside the difference details has to drag the scroll bars of the reference file display area and the comparison file display area separately, which is cumbersome.
  • the embodiment of the present invention can therefore bind the ID to the corresponding difference information in the difference details; when a second synchronous positioning instruction triggered based on the difference details is received, obtain the ID bound to the difference information in the difference details corresponding to the second synchronous positioning instruction; and synchronously highlight the difference text at all DIV element position information bound to the obtained ID.
  • the second synchronous positioning instruction is an instruction generated when the user clicks on the preset display area.
  • when the client receives the second synchronous positioning instruction, it activates the ID bound to the difference information in the difference details corresponding to the second synchronous positioning instruction.
  • the component corresponding to the reference file or the comparison file judges whether the activated ID is the same as an ID it contains. If it is, the difference text at the DIV element position information corresponding to the ID is highlighted, and when that DIV element position information is not in the display area, it is scrolled into the display area for display.
  • the embodiment of the present invention may receive a scrolling command for the first scroll bar; determine, according to the scrolling command, the ratio of the currently scrolled length of the first scroll bar to the total length of the scrolling area; and scroll the second scroll bar according to the ratio, so that the first scroll bar and the second scroll bar scroll synchronously. That is to say, the first scroll bar scrolls only as the user drags it and is not itself synchronized, while the second scroll bar follows the scrolling of the first scroll bar.
  • the first scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area
  • the second scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area, but is different from the first scroll bar. That is to say, when the first scroll bar is the scroll bar of the reference file display area, the second scroll bar is the scroll bar of the comparison file display area; when the first scroll bar is the scroll bar of the comparison file display area, the second scroll bar is the scroll bar of the reference file display area.
  • for example, when the user scrolls the scroll bar of the reference file display area, the client calculates in real time the ratio of the currently scrolled length of the reference file display area to the total length of its scrolling area (for example, if the currently scrolled length is 2 cm and the total length of the scrolling area is 10 cm, the ratio is 0.2), and scrolls the scroll bar of the comparison file display area to the same ratio of 0.2 (for example, if its currently scrolled length is 3 cm and the total length of its scrolling area is 12 cm, it scrolls to 2.4 cm).
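  • The proportional scrolling rule reduces to a single ratio computation. The sketch below uses a hypothetical function `sync_scroll`; clamping the ratio to the range 0 to 1 is an added safeguard that the text does not mention.

```python
def sync_scroll(first_scrolled: float, first_total: float, second_total: float) -> float:
    """Return the target scrolled length of the second scroll bar.

    first_scrolled / first_total is the ratio of the first scroll bar;
    the second scroll bar is moved to the same ratio of its own total.
    """
    if first_total <= 0:
        return 0.0
    ratio = first_scrolled / first_total
    ratio = min(max(ratio, 0.0), 1.0)  # guard against overscroll values
    return ratio * second_total


# The example from the text: 2 cm of 10 cm gives a ratio of 0.2,
# so a 12 cm scrolling area is scrolled to about 2.4 cm.
target = sync_scroll(2, 10, 12)
```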
  • the text after synchronous scrolling can be directly displayed.
  • the difference text currently scrolled into the display area can be highlighted in the comparison file and/or the reference file according to the difference comparison result; other text currently scrolled into the display area can be displayed normally without highlighting.
  • the file comparison method based on RPA and AI provided by the embodiment of the present invention can automatically upload the reference file and the comparison file to be compared by the RPA robot to the client, and the client can transmit the reference file and the comparison file to the server Perform a difference comparison, and finally highlight the difference text in the comparison file and/or reference file according to the difference comparison result returned by the server.
  • the embodiment of the present invention can use the RPA robot to automatically trigger the client to send the two files to be compared to the server for automatic comparison, which not only saves manpower and frees the people who would otherwise compare files for more valuable work, but also improves the efficiency of file comparison. Compared with the prior art, which requires differences to be marked manually, the embodiment of the present invention can highlight the difference text directly in the reference file and/or the comparison file, which improves the readability of the difference text and therefore the efficiency with which the user finds the differences between the two files.
  • when the client sends the reference file and the comparison file to the server, it can first use OCR (Optical Character Recognition) to recognize the reference file and the comparison file, and then, for a file that contains multiple pages of text, splice the pages together to obtain a single-page reference text with continuous context and a single-page comparison text with continuous context.
  • the reference text and the comparison text are then sent to the server for difference comparison, so that the server can compare the two texts directly in context without further processing, thereby improving the efficiency and accuracy of the file comparison performed by the server.
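  • The splicing step can be sketched as follows, taking the per-page OCR output as a list of strings (the OCR call itself is omitted). Keeping a table of page start offsets is an assumption added here so that a position in the spliced text can still be mapped back to a page identification; `splice_pages` and `page_of` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Tuple


def splice_pages(pages: List[str]) -> Tuple[str, List[int]]:
    """Join page texts into one continuous text.

    Returns the spliced text and the start offset of each page in it.
    """
    starts, position = [], 0
    for text in pages:
        starts.append(position)
        position += len(text)
    return "".join(pages), starts


def page_of(offset: int, starts: List[int]) -> int:
    """Map an offset in the spliced text back to a 1-based page number."""
    return bisect_right(starts, offset)


text, starts = splice_pages(["The contract ", "takes effect."])
```

  • With the pages joined, a sentence that is split across a page break is compared as one unit instead of as two unrelated fragments.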
  • another embodiment of the present invention also provides a file comparison device based on RPA and AI, the device is applied to the client, as shown in Figure 4, the device includes:
  • the receiving unit 20 is used to receive the reference file and comparison file uploaded by the robotic process automation RPA robot;
  • a sending unit 22 configured to send the reference file and the comparison file to a server
  • the receiving unit 20 is further configured to receive a difference comparison result of the comparison file relative to the reference file sent by the server;
  • the display unit 24 is configured to highlight the difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file.
  • the difference comparison result includes at least one piece of difference information; each piece of difference information includes the difference type, the difference text in the reference file, the difference text in the comparison file, the difference position information of the difference text in the reference file, and the difference position information of the difference text in the comparison file.
  • the difference position information includes the page identification of the page to which the difference text belongs and the coordinate information of the difference text on the page to which it belongs.
  • the display unit 24 includes:
  • a conversion module configured to convert the coordinate information into DIV element position information;
  • a display module configured to, when the DIV element position information enters the display area of the file to which it belongs, highlight, according to the difference type and the page identification corresponding to the DIV element position information, the difference text at the DIV element position information in the page indicated by the page identification.
  • the display unit 24 also includes:
  • a generation module configured to, for the same piece of difference information, generate an ID (identifier) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type;
  • a binding module configured to bind the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file respectively;
  • a first synchronization module configured to, when a first synchronous positioning instruction triggered based on the reference file or the comparison file is received, synchronously highlight the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction.
  • the display unit 24 is further configured to, after the difference comparison result of the comparison file relative to the reference file sent by the server is received, display the difference details in a preset display area according to the difference comparison result.
  • the preset display area is an area other than the reference file display area and the comparison file display area.
  • the difference details include the difference type in each piece of difference information, the difference text in the reference file, and the difference text in the comparison file.
  • the binding module is also configured to, after binding the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file respectively, bind the ID to the corresponding difference information in the difference details;
  • the display unit 24 also includes:
  • An acquiring module configured to acquire an ID bound to difference information in the difference details corresponding to the second synchronous positioning instruction when receiving a second synchronous positioning instruction triggered based on the difference details;
  • the second synchronization module is used for synchronously highlighting the difference texts at the position information of all DIV elements bound to the obtained ID.
  • the receiving unit 20 is further configured to, before the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, receive a scrolling instruction for the first scroll bar, the first scroll bar being the scroll bar of the reference file display area or the scroll bar of the comparison file display area;
  • a determining unit configured to determine, according to the scrolling instruction, the ratio of the currently scrolled length of the first scroll bar to the total length of the scrolling area;
  • a synchronous scrolling unit configured to scroll the second scroll bar according to the ratio, so that the first scroll bar and the second scroll bar scroll synchronously, the second scroll bar being the scroll bar of the reference file display area or the scroll bar of the comparison file display area, but different from the first scroll bar.
  • the display unit is configured to highlight the difference text currently scrolled to the display area in the comparison file and/or the reference file according to the difference comparison result.
  • the sending unit 22 includes:
  • a recognition module configured to use optical character recognition (OCR) to recognize the reference file and the comparison file, and obtain at least one page of text of the reference file and at least one page of text of the comparison file;
  • a splicing module configured to, when the target file is a file containing multiple pages of text, splice the multiple pages of text of the target file into one page of text with continuous context to obtain the target text;
  • when the target file is a file containing a single page of text, that page of text is used as the target text;
  • a sending module configured to send the reference text and the comparison text to the server.
  • another embodiment of the present invention also provides a computing device, the computing device includes:
  • one or more processors;
  • a storage device for storing one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any embodiment of the present invention.
  • the processor is coupled with the storage device.
  • an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any embodiment of the present invention is implemented.
  • the embodiment of the present invention also provides a file comparison system based on RPA and AI, and the system includes an RPA robot 30 , a client 32 and a server 34 .
  • the RPA robot 30 may be independent from the client 32
  • the RPA robot 30 may be a part of the client 32 .
  • the RPA robot 30 is configured to log into the client 32, upload the reference file and the comparison file to the client 32, and trigger the client 32 to send the reference file and the comparison file to The server 34 performs a difference comparison;
  • the client 32 is configured to receive the reference file and the comparison file uploaded by the RPA robot, and send the reference file and the comparison file to the server;
  • the server 34 is configured to compare the difference between the reference file and the comparison file according to a preset comparison algorithm, and obtain a difference comparison result of the comparison file relative to the reference file;
  • the client 32 is also configured to receive the difference comparison result sent by the server and, according to the difference comparison result, highlight the difference text in the comparison file and/or the reference file, wherein the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file.
  • the sequence numbers of the above-mentioned processes do not necessarily indicate the order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
  • B corresponding to A means that B is associated with A, and B can be determined according to A.
  • determining B based on A does not mean determining B only based on A, and B can also be determined based on A and/or other information.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if the above-mentioned integrated units are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-accessible memory.
  • the technical solution of the present invention, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc., and specifically a processor in the computer device) to execute some or all of the steps of the above-mentioned methods in the various embodiments of the present invention.
  • Read-Only Memory (ROM)
  • Random Access Memory (RAM)
  • Programmable Read-Only Memory (PROM)
  • Erasable Programmable Read-Only Memory (EPROM)
  • One-Time Programmable Read-Only Memory (OTPROM)
  • Electrically Erasable Programmable Read-Only Memory (EEPROM)
  • Compact Disc Read-Only Memory (CD-ROM)
  • the modules in the device in the embodiment may be distributed in the device in the embodiment according to the description in the embodiment, or may be changed and located in one or more devices different from the embodiment.
  • the modules in the above embodiments can be combined into one module, and can also be further split into multiple sub-modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Disclosed are a file comparison method and apparatus based on RPA and AI, a device, and a storage medium. The method comprises: receiving a reference file and a comparison file uploaded by an RPA robot (S100); sending the reference file and the comparison file to a server (S110); receiving a difference comparison result, sent by the server, of the comparison file relative to the reference file (S120); and highlighting difference text in the comparison file and/or the reference file according to the difference comparison result (S130), wherein the difference text highlighted in the comparison file is the text of the comparison file that differs from the reference file, and the difference text highlighted in the reference file is the text of the reference file that differs from the comparison file. File comparison can thus be automated, and the differences between two files can also be highlighted.
PCT/CN2021/131627 2021-09-27 2021-11-19 Procédé et appareil de comparaison de fichiers reposant sur rpa et ai, dispositif et support de stockage WO2023045053A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111138084.1A CN113836092A (zh) 2021-09-27 2021-09-27 基于rpa和ai的文件比对方法、装置、设备及存储介质
CN202111138084.1 2021-09-27

Publications (1)

Publication Number Publication Date
WO2023045053A1 true WO2023045053A1 (fr) 2023-03-30

Family

ID=78970974

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131627 WO2023045053A1 (fr) 2021-09-27 2021-11-19 Procédé et appareil de comparaison de fichiers reposant sur rpa et ai, dispositif et support de stockage

Country Status (2)

Country Link
CN (1) CN113836092A (fr)
WO (1) WO2023045053A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115450B (zh) * 2022-08-30 2022-11-29 平安银行股份有限公司 一种银联收单争议案件的建案方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543614A (zh) * 2018-11-22 2019-03-29 厦门商集网络科技有限责任公司 一种全文本差异比对方法及设备
CN111753517A (zh) * 2020-06-30 2020-10-09 北京来也网络科技有限公司 基于rpa及ai的文档对比方法、装置、设备及介质
US20210109717A1 (en) * 2019-10-14 2021-04-15 UiPath Inc. Providing Image and Text Data for Automatic Target Selection in Robotic Process Automation
CN113407665A (zh) * 2021-05-25 2021-09-17 北京有竹居网络技术有限公司 文本比对方法、装置、介质及电子设备
CN113836096A (zh) * 2021-09-27 2021-12-24 北京来也网络科技有限公司 基于rpa和ai的文件比对方法、装置、设备、介质及系统

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101470572B (zh) * 2007-12-29 2010-09-01 英业达股份有限公司 内文进度的最佳显示系统及其方法
IL235565B (en) * 2014-11-06 2019-06-30 Kolton Achiav Position-based optical character recognition
CN106528587B (zh) * 2016-09-12 2020-12-08 腾讯科技(深圳)有限公司 复合网页系统中页面的展示方法和装置
CN110162509A (zh) * 2019-04-26 2019-08-23 平安普惠企业管理有限公司 文件比对方法、装置、计算机设备及存储介质
CN111914597B (zh) * 2019-05-09 2024-03-15 杭州睿琪软件有限公司 一种文档对照识别方法、装置、电子设备和可读存储介质
CN111460763A (zh) * 2020-03-02 2020-07-28 南京南瑞继保电气有限公司 文件差异的标注方法、装置、设备及计算机可读存储介质
CN112084748A (zh) * 2020-09-19 2020-12-15 神思电子技术股份有限公司 一种文本比对方法
CN113031887A (zh) * 2021-04-08 2021-06-25 成都微视联软件技术有限公司 一种在html文件打印中支持多种页眉、分节打印的方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543614A (zh) * 2018-11-22 2019-03-29 厦门商集网络科技有限责任公司 一种全文本差异比对方法及设备
US20210109717A1 (en) * 2019-10-14 2021-04-15 UiPath Inc. Providing Image and Text Data for Automatic Target Selection in Robotic Process Automation
CN111753517A (zh) * 2020-06-30 2020-10-09 北京来也网络科技有限公司 基于rpa及ai的文档对比方法、装置、设备及介质
CN113407665A (zh) * 2021-05-25 2021-09-17 北京有竹居网络技术有限公司 文本比对方法、装置、介质及电子设备
CN113836096A (zh) * 2021-09-27 2021-12-24 北京来也网络科技有限公司 基于rpa和ai的文件比对方法、装置、设备、介质及系统

Also Published As

Publication number Publication date
CN113836092A (zh) 2021-12-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21958166

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE