WO2023029230A1 - AI and RPA-based file annotation method and apparatus, device, and medium - Google Patents

AI and RPA-based file annotation method and apparatus, device, and medium

Info

Publication number
WO2023029230A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
text
marked
text information
sub
Prior art date
Application number
PCT/CN2021/132175
Other languages
French (fr)
Chinese (zh)
Inventor
杨子杰
汪冠春
胡一川
褚瑞
李玮
Original Assignee
北京来也网络科技有限公司
来也科技(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京来也网络科技有限公司, 来也科技(北京)有限公司
Publication of WO2023029230A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/168 Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
    • G06F16/17 Details of further file system functions

Definitions

  • the present disclosure relates to the fields of artificial intelligence (AI for short) and robotic process automation (RPA for short), and in particular to a file annotation method, apparatus, device, and medium based on AI and RPA.
  • AI artificial intelligence
  • RPA robotic process automation
  • RPA uses specific "robot software" to simulate human operations on computers and automatically execute process tasks according to rules.
  • AI is a technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
  • training data is obtained through a large number of manually annotated PDF files or pictures, and the document structure information and visual information are modeled.
  • the general document pre-training model LayoutLM allows the model to perform multi-modal alignment in the pre-training stage.
  • the above-mentioned document labeling method can neither select discontinuous text nor extract text from pictures, and it does not record the position information of the text in the document, so it cannot meet the needs of model training.
  • the present disclosure aims to solve one of the technical problems in the related art at least to a certain extent.
  • to this end, the present disclosure proposes a file annotation method, apparatus, device, and medium based on AI and RPA, in which the RPA system determines the area range of a text annotation in a target picture and the text annotation result within that area range.
  • this enables the extraction of text information from pictures and the selection of discontinuous characters in text, and it also yields the text information within the annotated area together with the position information of the text segments in that text information, which can meet the needs of model training.
  • the embodiment of the first aspect of the present disclosure proposes a file annotation method based on AI and RPA, including: the RPA system obtains a file annotation request, wherein the file annotation request is used to annotate the file to be annotated; the RPA system generates, in response to the file annotation request, a response result corresponding to the file annotation request; the RPA system draws a target picture corresponding to the file to be annotated according to the response result; the RPA system determines, in response to a mouse event, the area range of the text annotation in the target picture; and the RPA system determines a text annotation result within the area range according to the first text information obtained by performing optical character recognition (OCR) on the file to be annotated and the position information corresponding to each text segment of the first text information.
  • the embodiment of the second aspect of the present disclosure proposes a file annotation apparatus based on AI and RPA.
  • the file annotation apparatus is applied to the RPA system and includes: an acquisition module, used to obtain a file annotation request, wherein the file annotation request is used to annotate the file to be annotated; a generation module, used to generate, in response to the file annotation request, a response result corresponding to the file annotation request; a drawing module, used to draw a target picture corresponding to the file to be annotated according to the response result; a first determination module, used to determine, in response to a mouse event, the area range of the text annotation in the target picture; and a second determination module, used to determine the text annotation result within the area range according to the first text information obtained by performing optical character recognition (OCR) on the file to be annotated and the position information corresponding to each text segment of the first text information.
  • the embodiment of the third aspect of the present disclosure proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor.
  • when the processor executes the computer program, the method described in the embodiment of the first aspect above is implemented.
  • the embodiment of the fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method as described in the above-mentioned embodiment of the first aspect of the present disclosure is implemented.
  • the embodiment of the fifth aspect of the present disclosure provides a computer program product, including a computer program.
  • when the computer program is executed by a processor, the method as described in the above-mentioned embodiment of the first aspect of the present disclosure is implemented.
  • the RPA system obtains a file annotation request, wherein the file annotation request is used to annotate the file to be annotated; the RPA system generates, in response to the file annotation request, a response result corresponding to the file annotation request; the RPA system draws a target picture corresponding to the file to be annotated according to the response result; the RPA system determines, in response to a mouse event, the area range of the text annotation in the target picture; and the RPA system determines the text annotation result within the area range according to the first text information obtained by performing optical character recognition (OCR) on the file to be annotated and the position information corresponding to each text segment of the first text information.
  • thus, by determining the area range of the text annotation in the target picture and the text annotation result within that area range, the RPA system realizes the extraction of text information from pictures and the selection of discontinuous text, and at the same time obtains the text information within the annotated area and the position information of the text segments in that text information, which can meet the needs of model training.
  • FIG. 1 is a schematic flow diagram of an AI and RPA-based file labeling method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flow diagram of another AI and RPA-based file labeling method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of the position information of the area range relative to the target sub-picture corresponding to the sub-file to be marked, provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic flow diagram of another AI and RPA-based file labeling method provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic flow diagram of another AI and RPA-based file labeling method provided by an embodiment of the present disclosure
  • FIG. 6 is a schematic flow diagram of another AI and RPA-based file labeling method provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an AI and RPA-based file tagging device provided by an embodiment of the present disclosure.
  • FIG. 8 shows a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure.
  • FIG. 1 is a schematic flowchart of an AI and RPA-based document labeling method provided by an embodiment of the present disclosure.
  • the file tagging method based on AI and RPA provided in the embodiment of the present disclosure can be applied to the device for tagging files based on AI and RPA in the embodiment of the present disclosure, and the device can be configured in an electronic device.
  • the electronic device may be a personal computer, a mobile terminal, etc.
  • the mobile terminal is, for example, a mobile phone, a tablet computer, a personal digital assistant, and other hardware devices with various operating systems.
  • the AI and RPA-based document annotation method may include the following steps:
  • Step 101, the RPA system obtains a file annotation request; wherein, the file annotation request is used to annotate the file to be annotated.
  • the user can send a file labeling request to the RPA system through the interactive interface, so that the RPA system can label the file to be marked according to the file labeling request.
  • the file annotation request can be used to annotate the file to be annotated.
  • Step 102, the RPA system generates a response result corresponding to the file annotation request in response to the file annotation request.
  • the RPA system generates a response result corresponding to the file labeling request according to the obtained request, wherein the response result may include: the file to be marked corresponding to the file labeling request; the converted picture corresponding to the file to be marked; and the first text information corresponding to the file to be marked, together with the position information corresponding to each text segment in the first text information, both acquired through Optical Character Recognition (OCR for short).
  • OCR Optical Character Recognition
  • the position information corresponding to a text segment may include the position information corresponding to each word or character in the text.
  • the position information corresponding to each word or character may be its position relative to the page, for example, the coordinate information of the word or character relative to the four vertices of the page.
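The response result described above can be pictured as a small record type. The following is only a sketch: the class names, the field names, the top-left origin, and the reading of the position information as four box vertices per segment are all assumptions made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) on the page; a top-left origin is assumed


@dataclass(frozen=True)
class TextSegment:
    """One OCR text segment and its position on a page (hypothetical layout)."""
    text: str
    page: int  # 0-based index of the sub-file (page) the segment belongs to
    vertices: Tuple[Point, Point, Point, Point]  # four corners of the segment's box

    def bounds(self) -> Tuple[float, float, float, float]:
        """Return (left, top, right, bottom) of the box around the four vertices."""
        xs = [p[0] for p in self.vertices]
        ys = [p[1] for p in self.vertices]
        return min(xs), min(ys), max(xs), max(ys)


@dataclass
class ResponseResult:
    """Response result for a file annotation request (field names are illustrative)."""
    file_id: str                    # identification of the file to be marked
    converted_pictures: List[str]   # e.g. one picture reference per page
    segments: List[TextSegment]     # first text information with per-segment positions

    def first_text(self) -> str:
        """Concatenate the segment texts into the first text information."""
        return "".join(segment.text for segment in self.segments)
```

Keeping the text and its vertices in the same record means every later step (page lookup, region filtering, saving training data) can work from one list of segments.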
  • Step 103, the RPA system draws the target picture corresponding to the file to be marked according to the response result.
  • the file to be marked in the response result may include one or more sub-files to be marked, and the target picture corresponding to the file to be marked may be drawn in different ways according to the number of sub-files to be marked.
  • when the file to be marked includes multiple sub-files to be marked, the RPA system can draw the target sub-picture corresponding to each sub-file to be marked according to the text information corresponding to that sub-file and the position information corresponding to each text segment in the text information, and splice the target sub-pictures to obtain the target picture corresponding to the file to be marked.
  • when the file to be marked includes a single sub-file to be marked, the RPA system can draw the target picture corresponding to the file to be marked according to the first text information corresponding to the file to be marked and the position information corresponding to each text segment in the first text information.
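Splicing target sub-pictures into one target picture requires knowing where each sub-picture starts. The sketch below assumes the sub-pictures are stacked vertically with no gaps, which matches the height-based page lookup used later in the description but is not stated explicitly for the splicing step; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def stacked_offsets(page_heights: Sequence[float]) -> Tuple[List[float], float]:
    """Return the top offset of each sub-picture and the total picture height.

    Assumes sub-pictures are placed one below another, in order, without gaps.
    """
    offsets: List[float] = []
    top = 0.0
    for height in page_heights:
        offsets.append(top)  # this sub-picture starts where the previous one ended
        top += height
    return offsets, top
```

For three sub-pictures of heights 800, 600, and 700, the offsets are 0, 800, and 1400, and the spliced target picture is 2100 units high.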
  • Step 104, the RPA system determines the area range of the text annotation in the target picture in response to the mouse event.
  • the RPA system can determine the area range of the text annotation in the target picture according to the mouse event. For example, when the mouse event sequentially includes: a mouse click event, a mouse move event, and a mouse lift event, the area range of the text marked by the mouse event may be determined.
  • Step 105, the RPA system determines the text labeling result within the area range according to the first text information obtained by performing optical character recognition (OCR) on the file to be marked and the position information corresponding to each text segment of the first text information.
  • the sub-file to be marked to which the area range belongs can be determined according to the coordinate information of the area range; then, the second text information corresponding to that sub-file is obtained from the first text information, and the position information corresponding to each text segment of the second text information is obtained from the position information corresponding to each text segment of the first text information.
  • then, according to the position information of the area range relative to the sub-file to be marked, the text labeling result of the area range is determined from the second text information and the position information corresponding to each text segment of the second text information.
  • the document labeling method of the embodiment of the present disclosure can convert unstructured long text into structured data, and assist users to complete the intelligent extraction of key information of documents.
  • the RPA system obtains a file annotation request, wherein the file annotation request is used to annotate the file to be annotated; the RPA system generates, in response to the file annotation request, a response result corresponding to the file annotation request; the RPA system draws the target picture corresponding to the file to be annotated according to the response result; the RPA system determines, in response to a mouse event, the area range of the text annotation in the target picture; and the RPA system determines the text labeling result within the area range according to the first text information obtained by performing optical character recognition (OCR) on the file to be annotated and the position information corresponding to each text segment of the first text information.
  • thus, by determining the area range of the text annotation in the target picture and the text annotation result within that area range, the RPA system realizes the extraction of text information from pictures and the selection of discontinuous text, and at the same time obtains the text information within the annotated area and the position information of the text segments in that text information, which can meet the needs of model training.
  • FIG. 2 is a schematic flowchart of another AI and RPA-based file marking method provided by an embodiment of the present disclosure.
  • in the embodiment of the present disclosure, the sub-file to be marked to which the area range belongs can be determined; then, according to the position information of the area range relative to that sub-file and the position of that sub-file within the file to be marked, the text labeling result within the area range is determined from the first text information and the position information corresponding to each text segment of the first text information.
  • the embodiment shown in Figure 2 may include the following steps:
  • Step 201, the RPA system obtains a file annotation request; wherein, the file annotation request is used to annotate the file to be annotated.
  • Step 202, the RPA system generates a response result corresponding to the file annotation request in response to the file annotation request.
  • Step 203, the RPA system draws the target picture corresponding to the file to be marked according to the response result.
  • Step 204, the RPA system determines the area range of the text annotation in the target picture in response to the mouse event.
  • Step 205, the RPA system determines the sub-file to be marked to which the area range belongs according to the vertex coordinate information of the area range and the height information of the sub-file to be marked in the file to be marked.
  • the RPA system can preset the height information of each sub-file to be marked; then, according to the vertex coordinate information of the area range (for example, the upper-left vertex), the RPA system can determine the height of that vertex relative to the origin of the target picture, and, from this height and the height of the target sub-picture corresponding to each sub-file to be marked, determine the sub-file to be marked to which the area range belongs.
  • for example, if the height of the vertex of the area range relative to the origin of the target picture is greater than the height of the target sub-picture corresponding to one sub-file to be marked and smaller than the combined height of the target sub-pictures corresponding to two sub-files to be marked, it can be determined that the area range belongs to the second sub-file to be marked.
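The height comparison described above amounts to a running-sum lookup. A minimal sketch follows, assuming the sub-pictures are stacked vertically and that the vertex height is measured from the top edge of the whole target picture; the function name and the handling of a vertex that lies exactly on a boundary are assumptions.

```python
from typing import Sequence


def locate_sub_file(vertex_y: float, page_heights: Sequence[float]) -> int:
    """Return the 0-based index of the sub-file whose vertical span contains vertex_y.

    A vertex exactly on the boundary between two sub-pictures is assigned to the
    lower one (an assumption; the text does not say how boundaries are handled).
    """
    if vertex_y < 0:
        raise ValueError("vertex_y must be non-negative")
    bottom = 0.0
    for index, height in enumerate(page_heights):
        bottom += height  # bottom edge of the current sub-picture
        if vertex_y < bottom:
            return index
    raise ValueError("vertex_y lies below the last sub-picture")
```

With three sub-pictures of height 800, a vertex at height 1000 is greater than one sub-picture height and smaller than two, so the function returns index 1, the second sub-file.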
  • Step 206, the RPA system determines the position information of the area range relative to the target sub-picture corresponding to the sub-file to be marked to which the area range belongs.
  • the elements on the page, from outside to inside, are: the window object window.document and the drawing object canvas used to draw the PDF file; the position of the canvas relative to the document is given by left and top, and there is no gap between the page and the canvas.
  • width and height are the width and height of the area range, respectively, and can be calculated from the start coordinates and the end coordinates of the annotation.
  • if the start coordinates of the area range annotation are (x, y) and the end coordinates are (x1, y1), the width of the area range can be |x1 - x| and the height of the area range can be |y1 - y|, that is, the absolute differences of the abscissas and of the ordinates.
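One way to turn the start and end coordinates into a position relative to the target sub-picture is sketched below. It assumes that mouse coordinates are given in document coordinates, that the canvas sits at offsets left and top within the document, that sub-pictures are stacked vertically without gaps, and that scrolling is ignored; none of these details are confirmed by the text, and all names are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Rect(NamedTuple):
    x: float       # left edge, relative to the target sub-picture
    y: float       # top edge, relative to the target sub-picture
    width: float
    height: float


def region_relative_to_page(
    start: Tuple[float, float],
    end: Tuple[float, float],
    canvas_left: float,
    canvas_top: float,
    page_heights: Sequence[float],
    page_index: int,
) -> Rect:
    """Convert a drag from start to end into a rectangle relative to one sub-picture."""
    (x, y), (x1, y1) = start, end
    width = abs(x1 - x)    # absolute difference of the abscissas
    height = abs(y1 - y)   # absolute difference of the ordinates
    pages_above = sum(page_heights[:page_index])  # height of preceding sub-pictures
    left = min(x, x1) - canvas_left
    top = min(y, y1) - canvas_top - pages_above
    return Rect(left, top, width, height)
```

Using the smaller of the two abscissas and ordinates as the corner makes the result independent of the drag direction.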
  • Step 207, the RPA system determines, according to the position information, the text labeling result within the area range from the first text information and the position information corresponding to each text segment of the first text information.
  • the RPA system determines, according to the position of the sub-file to be marked to which the area range belongs within the file to be marked, the second text information corresponding to that sub-file in the first text information; according to the correspondence between the second text information and the first text information, the RPA system determines the position information corresponding to each text segment of the second text information from the position information corresponding to each text segment of the first text information; according to the position information of the area range relative to the target sub-picture corresponding to that sub-file, the RPA system determines the third text information, that is, the part of the second text information within the area range; according to the correspondence between the third text information and the second text information, the RPA system determines the position information corresponding to each text segment of the third text information from the position information corresponding to each text segment of the second text information; and the RPA system uses the third text information and the position information corresponding to each text segment of the third text information as the text labeling result within the area range.
  • that is, after the RPA system determines the sub-file to be marked to which the area range belongs, it can determine the second text information corresponding to that sub-file in the first text information according to the position of the sub-file within the file to be marked.
  • for example, if the RPA system determines that the sub-file to be marked to which the area range belongs is the second page of the file to be marked, the second text information corresponding to the second page can be determined from the first text information.
  • the RPA system can then determine the position information corresponding to each text segment of the second text information from the position information corresponding to each text segment of the first text information according to the correspondence between the second text information and the first text information.
  • next, the RPA system determines the third text information corresponding to the area range in the second text information according to the position information of the area range relative to the target sub-picture corresponding to the sub-file to be marked to which the area range belongs; according to the correspondence between the third text information and the second text information, it determines the position information corresponding to each text segment of the third text information from the position information corresponding to each text segment of the second text information; and it uses the third text information and the position information corresponding to each text segment of the third text information as the text labeling result within the area range.
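The narrowing from first to second to third text information can be sketched as two filters: pick the page, then keep the segments inside the area range. The sketch assumes segments are already grouped per page with boxes relative to that page, and it treats a segment as selected only when its box lies fully inside the area range; whether the disclosure uses containment or overlap is not stated, and all names are illustrative.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), relative to the page


class Segment(NamedTuple):
    text: str
    box: Box


def segments_in_region(segments: Sequence[Segment], region: Box) -> List[Segment]:
    """Keep the segments whose boxes lie entirely inside the region (assumed rule)."""
    left, top, right, bottom = region
    return [
        s for s in segments
        if s.box[0] >= left and s.box[1] >= top and s.box[2] <= right and s.box[3] <= bottom
    ]


def annotate(pages: Sequence[Sequence[Segment]], page_index: int, region: Box) -> Dict:
    """Return the text labeling result for a region on one page."""
    second = pages[page_index]                  # second text information (one sub-file)
    third = segments_in_region(second, region)  # third text information (inside the region)
    return {
        "text": "".join(s.text for s in third),
        "positions": [s.box for s in third],
    }
```

Because each kept segment carries its own box, the result contains both the selected text and the position of every text segment, as the step requires.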
  • the RPA system can label and save the position information of the area range relative to the target sub-picture of the sub-file to be marked to which the area range belongs, together with the text labeling result of the area range, as training data for a model, to meet the needs of model training. For example, it can be used as training data for the general document pre-training model.
  • the general document pre-training model can combine document structure information and visual information for multi-modal alignment.
  • This model can be applied to tasks such as form understanding, bill understanding, and document image classification.
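Saving the area range and its text labeling result as training data could look like the following sketch, which writes one JSON record per annotation. The record layout and field names are assumptions for illustration only; the disclosure does not specify a storage format, and this is not the input format of any particular pre-training model.

```python
import json
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]


def build_training_record(
    page_index: int,
    region: Box,
    texts: Sequence[str],
    boxes: Sequence[Box],
) -> Dict:
    """Bundle a labeled region with its text segments and their positions."""
    if len(texts) != len(boxes):
        raise ValueError("each text segment needs exactly one box")
    return {
        "page": page_index,
        "region": list(region),  # position of the area range relative to the sub-picture
        "segments": [{"text": t, "box": list(b)} for t, b in zip(texts, boxes)],
    }


def to_json_line(record: Dict) -> str:
    """Serialize a record as one line of JSON, keeping non-ASCII text readable."""
    return json.dumps(record, ensure_ascii=False, sort_keys=True)
```

Writing one record per line keeps the saved annotations easy to append to and to stream during training.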
  • steps 201-204 may be implemented in any one of the embodiments of the present disclosure, which is not limited in the embodiment of the present disclosure, and will not be repeated here.
  • the RPA system determines the sub-file to be marked to which the area range belongs according to the vertex coordinate information of the area range and the height information of the sub-file to be marked in the file to be marked; the RPA system determines the position information of the area range relative to the target sub-picture corresponding to that sub-file; and the RPA system determines, according to the position information, the text labeling result within the area range from the first text information and the position information corresponding to each text segment of the first text information. In this way, the text labeling result within the area range can be accurately determined, so that the text information within the marked area and the position information of the text segments in that text information can be obtained.
  • FIG. 4 is a schematic flowchart of another AI and RPA-based file labeling method provided by an embodiment of the present disclosure.
  • in the embodiment of the present disclosure, the area range of the text annotation in the target picture can be determined through a mouse click event, a mouse move event, and a mouse lift event.
  • the embodiment shown in Figure 4 may include the following steps:
  • Step 401, the RPA system acquires a file annotation request; wherein, the file annotation request is used to annotate the file to be annotated.
  • Step 402, the RPA system generates a response result corresponding to the file annotation request in response to the file annotation request.
  • Step 403, the RPA system draws the target picture corresponding to the file to be marked according to the response result.
  • Step 404, the RPA system monitors the mouse events of the target picture, wherein the mouse events include, in sequence: a mouse click event, a mouse move event, and a mouse lift event.
  • the RPA system can monitor the mouse events of the target picture through a monitoring function.
  • the mouse events include, in sequence, a mouse click event (mousedown event), a mouse move event (mousemove event), and a mouse lift event (mouseup event), from which the selection of the area range for text annotation in the target picture can be determined.
  • Step 405, the RPA system determines the first coordinate of the area range according to the mouse click event.
  • the RPA system can use the coordinates of the mouse click event as the starting coordinate of the area range, that is, the first coordinate, by monitoring the mouse click event.
  • Step 406, the RPA system determines the second coordinate of the area range according to the mouse move event and the mouse lift event.
  • the RPA system can determine the end coordinate of the area range, that is, the second coordinate, by monitoring the mouse move event and the mouse lift event.
  • Step 407, the RPA system determines the height value and width value of the area range according to the first coordinate and the second coordinate.
  • the abscissa in the first coordinate may be subtracted from the abscissa in the second coordinate, and the absolute value of the subtraction result may be used as the width value of the area range.
  • the ordinate in the first coordinate is subtracted from the ordinate in the second coordinate, and the absolute value of the subtraction result is used as the height value of the area range.
  • Step 408, the RPA system uses the closed area determined by the first coordinate, the second coordinate, and the height value and width value of the area range as the area range of the text annotation in the target picture.
  • the RPA system adds the width value of the area range to the abscissa of the first coordinate to obtain the third coordinate, and adds the height value of the area range to the ordinate of the first coordinate to obtain the fourth coordinate; the closed area enclosed by the first coordinate, the second coordinate, the third coordinate, and the fourth coordinate is used as the area range of the text annotation in the target picture.
  • it should be noted that, in order to determine the text labeling result within the area range more accurately, only one text annotation area exists in the target picture at any one time, and the area range can be implemented through a <div> tag.
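The mousedown, mousemove, and mouseup sequence described above can be modeled as a small state machine. The sketch below is written in Python rather than browser code, and the class and function names are hypothetical. It also departs from the text in one labeled way: instead of always adding the width and height to the first coordinate, it normalizes the drag so that the first returned corner is the upper-left one, which keeps the rectangle correct when the user drags up or to the left.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Corners = Tuple[Point, Point, Point, Point]


def corners(first: Point, second: Point) -> Corners:
    """Return (top-left, top-right, bottom-left, bottom-right) of the dragged area."""
    width = abs(second[0] - first[0])    # width value of the area range
    height = abs(second[1] - first[1])   # height value of the area range
    left = min(first[0], second[0])
    top = min(first[1], second[1])
    return (left, top), (left + width, top), (left, top + height), (left + width, top + height)


class RegionSelector:
    """Tracks one selection at a time from a stream of mouse events."""

    def __init__(self) -> None:
        self._start: Optional[Point] = None
        self.region: Optional[Corners] = None

    def handle(self, event_type: str, x: float, y: float) -> None:
        if event_type == "mousedown":
            self._start = (x, y)   # first coordinate
            self.region = None     # a new drag replaces any previous area
        elif event_type == "mousemove":
            pass                   # a real UI would redraw the preview rectangle here
        elif event_type == "mouseup":
            if self._start is not None:
                self.region = corners(self._start, (x, y))  # second coordinate
                self._start = None
        else:
            raise ValueError(f"unsupported event type: {event_type}")
```

Resetting `region` on every mousedown mirrors the note that only one text annotation area exists in the target picture at a time.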
  • Step 409, the RPA system determines the text labeling result within the area range according to the first text information obtained by performing optical character recognition (OCR) on the file to be marked and the position information corresponding to each text segment of the first text information.
  • steps 401-403, 409 may be implemented in any one of the embodiments of the present disclosure, which is not limited in the embodiment of the present disclosure, and will not be repeated here.
  • the RPA system monitors the mouse events of the target picture, wherein the mouse events include, in sequence: a mouse click event, a mouse move event, and a mouse lift event; the RPA system determines the first coordinate of the area range according to the mouse click event; the RPA system determines the second coordinate of the area range according to the mouse move event and the mouse lift event; the RPA system determines the height value and width value of the area range according to the first coordinate and the second coordinate; and the RPA system uses the closed area determined by the first coordinate, the second coordinate, and the height value and width value of the area range as the area range of the text annotation in the target picture. Therefore, in response to the mouse event, the RPA system can accurately determine the area range of the text annotation in the target picture, and realize the extraction of text information from pictures and the selection of discontinuous characters in text.
  • FIG. 5 is a schematic flowchart of another AI and RPA-based file tagging method provided by the embodiment of the present disclosure.
  • in the embodiment of the present disclosure, when the file to be marked is not a picture, it can first be converted into a converted picture, and character recognition is then performed on the converted picture based on optical character recognition to obtain the first text information corresponding to the file to be marked and the position information corresponding to each text segment of the first text information.
  • the embodiment shown in Figure 5 may include the following steps:
  • Step 501, the RPA system obtains a file annotation request; wherein, the file annotation request is used to annotate the file to be annotated.
  • Step 502, the RPA system obtains the file to be marked corresponding to the file marking request according to the file marking request.
  • the RPA system may acquire the file to be marked corresponding to the file marking request according to the identification of the file to be marked in the file marking request.
  • the file marking request may include the identification of the file to be marked.
  • Step 503, the RPA system performs picture conversion on the file to be marked, and obtains the converted picture corresponding to the file to be marked.
  • the file to be marked may be converted into a picture.
  • the file to be marked may be converted into a picture through a document-to-picture conversion technology, and the resulting picture is used as the converted picture.
  • for example, a PDF file can be converted into a picture through the pdf.js plug-in.
  • Step 504, the RPA system performs character recognition on the converted picture based on optical character recognition, so as to obtain the first text information corresponding to the file to be marked and the position information corresponding to each text segment of the first text information.
  • the RPA system performs character recognition on the converted picture based on optical character recognition, uses the recognized text information as the first text information corresponding to the file to be marked, and uses the position of each recognized word or character (for example, the x-axis and y-axis coordinates of its upper, lower, left, and right vertices in the page) as the position information corresponding to each text segment of the first text information.
  • the converted picture may be enlarged by a preset multiple, and the converted picture enlarged by the preset multiple may be sent to the optical character recognition interface for performing character recognition on the converted picture.
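If the optical character recognition interface reports positions in the coordinates of the enlarged picture, those positions would need to be scaled back before being compared with mouse coordinates on the original picture. The text does not say whether this is required or how it is done, so the sketch below is purely an assumption, with a hypothetical function name.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def rescale_boxes(boxes: Sequence[Box], scale: float) -> List[Box]:
    """Map boxes measured on a picture enlarged by `scale` back to the original size."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [
        (left / scale, top / scale, right / scale, bottom / scale)
        for left, top, right, bottom in boxes
    ]
```

For a preset multiple of 2, a box reported at (20, 40, 100, 80) on the enlarged picture corresponds to (10, 20, 50, 40) on the original.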
  • Step 505 the RPA system takes the file to be marked, the first text information corresponding to the file to be marked, and the location information corresponding to each text segment of the first text information as a response result corresponding to the file marking request.
  • the RPA system may use the position information corresponding to the file to be marked corresponding to the file marking request, the first text information corresponding to the file to be marked, and each text segment of the first text information as the corresponding position information of the file marking request Response results.
  • Step 506 the RPA system draws the target picture corresponding to the document to be marked according to the response result.
  • step 507 the RPA system determines the area range of the text annotation in the target picture in response to the mouse event.
  • step 508 the RPA system determines the text labeling result within the region according to the first text information obtained by performing optical character recognition (OCR) on the document to be marked and the position information corresponding to each text segment of the first text information.
  • steps 501, 506-508 may be implemented in any one of the embodiments of the present disclosure, which is not limited in the embodiment of the present disclosure, and will not be repeated here.
  • in this embodiment, the RPA system obtains the file to be marked corresponding to the file marking request; the RPA system performs image conversion on the file to be marked to obtain the converted picture corresponding to the file to be marked; the RPA system performs character recognition on the converted picture based on optical character recognition to obtain the first text information corresponding to the file to be marked and the position information corresponding to each text segment of the first text information; and the RPA system uses the file to be marked, the first text information, and the position information corresponding to each text segment of the first text information as the response result corresponding to the file marking request. Therefore, the RPA system can accurately obtain the response result corresponding to the file marking request.
  • Figure 6 is a schematic flowchart of another AI and RPA-based file marking method provided by the embodiment of the present disclosure.
  • the target sub-picture corresponding to the sub-file to be marked can be determined, and the target picture corresponding to the file to be marked can be determined according to a plurality of target sub-pictures.
  • the embodiment shown in Figure 6 may include the following steps:
  • step 601 the RPA system acquires a file annotation request; wherein, the file annotation request is used to annotate the file to be annotated.
  • step 602 the RPA system generates a response result corresponding to the file annotation request in response to the file annotation request.
  • Step 603 the RPA system acquires multiple sub-files to be marked of the file to be marked in the response result.
  • the file to be marked may include multiple sub-files to be marked or one sub-file to be marked.
  • the file to be marked is a pdf file, and the number of pages of the pdf file can be one or more pages. When the number of pages of the pdf file is one page, the file to be marked only includes one sub-file to be marked. When there are multiple pages, the RPA system can use each page of the pdf file as a sub-file to be marked.
  • a multi-page pdf file can include multiple sub-files to be annotated. For each sub-file to be marked in the multi-page pdf file, the RPA system can identify each sub-file to be marked, and the mark can be used to identify the position of each sub-file to be marked in the file to be marked.
  • Step 604 for each sub-file to be marked, the RPA system creates a drawing object corresponding to the sub-file to be marked.
  • a drawing object corresponding to the sub-file to be marked can be created, and a page object can be created according to the attribute information of the sub-file to be marked; for example, the drawing object is a canvas object, and the page object includes the height information and width information of the sub-file to be marked.
  • when the drawing object is created, the RPA system sets a default width value and height value for it.
  • the RPA system can adjust the size information of the drawing object according to the attribute information of the sub-file to be marked in the page object.
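Step 604 can be sketched as follows: a drawing object starts with default dimensions and is then resized from the page object. The class names and the default values are assumptions for illustration (300 by 150 happens to be the default size of an HTML canvas element, but the source does not state which defaults are used).

```python
from dataclasses import dataclass

# Assumed defaults; the disclosure only says that default values are set.
DEFAULT_WIDTH = 300.0
DEFAULT_HEIGHT = 150.0


@dataclass
class PageObject:
    """Attribute information of one sub-file (page) to be marked."""
    width: float
    height: float


@dataclass
class DrawingObject:
    """Stand-in for a canvas-like drawing surface."""
    width: float = DEFAULT_WIDTH
    height: float = DEFAULT_HEIGHT


def create_drawing_object(page: PageObject) -> DrawingObject:
    """Create a drawing object with defaults, then match it to the page size."""
    drawing = DrawingObject()      # starts at the default size
    drawing.width = page.width     # adjust to the sub-file's width
    drawing.height = page.height   # adjust to the sub-file's height
    return drawing


if __name__ == "__main__":
    print(create_drawing_object(PageObject(width=595, height=842)))
```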
  • Step 605 the RPA system determines the text information corresponding to the sub-file to be marked in the first text information according to the position information of the sub-file to be marked in the file to be marked.
  • the RPA system can assign each sub-file to be marked an identifier that indicates its position in the file to be marked, and, according to the position information of the sub-file to be marked in the file to be marked, determine the text information corresponding to the sub-file to be marked in the first text information.
  • for example, if the sub-file to be marked is the second page of the file to be marked, the text information corresponding to the second page can be obtained from the first text information.
  • Step 606 according to the corresponding relationship between the text information and the first text information, determine the position information corresponding to each text segment of the text information in the position information corresponding to each text segment of the first text information.
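  • according to the corresponding relationship between the text information and the first text information, the position information corresponding to each text segment of the text information can be determined from the position information corresponding to each text segment of the first text information.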
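Steps 605 and 606 amount to selecting, from the first text information, the segments that belong to one page together with their positions. A minimal sketch follows, assuming each segment is stored as a dictionary with hypothetical keys `text`, `page`, and `box`, and that pages are numbered from 0.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed layout


def text_and_positions(first_text_segments: List[Dict],
                       page_index: int) -> Tuple[List[str], List[Box]]:
    """Return the texts and boxes of the segments that lie on one page.

    The two returned lists are aligned: boxes[i] is the position of texts[i].
    """
    on_page = [s for s in first_text_segments if s["page"] == page_index]
    texts = [s["text"] for s in on_page]
    boxes = [s["box"] for s in on_page]
    return texts, boxes


if __name__ == "__main__":
    segments = [
        {"text": "Hello", "page": 0, "box": (0, 0, 50, 10)},
        {"text": "World", "page": 1, "box": (0, 0, 50, 10)},
        {"text": "again", "page": 1, "box": (0, 20, 50, 30)},
    ]
    # Page index 1 is the second page of the file.
    print(text_and_positions(segments, 1))
```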
  • Step 607 the RPA system draws the target sub-picture corresponding to the sub-file to be marked according to the size information of the drawing object, the text information corresponding to the sub-file to be marked and the position information corresponding to each text fragment of the text information.
  • the RPA system draws the target sub-picture corresponding to the sub-file to be marked according to the size information of the drawing object, combined with the text information corresponding to the sub-file to be marked and the position information corresponding to each text segment of the text information.
  • the size of the target sub-picture is the same as that of the sub-file to be marked, and the target sub-picture may include text information of the sub-file to be marked and position information corresponding to each text segment of the text information.
  • step 608 the RPA system stitches together the target sub-pictures corresponding to the multiple sub-files to be marked to obtain the target picture.
  • the RPA system splices the target sub-pictures corresponding to the multiple sub-files to be marked, and takes the splicing result of the multiple target sub-pictures as the target picture.
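The stitching in step 608 can be sketched as computing where each target sub-picture starts in the combined target picture. The sketch assumes the sub-pictures are stacked vertically in page order, which the source suggests (a later step uses page heights to find the page a region belongs to) but does not state outright. The function name is hypothetical.

```python
from typing import List, Tuple


def stitch_offsets(page_heights: List[float]) -> Tuple[List[float], float]:
    """Return the top y-offset of each sub-picture and the total height.

    Assumption: sub-pictures are stacked top to bottom with no gap.
    """
    offsets: List[float] = []
    total = 0.0
    for height in page_heights:
        offsets.append(total)  # this page starts where the previous one ended
        total += height
    return offsets, total


if __name__ == "__main__":
    print(stitch_offsets([100, 200, 150]))
```

Keeping the offsets makes it straightforward to convert between target-picture coordinates and per-page coordinates later on.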
  • step 609 the RPA system determines the area range of the text annotation in the target picture in response to the mouse event.
  • step 610 the RPA system determines the text labeling result within the area according to the first text information obtained by performing optical character recognition (OCR) on the document to be marked and the position information corresponding to each text segment of the first text information.
  • steps 601-602 and 609-610 may be implemented in any one of the embodiments of the present disclosure respectively, which is not limited in the embodiment of the present disclosure, and will not be repeated here.
  • in this embodiment, the RPA system obtains the multiple sub-files to be marked in the response result; for each sub-file to be marked, the RPA system creates a drawing object corresponding to the sub-file to be marked; according to the corresponding relationship between the text information and the first text information, the position information corresponding to each text segment of the text information is determined in the position information corresponding to each text segment of the first text information; the RPA system draws the target sub-picture corresponding to the sub-file to be marked according to the size information of the drawing object, the text information corresponding to the sub-file to be marked, and the position information corresponding to each text segment of the text information; and the RPA system stitches the target sub-pictures corresponding to the multiple sub-files to be marked to obtain the target picture.
  • the RPA system can accurately draw the target picture corresponding to the file to be marked according to the plurality of sub-files to be marked, the text information corresponding to the sub-files to be marked, and the position information corresponding to each text segment of the text information.
  • in the AI and RPA-based file tagging method of the embodiments of the present disclosure, the RPA system obtains the file tagging request, wherein the file tagging request is used to tag the file to be tagged; the RPA system generates a response result corresponding to the file tagging request in response to the file tagging request; the RPA system draws the target picture corresponding to the file to be marked according to the response result; the RPA system determines the area range of the text annotation in the target picture in response to a mouse event; and the RPA system determines the text marking result within the area range according to the first text information obtained by performing optical character recognition (OCR) on the file to be marked and the position information corresponding to each text segment of the first text information.
  • by determining the area range of the text annotation in the target picture and the text annotation result within that area range, the RPA system can extract text information from pictures and select discontinuous text; at the same time, it obtains the text information within the annotated area range and the position information of the text segments in that text information, which meets the needs of model training.
  • the file tagging device corresponds to the AI and RPA-based file tagging method provided in the embodiments of FIGS. above, so the implementation of the method also applies to the file tagging device and is not described in detail in the embodiments of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an AI and RPA-based file tagging device provided by an embodiment of the present disclosure.
  • the AI and RPA-based document tagging device 700 may include: an acquisition module 710 , a generation module 720 , a drawing module 730 , a first determination module 740 and a second determination module 750 .
  • the acquisition module 710 is used to obtain the file annotation request, wherein the file annotation request is used to annotate the file to be annotated; the generation module 720 is used to generate a response result corresponding to the file annotation request in response to the file annotation request; the drawing module 730 is used to draw the target picture corresponding to the file to be marked according to the response result; the first determining module 740 is used to determine the area range of the text annotation in the target picture in response to a mouse event; and the second determining module 750 is used to determine the text marking result within the area range according to the first text information obtained by performing optical character recognition (OCR) on the file to be marked and the position information corresponding to each text segment of the first text information.
  • the second determination module 750 is configured to: determine the sub-file to be marked to which the area range belongs according to the vertex coordinate information of the area range and the height information of the sub-files to be marked in the file to be marked; determine the position information of the target sub-picture corresponding to the sub-file to be marked to which the area range belongs; and, according to that position information, determine the text labeling result within the area range from the first text information and the position information corresponding to each text segment of the first text information.
  • the second determination module 750 is further configured to: determine, according to the position information of the sub-file to be marked to which the area range belongs, the second text information corresponding to that sub-file in the first text information; according to the corresponding relationship between the second text information and the first text information, determine the position information corresponding to each text segment of the second text information in the position information corresponding to each text segment of the first text information; according to the position information of the area range relative to the sub-file to be marked to which the area range belongs, determine the third text information in the second text information that lies within the area range; according to the correspondence between the third text information and the second text information, determine the position information corresponding to each text segment of the third text information in the position information corresponding to each text segment of the second text information; and use the third text information and the position information corresponding to each text segment of the third text information as the text marking result within the area range.
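The lookup performed by the second determination module can be sketched in three small functions: find the page from the region's top vertex and the page heights, convert the region to page-relative coordinates, and keep the segments on that page that fall inside it. Several points are assumptions of this sketch rather than statements of the disclosure: pages are stacked vertically, segments are `(text, page, box)` tuples with boxes in page coordinates, and a segment counts as selected only when its box lies fully inside the region.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (left, top, right, bottom)
Segment = Tuple[str, int, Box]            # (text, 0-based page index, box in page coordinates)


def locate_page(y: float, page_heights: List[float]) -> Tuple[int, float]:
    """Return (page index, y of that page's top) for a y in the stitched picture."""
    top = 0.0
    for index, height in enumerate(page_heights):
        if top <= y < top + height:
            return index, top
        top += height
    raise ValueError("y lies outside the stitched picture")


def to_page_coords(region: Box, page_top: float) -> Box:
    """Express a region of the stitched picture relative to its page."""
    left, top, right, bottom = region
    return (left, top - page_top, right, bottom - page_top)


def contains(region: Box, box: Box) -> bool:
    """True when `box` lies entirely inside `region` (assumed selection rule)."""
    return (region[0] <= box[0] and region[1] <= box[1]
            and box[2] <= region[2] and box[3] <= region[3])


def annotate(region: Box, page_heights: List[float],
             segments: List[Segment]) -> List[Segment]:
    """Return the segments (the 'third text information') inside the region."""
    page, page_top = locate_page(region[1], page_heights)
    local = to_page_coords(region, page_top)
    return [s for s in segments if s[1] == page and contains(local, s[2])]


if __name__ == "__main__":
    segments = [
        ("A", 0, (10, 10, 30, 20)),
        ("B", 1, (10, 10, 30, 20)),
        ("C", 1, (10, 60, 30, 70)),
    ]
    # Two pages of height 100; the region starts 5 pixels into the second page.
    print(annotate((0, 105, 50, 130), [100, 100], segments))
```

An overlap test could be used in place of full containment if partially covered segments should also be selected.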
  • the first determination module 740 is configured to: monitor mouse events on the target picture, wherein the mouse events include, in sequence, a mouse click event, a mouse move event, and a mouse lift event; determine the first coordinate of the area range according to the mouse click event; determine the second coordinate of the area range according to the mouse move event and the mouse lift event; determine the height value and width value of the area range according to the first coordinate and the second coordinate; and use the area enclosed by the first coordinate, the second coordinate, and the height and width values of the area range as the area range of the text annotation in the target picture.
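How a click, move, and lift sequence yields an area range can be sketched as below. Events are modeled as plain `(kind, x, y)` tuples; the kind names `"down"`, `"move"`, and `"up"` are hypothetical stand-ins for the click, move, and lift events, and returning the rectangle as `(left, top, width, height)` is a choice made for this sketch.

```python
from typing import Iterable, Optional, Tuple

Event = Tuple[str, float, float]                 # (kind, x, y)
Region = Tuple[float, float, float, float]       # (left, top, width, height)


def region_from_events(events: Iterable[Event]) -> Optional[Region]:
    """Derive the selected rectangle from one press, drag, and release sequence.

    Returns None when the sequence never completes with a release.
    """
    first = None      # first coordinate, set by the click (press) event
    current = None    # second coordinate, updated while the mouse moves
    done = False
    for kind, x, y in events:
        if kind == "down":
            first, current, done = (x, y), (x, y), False
        elif first is not None and kind == "move":
            current = (x, y)
        elif first is not None and kind == "up":
            current, done = (x, y), True
            break
    if not done:
        return None
    # The drag may go in any direction, so normalize to a top-left corner.
    left = min(first[0], current[0])
    top = min(first[1], current[1])
    width = abs(current[0] - first[0])
    height = abs(current[1] - first[1])
    return (left, top, width, height)


if __name__ == "__main__":
    drag = [("down", 50, 40), ("move", 30, 60), ("up", 10, 100)]
    print(region_from_events(drag))
```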
  • the generating module 720 is configured to: acquire the file to be marked corresponding to the file marking request according to the file marking request; perform image conversion on the file to be marked to obtain the converted picture corresponding to the file to be marked; perform character recognition on the converted picture based on optical character recognition (OCR) to obtain the first text information corresponding to the file to be marked and the position information corresponding to each text segment of the first text information; and use the file to be marked, the first text information corresponding to the file to be marked, and the position information corresponding to each text segment of the first text information as the response result corresponding to the file marking request.
  • the drawing module 730 is configured to: obtain the multiple sub-files to be marked of the file to be marked in the response result; for each sub-file to be marked, create a drawing object corresponding to the sub-file to be marked; according to the position information of the sub-file to be marked in the file to be marked, determine the text information corresponding to the sub-file to be marked in the first text information; according to the correspondence between the text information and the first text information, determine the position information corresponding to each text segment of the text information in the position information corresponding to each text segment of the first text information; according to the size information of the drawing object, the text information corresponding to the sub-file to be marked, and the position information corresponding to each text segment of the text information, draw the target sub-picture corresponding to the sub-file to be marked; and stitch the target sub-pictures corresponding to the multiple sub-files to be marked to obtain the target picture.
  • the AI and RPA-based document tagging apparatus 700 further includes: a processing module.
  • the processing module is used to annotate and save, as training data for the model, the position information of the area range relative to the sub-file to be marked to which it belongs, together with the third text information within the area range and the position information corresponding to each text segment of the third text information.
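One way the processing module's output could be stored is as one JSON record per annotation. The record layout, the key names, and the choice of JSON are all assumptions of this sketch; the disclosure does not specify a storage format.

```python
import json
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed layout


def make_training_record(page_index: int, region: Box,
                         segments: List[Tuple[str, Box]]) -> Dict:
    """Bundle one annotation: region position, selected text, and segment boxes."""
    return {
        "page": page_index,                      # page the region belongs to
        "region": list(region),                  # region relative to that page
        "text": "".join(text for text, _ in segments),
        "segments": [{"text": text, "box": list(box)} for text, box in segments],
    }


def to_json_line(record: Dict) -> str:
    """Serialize a record as a single line, keeping non-ASCII text readable."""
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    record = make_training_record(1, (0, 5, 50, 30), [("B", (10, 10, 30, 20))])
    print(to_json_line(record))
```

Writing one record per line makes it easy to append annotations as they are created and to stream them into a training pipeline later.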
  • in the AI and RPA-based file tagging device of the embodiments of the present disclosure, the RPA system obtains the file tagging request, wherein the file tagging request is used to tag the file to be tagged; the RPA system generates a response result corresponding to the file tagging request in response to the file tagging request; the RPA system draws the target picture corresponding to the file to be marked according to the response result; the RPA system determines the area range of the text annotation in the target picture in response to a mouse event; and the RPA system determines the text marking result within the area range according to the first text information obtained by performing optical character recognition (OCR) on the file to be marked and the position information corresponding to each text segment of the first text information.
  • by determining the area range of the text annotation in the target picture and the text annotation result within that area range, the RPA system can extract text information from pictures and select discontinuous text; at the same time, it obtains the text information within the annotated area range and the position information of the text segments in that text information, which meets the needs of model training.
  • an embodiment of the present disclosure also proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor; when the processor executes the computer program, the AI and RPA-based file annotation method described in any of the foregoing method embodiments is implemented.
  • the embodiments of the present disclosure also propose a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the AI and RPA-based file annotation method described in any of the foregoing method embodiments is implemented.
  • the embodiments of the present disclosure also propose a computer program product; when the instructions in the computer program product are executed by a processor, the AI and RPA-based file annotation method described in any of the foregoing method embodiments is implemented.
  • FIG. 8 is a block diagram of an electronic device according to an AI and RPA-based file tagging method provided by an embodiment of the present disclosure.
  • Electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the electronic device includes: one or more processors 801, a memory 802, and interfaces for connecting various components, including high-speed interfaces and low-speed interfaces.
  • the various components are interconnected using different buses and can be mounted on a common motherboard or otherwise as desired.
  • the processor may process instructions executed within the electronic device, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device such as a display device coupled to an interface.
  • multiple processors and/or multiple buses may be used together with multiple memories, as desired.
  • multiple electronic devices may be connected, with each device providing some of the necessary operations (eg, as a server array, a set of blade servers, or a multi-processor system).
  • a processor 801 is taken as an example.
  • the memory 802 is a non-transitory computer-readable storage medium provided in the present disclosure.
  • the memory stores instructions executable by at least one processor, so that the at least one processor executes the AI and RPA-based file tagging method provided by the present disclosure.
  • the non-transitory computer-readable storage medium of the present disclosure stores computer instructions, and the computer instructions are used to enable a computer to execute the AI and RPA-based file tagging method provided in the present disclosure.
  • the memory 802, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the AI and RPA-based file tagging method in the embodiments of the present disclosure (for example, the acquisition module 710, the generation module 720, the drawing module 730, the first determination module 740 and the second determination module 750 shown in FIG. 7).
  • the processor 801 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 802, that is, implements the AI and RPA-based file marking method in the above method embodiments.
  • the memory 802 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data and the like.
  • the memory 802 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • the memory 802 may optionally include memories that are located remotely from the processor 801, and these remote memories may be connected through a network to the electronic device for AI and RPA-based file annotation. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the electronic device of the AI and RPA-based document tagging method may further include: an input device 803 and an output device 804 .
  • the processor 801, the memory 802, the input device 803, and the output device 804 may be connected through a bus or in other ways. In FIG. 8, connection through a bus is taken as an example.
  • the input device 803 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for AI and RPA-based file annotation; examples of the input device include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick.
  • the output device 804 may include a display device, an auxiliary lighting device (eg, LED), a tactile feedback device (eg, a vibration motor), and the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
  • Various implementations of the systems and techniques described herein can be implemented in digital electronic circuitry, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memories, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals.
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • to provide interaction with a user, the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
  • the systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, eg, a communication network. Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN) and the Internet.
  • a computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • steps may be reordered, added or deleted using the various forms of flow shown above.
  • each step described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution proposed in the present disclosure can be achieved, no limitation is imposed herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the fields of AI and RPA, and provides an AI and RPA-based file annotation method and apparatus. The method comprises: an RPA system obtains a file annotation request; the RPA system generates a response result corresponding to the file annotation request in response to the file annotation request; the RPA system draws, according to the response result, a target picture corresponding to a file to be annotated; the RPA system determines a regional range of text annotation in the target picture in response to a mouse event; and the RPA system determines a text annotation result in the regional range according to first text information obtained by performing optical character recognition (OCR) on the file to be annotated and position information corresponding to text fragments of the first text information.

Description

File annotation method, apparatus, device, and medium based on AI and RPA
Cross-Reference to Related Applications
This application is based on, and claims priority to, the Chinese patent application with application number 202111021971.0 filed on September 1, 2021. The entire content of that Chinese patent application is hereby incorporated into this application by reference.
Technical Field
The present disclosure relates to the fields of artificial intelligence (AI) and robotic process automation (RPA), and in particular to a file annotation method, apparatus, device, and medium based on AI and RPA.
Background
RPA uses specific "robot software" to simulate human operations on a computer and automatically execute process tasks according to rules.
AI is a technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
With the spread of RPA, more and more companies use it to help employees complete repetitive work. In the model training process, however, a large amount of manual file annotation is still required to obtain training data. For example, training data obtained by manually annotating a large number of PDF files or pictures is used to model document structure information and visual information, as in the general document pre-training model LayoutLM, so that the model performs multi-modal alignment in the pre-training stage.
However, this way of annotating files cannot select discontinuous text or extract text from pictures, and does not include the position of the text in the document, so it cannot meet the needs of model training.
Summary
The present disclosure aims to solve, at least to a certain extent, one of the technical problems in the related art.
To this end, the present disclosure proposes a file annotation method, apparatus, device, and medium based on AI and RPA. By determining the area range of the text annotation in the target picture and the text annotation result within that area range, the RPA system can extract text information from pictures and select discontinuous text; at the same time, it obtains the text information within the annotated area range and the position information of the text fragments in that text information, which meets the needs of model training.
An embodiment of the first aspect of the present disclosure proposes a file annotation method based on AI and RPA, including: an RPA system obtains a file annotation request, wherein the file annotation request is used to annotate a file to be annotated; the RPA system generates, in response to the file annotation request, a response result corresponding to the file annotation request; the RPA system draws, according to the response result, a target picture corresponding to the file to be annotated; the RPA system determines, in response to a mouse event, the area range of the text annotation in the target picture; and the RPA system determines the text annotation result within the area range according to first text information obtained by performing optical character recognition (OCR) on the file to be annotated and the position information corresponding to each text fragment of the first text information.
An embodiment of a second aspect of the present disclosure proposes an AI- and RPA-based file annotation apparatus, applied to an RPA system and including: an obtaining module, configured to obtain a file annotation request, where the file annotation request is used to annotate a file to be annotated; a generation module, configured to generate, in response to the file annotation request, a response result corresponding to the file annotation request; a drawing module, configured to draw, according to the response result, a target picture corresponding to the file to be annotated; a first determination module, configured to determine, in response to a mouse event, a text annotation region in the target picture; and a second determination module, configured to determine a text annotation result within the region according to first text information obtained by performing optical character recognition (OCR) on the file to be annotated and position information corresponding to each text segment of the first text information.
An embodiment of a third aspect of the present disclosure proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the method described in the embodiment of the first aspect of the present disclosure is implemented.
An embodiment of a fourth aspect of the present disclosure proposes a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the method described in the embodiment of the first aspect of the present disclosure is implemented.
An embodiment of a fifth aspect of the present disclosure proposes a computer program product including a computer program. When the computer program is executed by a processor, the method described in the embodiment of the first aspect of the present disclosure is implemented.
The technical solutions provided by the embodiments of the present disclosure have the following beneficial effects:
The RPA system obtains a file annotation request, where the file annotation request is used to annotate a file to be annotated; the RPA system, in response to the file annotation request, generates a response result corresponding to the file annotation request; the RPA system draws, according to the response result, a target picture corresponding to the file to be annotated; the RPA system, in response to a mouse event, determines a text annotation region in the target picture; and the RPA system determines a text annotation result within the region according to first text information obtained by performing optical character recognition (OCR) on the file to be annotated and position information corresponding to each text segment of the first text information. By determining the text annotation region in the target picture and the text annotation result within that region, the RPA system thus extracts text information from the picture and supports the selection of non-contiguous characters in the text. It also obtains the text information within the annotated region and the position information of the text segments in that text information, which can meet the needs of model training.
Additional aspects and advantages of the present disclosure will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the present disclosure.
Description of Drawings
The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of an AI- and RPA-based file annotation method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of another AI- and RPA-based file annotation method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram, provided by an embodiment of the present disclosure, of the position information of a region relative to the target sub-picture corresponding to the sub-file to be annotated to which the region belongs;
FIG. 4 is a schematic flowchart of another AI- and RPA-based file annotation method provided by an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of another AI- and RPA-based file annotation method provided by an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of another AI- and RPA-based file annotation method provided by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an AI- and RPA-based file annotation apparatus provided by an embodiment of the present disclosure;
FIG. 8 is a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present disclosure and should not be construed as limiting it.
The AI- and RPA-based file annotation method, apparatus, device, and medium of the embodiments of the present disclosure are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of an AI- and RPA-based file annotation method provided by an embodiment of the present disclosure.
The AI- and RPA-based file annotation method provided by the embodiments of the present disclosure can be applied to the AI- and RPA-based file annotation apparatus of the embodiments of the present disclosure, and the apparatus can be configured in an electronic device. The electronic device may be a personal computer, a mobile terminal, or the like. The mobile terminal is, for example, a mobile phone, a tablet computer, a personal digital assistant, or another hardware device running any of various operating systems.
As shown in FIG. 1, the AI- and RPA-based file annotation method may include the following steps:
Step 101: the RPA system obtains a file annotation request, where the file annotation request is used to annotate a file to be annotated.
In the embodiments of the present disclosure, a user may send a file annotation request to the RPA system through an interactive interface, so that the RPA system annotates the file to be annotated according to that request.
Step 102: the RPA system, in response to the file annotation request, generates a response result corresponding to the file annotation request.
Further, the RPA system generates, according to the obtained file annotation request, a response result corresponding to that request. The response result may include: the file to be annotated that corresponds to the file annotation request; the converted picture corresponding to the file to be annotated; and the first text information corresponding to the file to be annotated, together with the position information corresponding to each text segment in the first text information, both obtained by optical character recognition (OCR). The position information corresponding to the text segments may include the position information of each word and each character in the text. The position information of a word or character may be its position relative to the page, for example the coordinates of the four vertices of the word or character relative to the page.
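The contents of the response result listed above can be pictured as a small data structure. The following is only a minimal sketch: the names `TextSegment`, `ResponseResult`, `first_text`, and the sample file names are hypothetical and do not come from the disclosure, and each text segment is assumed to carry an axis-aligned bounding box in page coordinates.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TextSegment:
    """One OCR text segment (a word or character) and its box on the page."""
    text: str
    left: float
    top: float
    right: float
    bottom: float

    def vertices(self) -> List[Tuple[float, float]]:
        """Return the four corner points, clockwise from the top-left."""
        return [
            (self.left, self.top),
            (self.right, self.top),
            (self.right, self.bottom),
            (self.left, self.bottom),
        ]


@dataclass
class ResponseResult:
    """Hypothetical container for the response to a file annotation request."""
    file_name: str                  # the file to be annotated
    converted_pictures: List[str]   # converted picture(s) of the file
    segments: List[TextSegment] = field(default_factory=list)

    @property
    def first_text(self) -> str:
        """The first text information: all segment texts in stored order."""
        return "".join(s.text for s in self.segments)


# Illustrative values only.
result = ResponseResult(
    file_name="notice.pdf",
    converted_pictures=["page1.png"],
    segments=[
        TextSegment("Tender", 10, 20, 50, 40),
        TextSegment("Notice", 50, 20, 90, 40),
    ],
)
```

Keeping the position of every segment next to its text is what later allows a drawn region to be mapped back to the text it covers.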
Step 103: the RPA system draws, according to the response result, a target picture corresponding to the file to be annotated.
As an example, the file to be annotated in the response result may include one or more sub-files to be annotated, and the target picture corresponding to the file may be drawn in different ways depending on the number of sub-files.
As one example, when the file to be annotated includes multiple sub-files to be annotated, the RPA system may draw the target sub-picture corresponding to each sub-file according to the text information corresponding to the sub-files and the position information corresponding to each text segment in that text information, and then stitch the target sub-pictures together to obtain the target picture corresponding to the file to be annotated.
As another example, when the file to be annotated includes a single sub-file to be annotated, the RPA system may draw the target picture corresponding to the file according to the first text information corresponding to the file and the position information corresponding to each text segment in the first text information.
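The stitching of target sub-pictures can be illustrated with simple layout arithmetic. This sketch assumes the sub-pictures are stacked vertically with no gap between them, which the disclosure does not state at this point; the function name `stitch_offsets` and the sample heights are hypothetical.

```python
from typing import List, Tuple


def stitch_offsets(page_heights: List[float]) -> Tuple[List[float], float]:
    """Return the top offset of each target sub-picture and the total height.

    Assumes the sub-pictures are stacked top to bottom without gaps.
    """
    offsets: List[float] = []
    total = 0.0
    for height in page_heights:
        offsets.append(total)  # this sub-picture starts where the previous one ended
        total += height
    return offsets, total


# Three sub-pictures with illustrative heights.
offsets, total_height = stitch_offsets([800.0, 800.0, 600.0])
```

With the offsets known, each target sub-picture can be drawn at its own vertical position inside the single target picture.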
Step 104: the RPA system, in response to a mouse event, determines the text annotation region in the target picture.
In the embodiments of the present disclosure, the RPA system may determine the text annotation region in the target picture according to mouse events. For example, when the mouse events consist, in order, of a mouse click event, a mouse move event, and a mouse-up event, the text annotation region defined by those events can be determined.
Step 105: the RPA system determines the text annotation result within the region according to the first text information obtained by performing optical character recognition (OCR) on the file to be annotated and the position information corresponding to each text segment of the first text information.
In the embodiments of the present disclosure, the sub-file to be annotated to which the region belongs may be determined according to the coordinate information of the region. The second text information corresponding to that sub-file within the first text information, and the text segments of the second text information among the text segments of the first text information, are then obtained. Next, according to the position information of the region relative to the sub-file, the text annotation result within the region is determined from the second text information and its text segments.
As an application scenario, for documents such as tender announcements and official red-header documents, the file annotation method of the embodiments of the present disclosure can convert unstructured long text into structured data and help users extract key document information intelligently.
In summary, the RPA system obtains a file annotation request, where the file annotation request is used to annotate a file to be annotated; the RPA system, in response to the file annotation request, generates a response result corresponding to the file annotation request; the RPA system draws, according to the response result, a target picture corresponding to the file to be annotated; the RPA system, in response to a mouse event, determines the text annotation region in the target picture; and the RPA system determines the text annotation result within the region according to the first text information obtained by performing optical character recognition (OCR) on the file to be annotated and the position information corresponding to each text segment of the first text information. By determining the text annotation region in the target picture and the text annotation result within that region, the RPA system thus extracts text information from the picture and supports the selection of non-contiguous characters in the text. It also obtains the text information within the annotated region and the position information of the text segments in that text information, which can meet the needs of model training.
To obtain the text information within the annotated region and the position information of the text segments in that text information, refer to FIG. 2, which is a schematic flowchart of another AI- and RPA-based file annotation method provided by an embodiment of the present disclosure. In this embodiment, the sub-file to be annotated to which the region belongs may be determined, and the text annotation result within the region is then determined, from the first text information and the position information corresponding to each text segment of the first text information, according to the position information of the region relative to that sub-file and the position information of that sub-file within the file to be annotated. The embodiment shown in FIG. 2 may include the following steps:
Step 201: the RPA system obtains a file annotation request, where the file annotation request is used to annotate a file to be annotated.
Step 202: the RPA system, in response to the file annotation request, generates a response result corresponding to the file annotation request.
Step 203: the RPA system draws, according to the response result, a target picture corresponding to the file to be annotated.
Step 204: the RPA system, in response to a mouse event, determines the text annotation region in the target picture.
Step 205: the RPA system determines the sub-file to be annotated to which the region belongs according to the vertex coordinate information of the region and the height information of the sub-files to be annotated in the file to be annotated.
In the embodiments of the present disclosure, the RPA system may preset the height information of each sub-file to be annotated. The RPA system may then use the vertex coordinate information of the region (for example, its upper-left vertex) to determine the height of that vertex relative to the origin of the target picture and, from this height together with the height of the target sub-picture corresponding to each sub-file to be annotated, determine the sub-file to which the region belongs. For example, if the height of the region's vertex relative to the origin of the target picture is greater than the height of one target sub-picture and less than the combined height of two target sub-pictures, the region can be determined to belong to the second sub-file to be annotated.
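The height comparison in step 205 amounts to an integer division. The sketch below assumes that every target sub-picture has the same preset height and that the vertex height is already measured from the origin of the stitched target picture; the function name `sub_file_number` and the sample values are hypothetical.

```python
import math


def sub_file_number(y_in_picture: float, page_height: float) -> int:
    """Return the 1-based number of the sub-file that contains the vertex.

    y_in_picture is the height of the region's upper-left vertex measured
    from the origin of the stitched target picture; page_height is the
    preset height of each target sub-picture (assumed equal for all).
    """
    if page_height <= 0:
        raise ValueError("page_height must be positive")
    if y_in_picture < 0:
        raise ValueError("vertex lies above the target picture")
    return math.floor(y_in_picture / page_height) + 1


# A vertex 1000 units down, with 800-unit sub-pictures, falls in the second one.
page_no = sub_file_number(1000.0, 800.0)
```

This matches the example in the text: a height greater than one sub-picture and less than two places the region in the second sub-file.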
Step 206: the RPA system determines the position information of the region relative to the target sub-picture corresponding to the sub-file to be annotated to which the region belongs.
For example, as shown in FIG. 3, take a PDF file as the file to be annotated. From outside to inside, the elements on the page are: the window object window.document; the drawing object canvas on which the PDF file is drawn, whose position relative to document is given by left and top; and the target sub-pictures (page1, page2, and so on) corresponding to the sub-files to be annotated of the PDF file, with no gap between page and canvas. Suppose the region belongs to the sub-file whose target sub-picture is page2, and the coordinates of the upper-left corner of the region are (x, y). The position information of the region relative to page2 is then relativeLeft = x - left, relativeRight = x - left + width, relativeTop = y - top - pageHeight * (PageNo - 1), and relativeBottom = relativeTop + height. Here width and height are the width and height of the region, which can be calculated from the end coordinates and the start coordinates of the annotation: if the start coordinates are (x, y) and the end coordinates are (x1, y1), the width of the region may be |x1 - x| and the height may be |y1 - y|.
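The formulas of the FIG. 3 example can be checked with a short worked computation. The sketch below follows the formulas as given in the text; the function name `relative_position`, the `RelativeBox` type, and all numeric values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class RelativeBox:
    """Position of the region relative to its target sub-picture."""
    left: float
    top: float
    right: float
    bottom: float


def relative_position(x: float, y: float, x1: float, y1: float,
                      left: float, top: float,
                      page_height: float, page_no: int) -> RelativeBox:
    """Apply the relativeLeft/Right/Top/Bottom formulas from the text.

    (x, y) and (x1, y1) are the start and end coordinates of the region,
    left and top are the offsets of the canvas relative to the document,
    and page_no is the 1-based number of the sub-picture.
    """
    width = abs(x1 - x)
    height = abs(y1 - y)
    relative_left = x - left
    relative_right = x - left + width
    relative_top = y - top - page_height * (page_no - 1)
    relative_bottom = relative_top + height
    return RelativeBox(relative_left, relative_top, relative_right, relative_bottom)


# Region dragged from (130, 900) to (330, 950) on page 2 of a canvas
# offset by left=30, top=20, with 800-unit pages.
box = relative_position(130, 900, 330, 950, left=30, top=20,
                        page_height=800, page_no=2)
```

Subtracting `page_height * (page_no - 1)` removes the height of the pages stacked above, so the result is expressed in the coordinate system of page2 alone.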
Step 207: the RPA system determines, according to the position information, the text annotation result within the region from the first text information and the position information corresponding to each text segment of the first text information.
In some embodiments, the RPA system determines, according to the position within the file to be annotated of the sub-file to which the region belongs, the second text information corresponding to that sub-file in the first text information. According to the correspondence between the second text information and the first text information, the RPA system determines the position information corresponding to each text segment of the second text information from the position information corresponding to each text segment of the first text information. According to the position information of the region relative to the target sub-picture corresponding to the sub-file to which the region belongs, the RPA system determines the third text information, that is, the part of the second text information that lies within the region. According to the correspondence between the third text information and the second text information, the RPA system determines the position information corresponding to each text segment of the third text information from the position information corresponding to each text segment of the second text information. The RPA system takes the third text information and the position information corresponding to each of its text segments as the text annotation result within the region.
That is, after determining the sub-file to be annotated to which the region belongs, the RPA system can determine, according to the position of that sub-file within the file to be annotated, the second text information corresponding to that sub-file in the first text information. For example, if the RPA system determines that the sub-file to which the region belongs is the second page of the file to be annotated, it can identify in the first text information the second text information corresponding to the second page. Next, according to the correspondence between the second text information and the first text information, the RPA system can determine the position information corresponding to each text segment of the second text information from the position information corresponding to each text segment of the first text information. Further, according to the position information of the region relative to the target sub-picture corresponding to the sub-file to which the region belongs, the RPA system determines the third text information corresponding to the region in the second text information. According to the correspondence between the third text information and the second text information, the RPA system determines the position information corresponding to each text segment of the third text information from the position information corresponding to each text segment of the second text information. The RPA system takes the third text information and the position information corresponding to each of its text segments as the text annotation result within the region.
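The narrowing from first to second to third text information can be sketched as two filters over the OCR segments. The disclosure does not say how a segment is judged to lie within the region; the sketch below assumes, purely for illustration, that a segment belongs to the region when the centre of its box falls inside it. The names `Segment` and `annotate_region` and the sample data are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """An OCR text segment with its page number and box on that page."""
    text: str
    page_no: int
    left: float
    top: float
    right: float
    bottom: float


def annotate_region(first: List[Segment], page_no: int,
                    rel_left: float, rel_top: float,
                    rel_right: float, rel_bottom: float
                    ) -> Tuple[str, List[Segment]]:
    """Return the third text information and its segments for one region."""
    # Second text information: the segments of the sub-file the region is on.
    second = [s for s in first if s.page_no == page_no]
    # Third text information: segments whose box centre is inside the region
    # (containment test is an assumption, not taken from the disclosure).
    third = [
        s for s in second
        if rel_left <= (s.left + s.right) / 2 <= rel_right
        and rel_top <= (s.top + s.bottom) / 2 <= rel_bottom
    ]
    return "".join(s.text for s in third), third


first_text_segments = [
    Segment("Tender", 1, 10, 10, 60, 30),
    Segment("Budget:", 2, 110, 90, 170, 110),
    Segment("500", 2, 180, 90, 220, 110),
    Segment("Contact", 2, 110, 300, 180, 320),
]
text, third_segments = annotate_region(first_text_segments, 2, 100, 80, 300, 130)
```

The returned pair, the text and the positions of its segments, corresponds to the text annotation result described above.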
In the embodiments of the present disclosure, the RPA system may label and save the position information of the region relative to the target sub-picture of the sub-file to which the region belongs, together with the text annotation result of the region, as training data for a model, so as to meet the needs of model training. For example, the data may be used as training data for a general document pre-training model.
As an application scenario, a general document pre-training model can combine document structure information and visual information for multimodal alignment, and can be applied to tasks such as form understanding, receipt understanding, and document image classification.
In the embodiments of the present disclosure, steps 201 to 204 may each be implemented in any of the manners described in the embodiments of the present disclosure; the embodiments of the present disclosure do not limit this, and the details are not repeated here.
In summary, the RPA system determines the sub-file to be annotated to which the region belongs according to the vertex coordinate information of the region and the height information of the sub-files to be annotated in the file to be annotated; the RPA system determines the position information of the region relative to the target sub-picture corresponding to that sub-file; and the RPA system determines, according to the position information, the text annotation result within the region from the first text information and the position information corresponding to each text segment of the first text information. The text annotation result within the region can thus be determined accurately, yielding the text information within the annotated region and the position information of the text segments in that text information.
To accurately determine the text annotation region in the target picture, and thereby extract text information from the picture and select non-contiguous characters in the text, refer to FIG. 4, which is a schematic flowchart of another AI- and RPA-based file annotation method provided by an embodiment of the present disclosure. In this embodiment, the text annotation region in the target picture may be determined through a mouse click event, a mouse move event, and a mouse-up event. The embodiment shown in FIG. 4 may include the following steps:
Step 401: the RPA system obtains a file annotation request, where the file annotation request is used to annotate a file to be annotated.
Step 402: the RPA system, in response to the file annotation request, generates a response result corresponding to the file annotation request.
Step 403: the RPA system draws, according to the response result, a target picture corresponding to the file to be annotated.
Step 404: the RPA system listens for mouse events on the target picture, where the mouse events include, in order, a mouse click event, a mouse move event, and a mouse-up event.
In the embodiments of the present disclosure, the RPA system may listen for mouse events on the target picture through a listener function. When the mouse events consist, in order, of a mouse click event (mousedown event), a mouse move event (mousemove event), and a mouse-up event (mouseup event), it can be determined that a region for text annotation is being selected in the target picture.
Step 405: the RPA system determines the first coordinate of the region according to the mouse click event.
By listening for the mouse click event, the RPA system can take the coordinates of the click event as the start coordinate of the region, that is, the first coordinate.
Step 406: the RPA system determines the second coordinate of the region according to the mouse move event and the mouse-up event.
Further, by listening for the mouse move event and the mouse-up event, the RPA system can determine the end coordinate of the region, that is, the second coordinate.
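The ordered handling of the three mouse events in steps 404 to 406 can be modelled as a small state machine. The sketch below is written in Python for illustration and is not tied to any browser API; the class name `RegionSelector`, its `handle` method, and the rule that a click without movement yields no region are assumptions.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


class RegionSelector:
    """Tracks mousedown, mousemove, mouseup and yields the two coordinates."""

    def __init__(self) -> None:
        self.first: Optional[Point] = None   # start coordinate (mousedown)
        self.second: Optional[Point] = None  # end coordinate (mousemove/mouseup)
        self.dragging = False
        self.moved = False

    def handle(self, event_type: str, x: float, y: float
               ) -> Optional[Tuple[Point, Point]]:
        """Feed one event; return (first, second) when a selection completes."""
        if event_type == "mousedown":
            self.first, self.second = (x, y), None
            self.dragging, self.moved = True, False
        elif event_type == "mousemove" and self.dragging:
            self.second = (x, y)
            self.moved = True
        elif event_type == "mouseup" and self.dragging:
            self.dragging = False
            if self.moved:  # a plain click without movement selects nothing
                self.second = (x, y)
                return self.first, self.second
        return None


selector = RegionSelector()
selector.handle("mousedown", 130, 900)
selector.handle("mousemove", 200, 920)
selection = selector.handle("mouseup", 330, 950)
```

Events that arrive out of order, such as a mouseup with no preceding mousedown, are ignored, which reflects the requirement that the three events occur in sequence.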
Step 407: the RPA system determines the height value and the width value of the region according to the first coordinate and the second coordinate.
For example, the abscissa of the second coordinate may be subtracted from the abscissa of the first coordinate, and the absolute value of the difference taken as the width value of the region. Likewise, the ordinate of the second coordinate may be subtracted from the ordinate of the first coordinate, and the absolute value of the difference taken as the height value of the region.
Step 408: the RPA system takes the area enclosed by the first coordinate, the second coordinate, and the height value and width value of the region as the text annotation region in the target picture.
In the embodiments of the present disclosure, the RPA system adds the width value of the region to the abscissa of the first coordinate to obtain a third coordinate, and adds the height value of the region to the ordinate of the first coordinate to obtain a fourth coordinate. The area enclosed by the first, second, third, and fourth coordinates is taken as the text annotation region in the target picture. It should be noted that, to determine the text annotation result within the region more accurately, only one text annotation region exists in the target picture at any one time; the region can be implemented with a <div> tag.
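Steps 407 and 408 can be sketched as a function that turns the two coordinates into a rectangle. The text adds the width and height to the first coordinate, which presumes a drag toward the lower right; the sketch below instead normalises with `min`, an assumption made here so that drags in any direction give the same rectangle. The function name `region_from_coordinates` and the dictionary layout are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def region_from_coordinates(first: Point, second: Point) -> Dict[str, object]:
    """Build the enclosed rectangle from the start and end coordinates."""
    x, y = first
    x1, y1 = second
    width = abs(x1 - x)    # width value of the region
    height = abs(y1 - y)   # height value of the region
    left, top = min(x, x1), min(y, y1)  # normalised upper-left corner
    corners = [
        (left, top),
        (left + width, top),
        (left + width, top + height),
        (left, top + height),
    ]
    return {"left": left, "top": top, "width": width,
            "height": height, "corners": corners}


region = region_from_coordinates((130, 900), (330, 950))
```

The `left`, `top`, `width`, and `height` values are the quantities that would typically be applied to the single overlay element that marks the region.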
Step 409: the RPA system determines the text annotation result within the region according to the first text information obtained by performing optical character recognition (OCR) on the file to be annotated and the position information corresponding to each text segment of the first text information.
In the embodiments of the present disclosure, steps 401 to 403 and step 409 may each be implemented in any of the manners described in the embodiments of the present disclosure; the embodiments of the present disclosure do not limit this, and the details are not repeated here.
综上,通过RPA系统监听目标图片的鼠标事件;其中,鼠标事件依次包括:鼠标点击事件、鼠标移动事件和鼠标抬起事件;RPA系统根据鼠标点击事件,确定区域范围的第一坐标;RPA系统根据鼠标移动事件和鼠标抬起事件,确定区域范围的第二坐标;RPA系统将第一坐标、第二坐标以及区域范围的高度值和宽度值的围合区域,作为目标图片中的文本标注的区域范围。由此,RPA系统响应于鼠标事件,可准确地确定目标图片中的文本标注的区域范围,实现图片中文本信息的提取以及文本中不连续文字的选择。In summary, the mouse event of the target image is monitored through the RPA system; among them, the mouse event includes: mouse click event, mouse move event and mouse lift event; the RPA system determines the first coordinate of the area according to the mouse click event; the RPA system According to the mouse movement event and the mouse lift event, determine the second coordinate of the area range; the RPA system uses the first coordinate, the second coordinate, and the enclosed area of the height value and width value of the area range as the text annotation in the target image geographic range. Therefore, in response to the mouse event, the RPA system can accurately determine the range of the text label in the target picture, and realize the extraction of text information in the picture and the selection of discontinuous characters in the text.
To obtain the response result corresponding to the file annotation request, FIG. 5 shows a schematic flowchart of another AI and RPA-based file annotation method provided by an embodiment of the present disclosure. In this embodiment, when the file to be annotated is not a picture, it may first be converted into a converted picture; character recognition is then performed on the converted picture by OCR to obtain the first text information corresponding to the file to be annotated and the position information corresponding to each text segment of the first text information. The embodiment shown in FIG. 5 may include the following steps:
Step 501: the RPA system obtains a file annotation request, where the file annotation request is used to annotate a file to be annotated.
Step 502: the RPA system obtains, according to the file annotation request, the file to be annotated that corresponds to the request.
In this embodiment of the present disclosure, the file annotation request may include an identifier of the file to be annotated, and the RPA system may obtain the corresponding file to be annotated according to that identifier.
Step 503: the RPA system converts the file to be annotated into a picture to obtain the converted picture corresponding to the file to be annotated.
In this embodiment of the present disclosure, when the file to be annotated is not a picture, it may be converted into a picture.
As an example, the file to be annotated may be converted into a picture by a document-to-picture conversion technique, and the resulting picture is used as the converted picture. For instance, a PDF file may be converted into a picture with the pdf.js plug-in.
Step 504: the RPA system performs character recognition on the converted picture based on OCR to obtain the first text information corresponding to the file to be annotated and the position information corresponding to each text segment of the first text information.
The RPA system then performs character recognition on the converted picture based on OCR, takes the recognized text information as the first text information corresponding to the file to be annotated, and takes the position information of each character or word in the recognized text (for example, the x-axis and y-axis coordinates of the four vertices, top, bottom, left, and right, of each character or word on the page) as the position information corresponding to each text segment of the first text information. It should be noted that, to avoid an unclear picture, the converted picture may be enlarged by a preset multiple, and the enlarged converted picture is sent to the OCR interface for character recognition.
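The text says the converted picture may be enlarged by a preset multiple before OCR, but it does not say how the returned positions relate to the original page. The sketch below assumes the OCR interface reports coordinates in the enlarged picture, so dividing by the same multiple recovers page coordinates. `TextSegment` and `to_page_coordinates` are hypothetical names, and the bounding box is stored as left/top/right/bottom rather than four explicit vertices for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextSegment:
    text: str
    left: float
    top: float
    right: float
    bottom: float


def to_page_coordinates(segments: List[TextSegment], scale: float) -> List[TextSegment]:
    """Map OCR boxes measured on the enlarged picture back to the original page."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Assumption: the picture was enlarged uniformly by `scale` in both axes.
    return [
        TextSegment(s.text, s.left / scale, s.top / scale, s.right / scale, s.bottom / scale)
        for s in segments
    ]
```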
Step 505: the RPA system takes the file to be annotated, the first text information corresponding to the file to be annotated, and the position information corresponding to each text segment of the first text information as the response result corresponding to the file annotation request.
In this embodiment of the present disclosure, the RPA system may take the file to be annotated that corresponds to the file annotation request, together with its first text information and the position information corresponding to each text segment of the first text information, as the response result corresponding to the file annotation request.
Step 506: the RPA system draws, according to the response result, the target picture corresponding to the file to be annotated.
Step 507: in response to a mouse event, the RPA system determines the region of the text annotation in the target picture.
Step 508: the RPA system determines the text annotation result within the region according to the first text information obtained by performing OCR on the file to be annotated and the position information corresponding to each text segment of the first text information.
In this embodiment of the present disclosure, steps 501 and 506-508 may each be implemented in any of the manners described in the embodiments of the present disclosure; this embodiment does not limit them, and they are not described again here.
In summary, the RPA system obtains, according to the file annotation request, the file to be annotated that corresponds to the request; converts the file to be annotated into a picture to obtain the corresponding converted picture; performs character recognition on the converted picture based on OCR to obtain the first text information corresponding to the file to be annotated and the position information corresponding to each text segment of the first text information; and takes the file to be annotated, its first text information, and the position information corresponding to each text segment of the first text information as the response result corresponding to the file annotation request. Thus, the RPA system can accurately obtain the response result corresponding to the file annotation request.
To draw the target picture corresponding to the file to be annotated accurately, FIG. 6 shows a schematic flowchart of another AI and RPA-based file annotation method provided by an embodiment of the present disclosure. In this embodiment, the target sub-picture corresponding to each sub-file to be annotated may be determined, and the target picture corresponding to the file to be annotated may be determined from the multiple target sub-pictures. The embodiment shown in FIG. 6 may include the following steps:
Step 601: the RPA system obtains a file annotation request, where the file annotation request is used to annotate a file to be annotated.
Step 602: in response to the file annotation request, the RPA system generates a response result corresponding to the file annotation request.
Step 603: the RPA system obtains the multiple sub-files to be annotated of the file to be annotated in the response result.
In this embodiment of the present disclosure, the file to be annotated may include one or more sub-files to be annotated. For example, the file to be annotated is a PDF file, which may have one or more pages. When the PDF file has one page, the file to be annotated includes only one sub-file to be annotated; when it has multiple pages, the RPA system may treat each page as one sub-file to be annotated, so that a multi-page PDF file includes multiple sub-files to be annotated. The RPA system may assign each sub-file to be annotated in a multi-page PDF file an identifier that indicates the position of that sub-file within the file to be annotated.
Step 604: for each sub-file to be annotated, the RPA system creates a drawing object corresponding to the sub-file to be annotated.
In this embodiment of the present disclosure, for each sub-file to be annotated, a drawing object corresponding to the sub-file may be created, and a page object may be created according to the attribute information of the sub-file. For example, the drawing object is a canvas object and the page object is a page object, where the page object includes the height information and width information of the sub-file to be annotated.
It can be understood that when a drawing object is created, the RPA system sets default width and height values for it. So that the drawn target picture corresponds in size to the file to be annotated (for example, has the same size), in this embodiment the RPA system may adjust the size information of the drawing object according to the attribute information of the sub-file to be annotated in the page object.
Step 605: the RPA system determines, according to the position information of the sub-file to be annotated within the file to be annotated, the text information corresponding to the sub-file in the first text information.
As an example, the RPA system may assign each sub-file to be annotated an identifier that indicates its position within the file to be annotated, and may then determine, according to that position information, the text information corresponding to the sub-file in the first text information. For instance, if the sub-file to be annotated is the second page of the file to be annotated, the text information corresponding to the second page can be obtained from the first text information.
Step 606: according to the correspondence between the text information and the first text information, the position information corresponding to each text segment of the text information is determined from the position information corresponding to each text segment of the first text information.
Further, after the text information corresponding to the sub-file to be annotated in the first text information has been determined, the position information corresponding to each text segment of that text information can be determined, according to the correspondence between that text information and the first text information, from the position information corresponding to each text segment of the first text information.
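Steps 605 and 606 amount to selecting, for one page, its text and the matching positions out of the document-wide OCR result. A minimal sketch follows, assuming (this is not stated in the text) that every OCR segment already carries the page number of the sub-file it came from. `PageSegment` and `segments_by_page` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PageSegment:
    page: int  # hypothetical field: 1-based page number of the sub-file
    text: str
    left: float
    top: float
    right: float
    bottom: float


def segments_by_page(first_text_segments: List[PageSegment]) -> Dict[int, List[PageSegment]]:
    """Group document-wide OCR segments by the page (sub-file) they belong to."""
    pages: Dict[int, List[PageSegment]] = defaultdict(list)
    for segment in first_text_segments:
        # Each segment keeps its own position, so text and positions stay paired.
        pages[segment.page].append(segment)
    return dict(pages)
```

Looking up key 2 in the returned dictionary then yields the text segments of the second page together with their positions.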
Step 607: the RPA system draws the target sub-picture corresponding to the sub-file to be annotated according to the size information of the drawing object, the text information corresponding to the sub-file, and the position information corresponding to each text segment of the text information.
The RPA system then draws the target sub-picture corresponding to the sub-file to be annotated by combining the size information of the drawing object with the text information corresponding to the sub-file and the position information corresponding to each text segment of the text information. It should be noted that the target sub-picture has the same size as the sub-file to be annotated, and may include the text information of the sub-file and the position information corresponding to each text segment of the text information.
Step 608: the RPA system stitches together the target sub-pictures corresponding to the multiple sub-files to be annotated to obtain the target picture.
Further, the RPA system stitches the target sub-pictures corresponding to the multiple sub-files to be annotated, and takes the stitched result as the target picture.
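The text does not specify how the target sub-pictures are laid out when stitched. One plausible layout, assumed here purely for illustration, stacks the pages vertically in page order; the sketch computes where each sub-picture would start and the total height of the stitched picture. `stack_offsets` is a hypothetical helper.

```python
from typing import List, Tuple


def stack_offsets(page_heights: List[float]) -> Tuple[List[float], float]:
    """Return the top offset of each page and the total height of the stitched picture."""
    offsets: List[float] = []
    top = 0.0
    for height in page_heights:
        # Each page starts where the previous one ended (vertical stacking assumed).
        offsets.append(top)
        top += height
    return offsets, top
```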
Step 609: in response to a mouse event, the RPA system determines the region of the text annotation in the target picture.
Step 610: the RPA system determines the text annotation result within the region according to the first text information obtained by performing OCR on the file to be annotated and the position information corresponding to each text segment of the first text information.
In this embodiment of the present disclosure, steps 601-602 and 609-610 may each be implemented in any of the manners described in the embodiments of the present disclosure; this embodiment does not limit them, and they are not described again here.
In summary, the RPA system obtains the multiple sub-files to be annotated of the file to be annotated in the response result; creates, for each sub-file to be annotated, a corresponding drawing object; determines, according to the correspondence between the text information and the first text information, the position information corresponding to each text segment of the text information from the position information corresponding to each text segment of the first text information; draws the target sub-picture corresponding to each sub-file according to the size information of the drawing object, the text information corresponding to the sub-file, and the position information corresponding to each text segment of the text information; and stitches the target sub-pictures corresponding to the multiple sub-files to obtain the target picture. Thus, based on the multiple sub-files to be annotated, their text information, and the position information corresponding to each text segment of that text information, the RPA system can accurately draw the target picture corresponding to the file to be annotated.
In the AI and RPA-based file annotation method of the embodiments of the present disclosure, the RPA system obtains a file annotation request, where the file annotation request is used to annotate a file to be annotated; generates, in response to the file annotation request, a response result corresponding to the request; draws, according to the response result, the target picture corresponding to the file to be annotated; determines, in response to a mouse event, the region of the text annotation in the target picture; and determines the text annotation result within the region according to the first text information obtained by performing OCR on the file to be annotated and the position information corresponding to each text segment of the first text information. By determining the text annotation region in the target picture and the text annotation result within that region, the RPA system thus enables the extraction of text information from a picture and the selection of non-contiguous text. It also obtains the text information within the annotated region and the position information of the text segments in that text information, which can meet the needs of model training.
Corresponding to the AI and RPA-based file annotation method provided by the embodiments of FIG. 1 to FIG. 6 above, the present disclosure further provides an AI and RPA-based file annotation apparatus. Because the apparatus provided by this embodiment corresponds to the method provided by the embodiments of FIG. 1 to FIG. 6, the implementations of the method also apply to the apparatus and are not described in detail in this embodiment.
FIG. 7 is a schematic structural diagram of an AI and RPA-based file annotation apparatus provided by an embodiment of the present disclosure.
As shown in FIG. 7, the AI and RPA-based file annotation apparatus 700 may include an obtaining module 710, a generating module 720, a drawing module 730, a first determining module 740, and a second determining module 750.
The obtaining module 710 is configured to obtain a file annotation request, where the file annotation request is used to annotate a file to be annotated. The generating module 720 is configured to generate, in response to the file annotation request, a response result corresponding to the request. The drawing module 730 is configured to draw, according to the response result, the target picture corresponding to the file to be annotated. The first determining module 740 is configured to determine, in response to a mouse event, the region of the text annotation in the target picture. The second determining module 750 is configured to determine the text annotation result within the region according to the first text information obtained by performing OCR on the file to be annotated and the position information corresponding to each text segment of the first text information.
As a possible implementation of this embodiment of the present disclosure, the second determining module 750 is configured to: determine the sub-file to be annotated to which the region belongs according to the vertex coordinate information of the region and the height information of the sub-files to be annotated in the file to be annotated; determine the position information of the region relative to the target sub-picture corresponding to the sub-file to which the region belongs; and determine, according to that position information, the text annotation result within the region from the first text information and the position information corresponding to each text segment of the first text information.
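Locating the sub-file from the region's vertex coordinates and the sub-file heights can be sketched as a walk over cumulative page heights. The sketch assumes that the target sub-pictures are stacked vertically in page order and that the region's top-left vertex decides which page it belongs to; neither detail is spelled out in the text. `locate_region` is a hypothetical name.

```python
from typing import List, Tuple


def locate_region(region_top: float, page_heights: List[float]) -> Tuple[int, float]:
    """Return (0-based page index, top of the region relative to that page)."""
    if region_top < 0:
        raise ValueError("region lies above the first page")
    page_top = 0
    for index, height in enumerate(page_heights):
        # The region belongs to the first page whose vertical span contains its top edge.
        if region_top < page_top + height:
            return index, region_top - page_top
        page_top += height
    raise ValueError("region lies below the last page")
```

For instance, with page heights 800, 600, and 700, a region whose top edge is at y = 950 falls on the second page, 150 units below that page's top.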
As a possible implementation of this embodiment of the present disclosure, the second determining module 750 is further configured to: determine, according to the position information within the file to be annotated of the sub-file to which the region belongs, the second text information corresponding to that sub-file in the first text information; determine, according to the correspondence between the second text information and the first text information, the position information corresponding to each text segment of the second text information from the position information corresponding to each text segment of the first text information; determine, according to the position information of the region relative to the sub-file to which it belongs, the third text information within the region from the second text information; determine, according to the correspondence between the third text information and the second text information, the position information corresponding to each text segment of the third text information from the position information corresponding to each text segment of the second text information; and take the third text information and the position information corresponding to each text segment of the third text information as the text annotation result within the region.
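The last stage, narrowing a page's second text information down to the third text information inside the region, can be illustrated with a simple containment test. The criterion used here (a segment is selected only when its box lies fully inside the region) and the concatenation of selected texts in OCR order are assumptions; the text does not state the exact rule. `Box`, `Segment`, `inside`, and `annotate_region` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float


@dataclass(frozen=True)
class Segment:
    text: str
    box: Box


def inside(inner: Box, outer: Box) -> bool:
    """True when `inner` lies entirely within `outer` (edges may touch)."""
    return (outer.left <= inner.left and inner.right <= outer.right
            and outer.top <= inner.top and inner.bottom <= outer.bottom)


def annotate_region(page_segments: List[Segment], region: Box) -> Tuple[str, List[Segment]]:
    """Return the text inside the region and the segments (with positions) that form it."""
    selected = [s for s in page_segments if inside(s.box, region)]
    # Assumption: segments are already in reading order, so plain concatenation suffices.
    return "".join(s.text for s in selected), selected
```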
As a possible implementation of this embodiment of the present disclosure, the first determining module 740 is configured to: listen for mouse events on the target picture, where the mouse events comprise, in order, a mouse click event, a mouse move event, and a mouse-up event; determine the first coordinate of the region according to the mouse click event; determine the second coordinate of the region according to the mouse move event and the mouse-up event; determine the height and width values of the region according to the first coordinate and the second coordinate; and take the area enclosed by the first coordinate, the second coordinate, and the height and width values of the region as the region of the text annotation in the target picture.
As a possible implementation of this embodiment of the present disclosure, the generating module 720 is configured to: obtain, according to the file annotation request, the file to be annotated that corresponds to the request; convert the file to be annotated into a picture to obtain the corresponding converted picture; perform character recognition on the converted picture based on OCR to obtain the first text information corresponding to the file to be annotated and the position information corresponding to each text segment of the first text information; and take the file to be annotated, its first text information, and the position information corresponding to each text segment of the first text information as the response result corresponding to the file annotation request.
As a possible implementation of this embodiment of the present disclosure, the drawing module 730 is configured to: obtain the multiple sub-files to be annotated of the file to be annotated in the response result; create, for each sub-file to be annotated in the file to be annotated, a drawing object corresponding to the sub-file; determine, according to the position information of the sub-file within the file to be annotated, the text information corresponding to the sub-file in the first text information; determine, according to the correspondence between the text information and the first text information, the position information corresponding to each text segment of the text information from the position information corresponding to each text segment of the first text information; draw the target sub-picture corresponding to the sub-file according to the size information of the drawing object, the text information corresponding to the sub-file, and the position information corresponding to each text segment of the text information; and stitch the target sub-pictures corresponding to the multiple sub-files to obtain the target picture.
As a possible implementation of this embodiment of the present disclosure, the AI and RPA-based file annotation apparatus 700 further includes a processing module. The processing module is configured to label and save the position information of the region relative to the sub-file to which it belongs, together with the third text information within the region and the position information corresponding to each text segment of the third text information, so that they serve as training data for a model.
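As an illustration of what the processing module might persist, the sketch below serializes one annotation as a JSON string. The record layout, the field names, and the function `build_training_record` are all hypothetical; the text only says that the region position, the third text information, and the segment positions are labelled and saved as training data.

```python
import json
from typing import Dict, List


def build_training_record(label: str, page_index: int,
                          region: Dict[str, float],
                          segments: List[Dict[str, object]]) -> str:
    """Serialize one labelled annotation (hypothetical layout) as a JSON string."""
    record = {
        "label": label,            # label chosen by the annotator
        "page": page_index,        # sub-file the region belongs to
        "region": region,          # region position relative to that sub-file
        "text": "".join(str(s["text"]) for s in segments),
        "segments": segments,      # each segment keeps its own position
    }
    # ensure_ascii=False keeps non-ASCII text (for example Chinese) readable in the output.
    return json.dumps(record, ensure_ascii=False)
```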
In the AI and RPA-based file annotation apparatus of the embodiments of the present disclosure, the RPA system obtains a file annotation request, where the file annotation request is used to annotate a file to be annotated; generates, in response to the file annotation request, a response result corresponding to the request; draws, according to the response result, the target picture corresponding to the file to be annotated; determines, in response to a mouse event, the region of the text annotation in the target picture; and determines the text annotation result within the region according to the first text information obtained by performing OCR on the file to be annotated and the position information corresponding to each text segment of the first text information. By determining the text annotation region in the target picture and the text annotation result within that region, the RPA system thus enables the extraction of text information from a picture and the selection of non-contiguous text. It also obtains the text information within the annotated region and the position information of the text segments in that text information, which can meet the needs of model training.
To implement the above embodiments, an embodiment of the present disclosure further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the AI and RPA-based file annotation method described in any of the foregoing method embodiments is implemented.
To implement the above embodiments, an embodiment of the present disclosure further provides a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the AI and RPA-based file annotation method described in any of the foregoing method embodiments is implemented.
To implement the above embodiments, an embodiment of the present disclosure further provides a computer program product. When instructions in the computer program product are executed by a processor, the AI and RPA-based file annotation method described in any of the foregoing method embodiments is implemented.
如图8所示,图8是根据本公开实施例所提供的基于AI和RPA的文件标注方法的电子设备的框图。电子设备旨在表示各种形式的数字计算机,诸如,膝上型计算机、台式计算机、工作台、个人数字助理、服务器、刀片式服务器、大型计算机、和其它适合的计算机。电子设备还可以表示各种形式的移动装置,诸如,个人数字处理、蜂窝电话、智能电话、可穿戴设备和其它类似的计算装置。本文所示的部件、它们的连接和关系、以及它们的功能仅仅作为示例,并且不意在限制本文中描述的和/或者要求的本公开的实现。As shown in FIG. 8 , FIG. 8 is a block diagram of an electronic device according to an AI and RPA-based file tagging method provided by an embodiment of the present disclosure. Electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 8, the electronic device includes one or more processors 801, a memory 802, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories, as required. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system). FIG. 8 takes one processor 801 as an example.
The memory 802 is the non-transitory computer-readable storage medium provided by the present disclosure. The memory stores instructions executable by at least one processor, so that the at least one processor performs the AI- and RPA-based file annotation method provided by the present disclosure. The non-transitory computer-readable storage medium of the present disclosure stores computer instructions that cause a computer to perform the AI- and RPA-based file annotation method provided by the present disclosure.
As a non-transitory computer-readable storage medium, the memory 802 can store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the AI- and RPA-based file annotation method in the embodiments of the present disclosure (for example, the acquisition module 710, the generation module 720, the drawing module 730, the first determination module 740, and the second determination module 750 shown in FIG. 7). By running the non-transitory software programs, instructions, and modules stored in the memory 802, the processor 801 executes the various functional applications and data processing of the server, that is, implements the AI- and RPA-based file annotation method of the above method embodiments.
The memory 802 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device for generating the semantic representation model, and the like. In addition, the memory 802 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 802 optionally includes memories located remotely from the processor 801, and these remote memories may be connected over a network to the electronic device for AI- and RPA-based file annotation. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the AI- and RPA-based file annotation method may further include an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or in other ways; FIG. 8 takes connection by a bus as an example.
The input device 803 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for AI- and RPA-based file annotation. Examples of input devices include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 804 may include a display device, an auxiliary lighting device (for example, an LED), a haptic feedback device (for example, a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be special-purpose or general-purpose, and may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
These computer programs (also called programs, software, software applications, or code) include machine instructions for a programmable processor and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memories, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described herein can be implemented on a computer that has a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic, speech, or tactile input).
The systems and techniques described herein can be implemented in a computing system that includes a back-end component (for example, a data server), or a middleware component (for example, an application server), or a front-end component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.
In addition, the acquisition, storage, and use of information involved in the technical solutions of the present disclosure comply with relevant laws and regulations and do not violate public order and good morals.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solutions proposed in the present disclosure can be achieved; no limitation is imposed herein.
The specific implementations described above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (17)

  1. A file annotation method based on artificial intelligence (AI) and robotic process automation (RPA), characterized by comprising:
    obtaining, by an RPA system, a file annotation request, wherein the file annotation request is used to annotate a file to be annotated;
    generating, by the RPA system in response to the file annotation request, a response result corresponding to the file annotation request;
    drawing, by the RPA system according to the response result, a target picture corresponding to the file to be annotated;
    determining, by the RPA system in response to mouse events, a region of text annotation in the target picture; and
    determining, by the RPA system, a text annotation result within the region according to first text information obtained by performing optical character recognition (OCR) on the file to be annotated and position information corresponding to each text segment of the first text information.
  2. The method according to claim 1, characterized in that determining, by the RPA system, the text annotation result within the region according to the first text information obtained by performing optical character recognition (OCR) on the file to be annotated and the position information corresponding to each text segment of the first text information comprises:
    determining, by the RPA system, the sub-file to be annotated to which the region belongs, according to vertex coordinate information of the region and height information of the sub-files to be annotated in the file to be annotated;
    determining, by the RPA system, position information of the region relative to the target sub-picture corresponding to the sub-file to be annotated to which the region belongs; and
    determining, by the RPA system, the text annotation result within the region from the first text information and the position information corresponding to each text segment of the first text information, according to the position information of the region relative to the target sub-picture corresponding to the sub-file to be annotated to which the region belongs.
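As a minimal sketch of the first two steps of claim 2, the following Python function maps the top edge of a region in the stitched target picture to a sub-file index and a y-offset within that sub-file's picture. It assumes the sub-pictures are stacked vertically with no gaps; the function name `locate_region` and the list `page_heights` are hypothetical and do not appear in the disclosure.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_region(region_top_y, page_heights):
    """Return (sub_file_index, y_within_sub_picture) for a region's top edge.

    Assumes sub-pictures are stacked top to bottom without gaps.
    """
    if not page_heights:
        raise ValueError("page_heights must not be empty")
    # Bottom edge of each sub-picture in stitched-picture coordinates.
    bottoms = list(accumulate(page_heights))
    if region_top_y < 0 or region_top_y >= bottoms[-1]:
        raise ValueError("region lies outside the stitched picture")
    # First sub-picture whose bottom edge lies strictly below the region top.
    index = bisect_right(bottoms, region_top_y)
    page_top = bottoms[index - 1] if index > 0 else 0
    return index, region_top_y - page_top
```

With sub-file heights of 100, 200, and 300 pixels, a region whose top edge is at y = 250 falls in the second sub-file (index 1), 150 pixels below that sub-picture's top.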
  3. The method according to claim 1 or 2, characterized in that determining, by the RPA system, the text annotation result within the region from the first text information and the position information corresponding to each text segment of the first text information, according to the position information of the region relative to the target sub-picture corresponding to the sub-file to be annotated to which the region belongs, comprises:
    determining, by the RPA system, second text information that corresponds, within the first text information, to the sub-file to be annotated to which the region belongs, according to position information of that sub-file within the file to be annotated;
    determining, by the RPA system, position information corresponding to each text segment of the second text information from the position information corresponding to each text segment of the first text information, according to the correspondence between the second text information and the first text information;
    determining, by the RPA system, third text information that lies within the region in the second text information, according to position information of the region relative to the sub-file to be annotated to which the region belongs;
    determining, by the RPA system, position information corresponding to each text segment of the third text information from the position information corresponding to each text segment of the second text information, according to the correspondence between the third text information and the second text information; and
    taking, by the RPA system, the third text information and the position information corresponding to each text segment of the third text information as the text annotation result within the region.
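The narrowing from the text of one sub-file to the text inside the region can be sketched as a simple geometric filter over OCR segments. The disclosure does not specify the containment test, so this sketch assumes a segment belongs to the region when the center of its bounding box lies inside it; the `Segment` type and `segments_in_region` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One OCR text segment with its box, in sub-picture coordinates."""
    text: str
    x: float
    y: float
    width: float
    height: float


def segments_in_region(segments, rx, ry, rw, rh):
    """Keep the segments whose box center lies inside the region.

    Center-point containment is an assumption of this sketch, not a rule
    taken from the disclosure.
    """
    selected = []
    for seg in segments:
        cx = seg.x + seg.width / 2
        cy = seg.y + seg.height / 2
        if rx <= cx <= rx + rw and ry <= cy <= ry + rh:
            selected.append(seg)
    return selected
```

The returned segments carry both their text and their positions, which corresponds to the pair "third text information plus per-segment position information" that the claim treats as the annotation result.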
  4. The method according to any one of claims 1 to 3, characterized in that determining, by the RPA system in response to mouse events, the region of text annotation in the target picture comprises:
    listening, by the RPA system, for mouse events on the target picture, wherein the mouse events include, in order, a mouse click event, a mouse move event, and a mouse release event;
    determining, by the RPA system, a first coordinate of the region according to the mouse click event;
    determining, by the RPA system, a second coordinate of the region according to the mouse move event and the mouse release event;
    determining, by the RPA system, a height value and a width value of the region according to the first coordinate and the second coordinate; and
    taking, by the RPA system, the area enclosed by the first coordinate, the second coordinate, and the height value and width value of the region as the region of text annotation in the target picture.
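The geometry in claim 4 reduces to building a rectangle from two points: where the mouse button went down and where it was released. The sketch below normalizes the rectangle so that it is valid whichever direction the user drags; that normalization and the name `region_from_drag` are assumptions added for illustration.

```python
def region_from_drag(down, up):
    """Return (left, top, width, height) from press and release points.

    `down` and `up` are (x, y) tuples in target-picture coordinates.
    """
    x1, y1 = down
    x2, y2 = up
    # Normalize so the result does not depend on drag direction.
    left, top = min(x1, x2), min(y1, y2)
    width, height = abs(x2 - x1), abs(y2 - y1)
    return left, top, width, height
```

For example, pressing at (120, 80) and releasing at (40, 200) yields a region with its top-left corner at (40, 80), a width of 80, and a height of 120.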
  5. The method according to any one of claims 1 to 4, characterized in that generating, by the RPA system in response to the file annotation request, the response result corresponding to the file annotation request comprises:
    obtaining, by the RPA system according to the file annotation request, the file to be annotated corresponding to the file annotation request;
    converting, by the RPA system, the file to be annotated into pictures to obtain converted pictures corresponding to the file to be annotated;
    performing, by the RPA system, character recognition on the converted pictures based on optical character recognition (OCR) to obtain the first text information corresponding to the file to be annotated and the position information corresponding to each text segment of the first text information; and
    taking, by the RPA system, the file to be annotated, the first text information corresponding to the file to be annotated, and the position information corresponding to each text segment of the first text information as the response result corresponding to the file annotation request.
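One possible shape for the response result of claim 5 is a dictionary that bundles the file reference, the full recognized text, and the per-segment positions. The field names, the character-offset bookkeeping, and the `build_response` helper below are illustrative assumptions; the disclosure does not define a concrete format, and the OCR step itself is not shown.

```python
def build_response(file_id, ocr_items):
    """Bundle a file reference with OCR text and per-segment positions.

    `ocr_items` is a list of (text, (x, y, width, height)) pairs as they
    might come from an OCR engine (the engine is outside this sketch).
    """
    segments = []
    offset = 0
    for text, box in ocr_items:
        # `start` records where the segment begins in the full text, so
        # text and positions can be matched up later.
        segments.append({"text": text, "start": offset, "box": list(box)})
        offset += len(text)
    full_text = "".join(text for text, _ in ocr_items)
    return {"file": file_id, "text": full_text, "segments": segments}
```

Keeping a character offset per segment is one way to support the later "correspondence" lookups between a subset of the text and the position list.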
  6. The method according to any one of claims 1 to 5, characterized in that drawing, by the RPA system according to the response result, the target picture corresponding to the file to be annotated comprises:
    obtaining, by the RPA system, a plurality of sub-files to be annotated of the file to be annotated in the response result;
    creating, by the RPA system for each sub-file to be annotated, a drawing object corresponding to the sub-file to be annotated;
    determining, by the RPA system, text information corresponding to the sub-file to be annotated within the first text information, according to position information of the sub-file to be annotated within the file to be annotated;
    determining position information corresponding to each text segment of the text information from the position information corresponding to each text segment of the first text information, according to the correspondence between the text information and the first text information;
    drawing, by the RPA system, the target sub-picture corresponding to the sub-file to be annotated according to size information of the drawing object, the text information corresponding to the sub-file to be annotated, and the position information corresponding to each text segment of the text information; and
    stitching, by the RPA system, the target sub-pictures corresponding to the plurality of sub-files to be annotated to obtain the target picture.
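The final stitching step of claim 6 can be illustrated by computing where each sub-picture lands in the combined picture. This sketch assumes vertical stacking in sub-file order, which is consistent with claim 2 using sub-file heights to find the sub-file a region belongs to, but is not stated outright; `stitch_layout` is a hypothetical helper and no actual image drawing is performed.

```python
def stitch_layout(page_sizes):
    """Return (y_offsets, (total_width, total_height)) for stacked pictures.

    `page_sizes` is a list of (width, height) pairs, one per sub-picture.
    """
    offsets = []
    y = 0
    max_width = 0
    for width, height in page_sizes:
        offsets.append(y)  # top edge of this sub-picture in the target picture
        y += height
        max_width = max(max_width, width)
    return offsets, (max_width, y)
```

The y-offsets produced here are the same quantities needed to translate a region from target-picture coordinates back to sub-picture coordinates.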
  7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
    labeling and saving, by the RPA system, the position information of the region relative to the target sub-picture corresponding to the sub-file to be annotated to which the region belongs, together with the third text information within the region and the position information corresponding to each text segment of the third text information, as training data for a model.
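A saved annotation for claim 7 might be serialized as one JSON record per labeled region. The record layout, the label string, and the `make_training_record` function are assumptions made for illustration; the disclosure only states that the region position, the text, and the segment positions are labeled and saved.

```python
import json


def make_training_record(label, page_index, region, segments):
    """Serialize one labeled region as a JSON string.

    `region` is (left, top, width, height) relative to the sub-picture;
    `segments` is a list of (text, (x, y, width, height)) pairs.
    """
    left, top, width, height = region
    record = {
        "label": label,
        "page": page_index,
        "region": {"x": left, "y": top, "width": width, "height": height},
        "segments": [{"text": text, "box": list(box)} for text, box in segments],
    }
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable.
    return json.dumps(record, ensure_ascii=False)
```

Writing one such record per line to a file would give a simple line-delimited dataset that a downstream training job could read.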
  8. A file annotation apparatus based on artificial intelligence (AI) and robotic process automation (RPA), characterized in that the file annotation apparatus is applied to an RPA system and comprises:
    an acquisition module, configured to obtain a file annotation request, wherein the file annotation request is used to annotate a file to be annotated;
    a generation module, configured to generate, in response to the file annotation request, a response result corresponding to the file annotation request;
    a drawing module, configured to draw, according to the response result, a target picture corresponding to the file to be annotated;
    a first determination module, configured to determine, in response to mouse events, a region of text annotation in the target picture; and
    a second determination module, configured to determine a text annotation result within the region according to first text information obtained by performing optical character recognition (OCR) on the file to be annotated and position information corresponding to each text segment of the first text information.
  9. The apparatus according to claim 8, characterized in that the second determination module is configured to:
    determine the sub-file to be annotated to which the region belongs, according to vertex coordinate information of the region and height information of the sub-files to be annotated in the file to be annotated;
    determine position information of the region relative to the target sub-picture corresponding to the sub-file to be annotated to which the region belongs; and
    determine the text annotation result within the region from the first text information and the position information corresponding to each text segment of the first text information, according to the position information of the region relative to the target sub-picture corresponding to the sub-file to be annotated to which the region belongs.
  10. The apparatus according to claim 8 or 9, characterized in that the second determination module is further configured to:
    determine second text information that corresponds, within the first text information, to the sub-file to be annotated to which the region belongs, according to position information of that sub-file within the file to be annotated;
    determine position information corresponding to each text segment of the second text information from the position information corresponding to each text segment of the first text information, according to the correspondence between the second text information and the first text information;
    determine third text information that lies within the region in the second text information, according to position information of the region relative to the sub-file to be annotated to which the region belongs;
    determine position information corresponding to each text segment of the third text information from the position information corresponding to each text segment of the second text information, according to the correspondence between the third text information and the second text information; and
    take the third text information and the position information corresponding to each text segment of the third text information as the text annotation result within the region.
  11. The apparatus according to any one of claims 8 to 10, characterized in that the first determination module is configured to:
    listen for mouse events on the target picture, wherein the mouse events include, in order, a mouse click event, a mouse move event, and a mouse release event;
    determine a first coordinate of the region according to the mouse click event;
    determine a second coordinate of the region according to the mouse move event and the mouse release event;
    determine a height value and a width value of the region according to the first coordinate and the second coordinate; and
    take the area enclosed by the first coordinate, the second coordinate, and the height value and width value of the region as the region of text annotation in the target picture.
  12. The apparatus according to any one of claims 8 to 11, characterized in that the generation module is configured to:
    obtain, according to the file annotation request, the file to be annotated corresponding to the file annotation request;
    convert the file to be annotated into pictures to obtain converted pictures corresponding to the file to be annotated;
    perform character recognition on the converted pictures based on optical character recognition (OCR) to obtain the first text information corresponding to the file to be annotated and the position information corresponding to each text segment of the first text information; and
    take the file to be annotated, the first text information corresponding to the file to be annotated, and the position information corresponding to each text segment of the first text information as the response result corresponding to the file annotation request.
  13. The apparatus according to any one of claims 8 to 12, characterized in that the drawing module is configured to:
    obtain a plurality of sub-files to be annotated of the file to be annotated in the response result;
    create, for each sub-file to be annotated in the file to be annotated, a drawing object corresponding to the sub-file to be annotated;
    determine text information corresponding to the sub-file to be annotated within the first text information, according to position information of the sub-file to be annotated within the file to be annotated;
    determine position information corresponding to each text segment of the text information from the position information corresponding to each text segment of the first text information, according to the correspondence between the text information and the first text information;
    draw the target sub-picture corresponding to the sub-file to be annotated according to size information of the drawing object, the text information corresponding to the sub-file to be annotated, and the position information corresponding to each text segment of the text information; and
    stitch the target sub-pictures corresponding to the plurality of sub-files to be annotated to obtain the target picture.
  14. The apparatus according to any one of claims 8 to 13, characterized in that the apparatus further comprises:
    a processing module, configured to label and save the position information of the region relative to the sub-file to be annotated to which the region belongs, together with the third text information within the region and the position information corresponding to each text segment of the third text information, as training data for a model.
  15. An electronic device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 7.
  16. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
  17. A computer program product, characterized by comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 7.
PCT/CN2021/132175 2021-09-01 2021-11-22 Ai and rpa-based file annotation method and apparatus, device, and medium WO2023029230A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111021971.0 2021-09-01
CN202111021971.0A CN113836090A (en) 2021-09-01 2021-09-01 File labeling method, device, equipment and medium based on AI and RPA

Publications (1)

Publication Number Publication Date
WO2023029230A1 true WO2023029230A1 (en) 2023-03-09

Family

ID=78961955

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/132175 WO2023029230A1 (en) 2021-09-01 2021-11-22 Ai and rpa-based file annotation method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN113836090A (en)
WO (1) WO2023029230A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200097713A1 (en) * 2018-09-24 2020-03-26 International Business Machines Corporation Method and System for Accurately Detecting, Extracting and Representing Redacted Text Blocks in a Document
CN110929714A (en) * 2019-11-22 2020-03-27 北京航空航天大学 Information extraction method of intensive text pictures based on deep learning
CN112101357A (en) * 2020-11-03 2020-12-18 杭州实在智能科技有限公司 RPA robot intelligent element positioning and picking method and system
CN112381087A (en) * 2020-08-26 2021-02-19 北京来也网络科技有限公司 Image recognition method, apparatus, computer device and medium combining RPA and AI

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144078B (en) * 2019-12-13 2023-09-01 平安银行股份有限公司 Method, device, server and storage medium for determining positions to be marked in PDF (portable document format) file
CN112241629A (en) * 2019-12-23 2021-01-19 北京来也网络科技有限公司 Pinyin annotation text generation method and device combining RPA and AI
CN111310693B (en) * 2020-02-26 2023-08-29 腾讯科技(深圳)有限公司 Intelligent labeling method, device and storage medium for text in image
CN111753717B (en) * 2020-06-23 2023-07-28 北京百度网讯科技有限公司 Method, device, equipment and medium for extracting structured information of text
CN112329434B (en) * 2020-11-26 2024-04-12 北京百度网讯科技有限公司 Text information identification method, device, electronic equipment and storage medium
CN112764642B (en) * 2020-12-31 2022-11-29 达而观数据(成都)有限公司 Canvas technology-based universal document labeling method and system
CN112906683A (en) * 2021-02-08 2021-06-04 中国工商银行股份有限公司 Text labeling method, device and equipment

Also Published As

Publication number Publication date
CN113836090A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
EP3828719A2 (en) Method and apparatus for generating model for representing heterogeneous graph node, electronic device, storage medium, and computer program product
US10521500B2 (en) Image processing device and image processing method for creating a PDF file including stroke data in a text format
EP3859562A2 (en) Method, apparatus, electronic device, storage medium and computer program product for generating information
EP3843031A2 (en) Face super-resolution realization method and apparatus, electronic device and storage medium
EP3882784A1 (en) Event argument extraction method and apparatus and electronic device
US20150128017A1 (en) Enabling interactive screenshots within collaborative applications
US11727200B2 (en) Annotation tool generation method, annotation method, electronic device and storage medium
US8949729B2 (en) Enhanced copy and paste between applications
WO2011075295A1 (en) Method for tracking annotations with associated actions
KR20210038446A (en) Method and apparatus for controlling electronic device based on gesture
CN112541359B (en) Document content identification method, device, electronic equipment and medium
US9177405B2 (en) Image processing apparatus, computer program product, and image processing system
US11810333B2 (en) Method and apparatus for generating image of webpage content
CN112506854A (en) Method, device, equipment and medium for storing page template file and generating page
CN117057318A (en) Domain model generation method, device, equipment and storage medium
JP2004013318A (en) Method, apparatus, and program for information processing
WO2023029230A1 (en) Ai and rpa-based file annotation method and apparatus, device, and medium
CN111026916B (en) Text description conversion method and device, electronic equipment and storage medium
US11977850B2 (en) Method for dialogue processing, electronic device and storage medium
US20210336964A1 (en) Method for identifying user, storage medium, and electronic device
CN114528509A (en) Page display processing method and device, electronic equipment and storage medium
CN111723177A (en) Modeling method and device of information extraction model and electronic equipment
US20230119741A1 (en) Picture annotation method, apparatus, electronic device, and storage medium
WO2023000867A1 (en) Page configuration method and apparatus
JP7315639B2 (en) Paper data digitization method and device, electronic device, storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21955758

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE