CN113887442A - OCR training data generation method, device, equipment and medium - Google Patents


Info

Publication number
CN113887442A
Authority
CN
China
Prior art keywords
training data
ocr
screenshot
text information
coordinate information
Prior art date
Legal status
Pending
Application number
CN202111168569.5A
Other languages
Chinese (zh)
Inventor
杨磊
金清华
Current Assignee
China Merchants Bank Co Ltd
Original Assignee
China Merchants Bank Co Ltd
Priority date
Filing date
Publication date
Application filed by China Merchants Bank Co Ltd
Priority to CN202111168569.5A
Publication of CN113887442A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses an OCR training data generation method, device and medium. The method comprises two steps. First, it acquires a screenshot of a page displayed on a terminal device, the text information on the page, and the coordinate information corresponding to the text information. Second, it generates training data for an optical character recognition (OCR) model from the screenshot, the text information and the coordinate information. Because the screenshot, text information and coordinate information are obtained quickly and the OCR training data is generated from them automatically, the limitations of generating training data manually are avoided, and training data is generated much more efficiently.

Description

OCR training data generation method, device, equipment and medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a medium for generating OCR training data.
Background
Currently, OCR (Optical Character Recognition) training data is mainly produced by manual labeling. Manually labeled data can meet the requirements of specific scenarios, but producing it depends on human effort. That effort is severely limited, for example in labeling speed and labeling accuracy, so training data is currently generated inefficiently.
Disclosure of Invention
The main object of the present application is to provide an OCR training data generation method, device, equipment and medium that solve the technical problem of inefficient training data generation.
To achieve the above object, an embodiment of the present application provides an OCR training data generation method, which includes:
acquiring a screenshot of a page displayed on a terminal device, the text information on the page, and the coordinate information corresponding to the text information;
and generating training data for an optical character recognition (OCR) model from the screenshot, the text information and the coordinate information.
Preferably, the OCR model includes a character recognition model, and the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information includes:
performing image segmentation on the screenshot according to the text information and the coordinate information to obtain sub-images;
and generating training data for the character recognition model from the sub-images and the text information.
Preferably, the OCR model further includes a character detection model, and the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information further includes:
generating training data for the character detection model from the screenshot, the text information and the coordinate information.
Preferably, before the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information, the method further includes:
cropping the screenshot according to the coordinate information to obtain a cropped screenshot, and then performing the step of generating training data for the OCR model from the cropped screenshot, the text information and the coordinate information.
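The method does not say how the cropping is done. One possible reading is sketched below in Python: crop the screenshot to the smallest rectangle that encloses every text box, then shift each box into the cropped image's coordinate frame. The sketch makes three assumptions. The screenshot is a NumPy array of shape (height, width) or (height, width, channels). Each box is `(x, y, width, height)` in screenshot pixels. The function `crop_to_text_area` and its `margin` parameter are hypothetical.

```python
from typing import List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x, y, width, height) in screenshot pixels


def crop_to_text_area(
    image: np.ndarray, boxes: List[Box], margin: int = 0
) -> Tuple[np.ndarray, List[Box]]:
    """Crop the screenshot to the rectangle enclosing all text boxes.

    Returns the cropped image and the boxes shifted into its coordinate frame.
    """
    if not boxes:
        return image, []  # nothing to anchor a crop on
    img_h, img_w = image.shape[:2]
    # Enclosing rectangle, widened by `margin` and clamped to the image.
    left = max(min(x for x, _, _, _ in boxes) - margin, 0)
    top = max(min(y for _, y, _, _ in boxes) - margin, 0)
    right = min(max(x + w for x, _, w, _ in boxes) + margin, img_w)
    bottom = min(max(y + h for _, y, _, h in boxes) + margin, img_h)
    cropped = image[top:bottom, left:right]
    # Coordinates must now be relative to the cropped image.
    shifted = [(x - left, y - top, w, h) for x, y, w, h in boxes]
    return cropped, shifted
```

The boxes have to be shifted along with the crop. Any later step that reads coordinates against the cropped image would otherwise be offset by `(left, top)`.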
Preferably, before the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information, the method further includes:
determining whether the display page contains an image;
and if it does, removing the image from the screenshot, and then performing the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information.
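The method leaves open how the image is removed. The minimal sketch below assumes two things. First, the regions of embedded images are already known as `(x, y, width, height)` rectangles, for example the bounding rectangles of image controls reported by the control analysis module. Second, painting those regions with a flat fill color is acceptable; blurring or inpainting would be alternatives. `mask_regions` is a hypothetical helper.

```python
from typing import List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x, y, width, height) in screenshot pixels


def mask_regions(image: np.ndarray, regions: List[Box], fill: int = 255) -> np.ndarray:
    """Return a copy of the screenshot with every region painted with `fill`."""
    out = image.copy()  # leave the original screenshot untouched
    img_h, img_w = out.shape[:2]
    for x, y, w, h in regions:
        # Clamp to the image so off-screen controls cannot trigger
        # NumPy's negative-index wrap-around.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, img_w), min(y + h, img_h)
        if x0 < x1 and y0 < y1:
            out[y0:y1, x0:x1] = fill  # broadcasts over channels if present
    return out
```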
Preferably, the step of acquiring a screenshot of a page displayed on the terminal device, the text information and the coordinate information corresponding to the text information includes:
acquiring a screenshot of a display page of a preset application program on the terminal device;
and acquiring the text information of the display page and the coordinate information corresponding to the text information through a preset control analysis module.
Preferably, after the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information, the method further includes:
determining whether the training data is in the data format required by the OCR model;
and if it is not, converting the training data into the data format required by the OCR model, and training the OCR model with the training data.
To achieve the above object, the present application further provides an OCR training data generation apparatus, including:
an acquisition module, configured to acquire a screenshot of a page displayed on a terminal device, the text information on the page, and the coordinate information corresponding to the text information;
and a generation module, configured to generate training data for the OCR model from the screenshot, the text information and the coordinate information.
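The two-module structure can be illustrated with a small Python sketch. Here the acquisition module and the generation module are plain callables injected into a coordinator. All names are hypothetical and chosen only for illustration: `PageCapture`, `OcrTrainingDataGenerator`, and the `(text, x, y, width, height)` element layout.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Element = Tuple[str, int, int, int, int]  # (text, x, y, width, height)


@dataclass
class PageCapture:
    """Output of the acquisition module for one display page."""
    screenshot: bytes
    elements: List[Element]


class OcrTrainingDataGenerator:
    """Hypothetical coordinator wiring an acquisition module to a generation module."""

    def __init__(
        self,
        acquire: Callable[[], PageCapture],
        generate: Callable[[PageCapture], List[dict]],
    ) -> None:
        self.acquire = acquire    # screenshot + text + coordinates
        self.generate = generate  # captured page -> training samples

    def run(self) -> List[dict]:
        """Acquire one page and turn it into training samples."""
        return self.generate(self.acquire())
```

Keeping the two modules separate lets platform-specific acquisition be swapped without touching generation, for example Win32 API on a PC, Selenium on the Web or UIAutomator on Android.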
Further, to achieve the above object, the present application also provides an OCR training data generation device. The device includes a memory, a processor, and an OCR training data generation program stored in the memory and executable on the processor. When executed by the processor, the program implements the steps of the OCR training data generation method described above.
Further, to achieve the above object, the present application also provides a medium, namely a computer-readable storage medium, on which an OCR training data generation program is stored. When executed by a processor, the program implements the steps of the OCR training data generation method described above.
Further, to achieve the above object, the present application also provides a computer program product including a computer program which, when executed by a processor, implements the steps of the OCR training data generation method described above.
The embodiments of the present application provide an OCR training data generation method, apparatus, device and medium. The method includes two steps. First, it acquires a screenshot of a page displayed on a terminal device, the text information on the page, and the coordinate information corresponding to the text information. Second, it generates training data for an optical character recognition (OCR) model from the screenshot, the text information and the coordinate information. Because these inputs are obtained quickly and the training data is generated from them automatically, the limitations of manual generation are avoided, and training data is generated much more efficiently.
Drawings
FIG. 1 is a schematic structural diagram of the hardware operating environment involved in an embodiment of the OCR training data generation method of the present application;
FIG. 2 is a schematic flowchart of a first embodiment of the OCR training data generation method of the present application;
FIG. 3 is a schematic flowchart of a second embodiment of the OCR training data generation method of the present application;
FIG. 4 is a functional module diagram of a preferred embodiment of the OCR training data generation apparatus of the present application.
The realization of the objects, the functional features and the advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As shown in FIG. 1, FIG. 1 is a schematic structural diagram of an OCR training data generation device in the hardware operating environment involved in an embodiment of the present application.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements are used only to make the present application easier to describe and have no specific meaning of their own. Therefore "module", "component" and "unit" may be used interchangeably.
The OCR training data generation device in the embodiments of the present application may be a PC, or a mobile terminal device such as a tablet computer or a portable computer.
As shown in FIG. 1, the OCR training data generation device may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 enables communication between these components. The user interface 1003 may include a display screen and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory such as a magnetic disk memory. It may optionally be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the structure of the OCR training data generation device shown in FIG. 1 does not limit the device. The device may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a network communication module, a user interface module, and an OCR training data generating program.
In the device shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to invoke the OCR training data generation program stored in the memory 1005 and perform the following operations:
acquiring a screenshot of a page displayed on a terminal device, the text information on the page, and the coordinate information corresponding to the text information;
and generating training data for an optical character recognition (OCR) model from the screenshot, the text information and the coordinate information.
Preferably, the OCR model includes a character recognition model, and the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information includes:
performing image segmentation on the screenshot according to the text information and the coordinate information to obtain sub-images;
and generating training data for the character recognition model from the sub-images and the text information.
Preferably, the OCR model further includes a character detection model, and the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information further includes:
generating training data for the character detection model from the screenshot, the text information and the coordinate information.
Preferably, before the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information, the processor 1001 may be configured to call the OCR training data generation program stored in the memory 1005 and perform the following operations:
cropping the screenshot according to the coordinate information to obtain a cropped screenshot, and then performing the step of generating training data for the OCR model from the cropped screenshot, the text information and the coordinate information.
Preferably, before the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information, the processor 1001 may be further configured to call the OCR training data generation program stored in the memory 1005 and perform the following operations:
determining whether the display page contains an image;
and if it does, removing the image from the screenshot, and then performing the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information.
Preferably, the step of acquiring a screenshot of a page displayed on the terminal device, the text information and the coordinate information corresponding to the text information includes:
acquiring a screenshot of a display page of a preset application program on the terminal device;
and acquiring the text information of the display page and the coordinate information corresponding to the text information through a preset control analysis module.
Preferably, after the step of generating training data for the OCR model from the screenshot, the text information and the coordinate information, the processor 1001 may be configured to call the OCR training data generation program stored in the memory 1005 and perform the following operations:
determining whether the training data is in the data format required by the OCR model;
and if it is not, converting the training data into the data format required by the OCR model, and training the OCR model with the training data.
For a better understanding of the above technical solutions, exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings and specific embodiments. Although exemplary embodiments of the present disclosure are shown in the drawings, the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey its scope to those skilled in the art.
Referring to FIG. 2, a flowchart of a first embodiment of the OCR training data generation method of the present application, the method in this embodiment includes the following steps:
Step S10: acquiring a screenshot of a page displayed on the terminal device, the text information on the page, and the coordinate information corresponding to the text information;
In this embodiment, the OCR training data generation method is applied to, and executed on, an automated testing platform. This allows OCR model training data for different terminal devices to be generated quickly and improves the efficiency of training data generation. The terminal device in this embodiment may be a PC (Personal Computer) client, a Web client, an Android device, an iOS device, and so on. The OCR model in this embodiment may include a character detection model and a character recognition model. After training, the character detection model detects which regions of an input image contain characters or text. The character recognition model then converts the content of those regions into the corresponding characters or text.
Specifically, when OCR training data needs to be generated, a user may execute the OCR training data generation method through the automated testing platform. Using the platform's functions, the user obtains training data for OCR models on different terminal devices (PC, Web, Android and iOS).
Further, the OCR training data generation method can be executed on the automated testing platform during interface automation testing. During that testing, a screenshot of the display page of the application program currently running on the terminal device is acquired through the screenshot function preset in the platform. The text information in that display page is acquired through the control analysis module preset in the platform. The text information may include the characters together with their lengths and widths, or the texts together with their lengths and widths, and the like.
The preset control analysis module uses one or more technologies such as the Win32 API, UIA, MSAA, Selenium, UIAutomator and XCUITest:
the Win32 API is the application programming interface of Microsoft's 32-bit Windows platform;
MSAA stands for Microsoft Active Accessibility;
UIA stands for UI Automation (user interface automation);
Selenium is a tool for testing Web applications;
UIAutomator is a Java-based UI testing framework released by Google with Android 4.1 and built on the Accessibility service;
and XCUITest is Apple's UI testing framework, built on XCTest and used together with Xcode.
At the same time, coordinate information consisting of the coordinates corresponding to each character in the text information is acquired. Once the screenshot, the text information and the corresponding coordinate information have been acquired, training data for the OCR model can be generated quickly from them. This avoids the limitations of generating training data manually and makes training data generation much more efficient.
More specifically, the step of obtaining a screenshot of a display page in the terminal device, the text information, and the coordinate information corresponding to the text information includes:
Step S11: acquiring a screenshot of a display page of a preset application program on the terminal device;
Step S12: acquiring the text information of the display page and the coordinate information corresponding to the text information through a preset control analysis module.
The screenshot of the display page of the application program currently running on the terminal device is captured with the screenshot function of an existing interface automation testing technology. Examples are a screenshot of a social application or a music application on a smartphone, or of a document editing application running on a PC. Meanwhile, the control analysis module preset in the automated testing platform obtains the text information contained in that display page, together with coordinate information consisting of the coordinates corresponding to each character in the text information. Specifically, the text information and coordinate information are obtained:
through technologies such as the Win32 API, UIA and MSAA in the control analysis module, for an application program currently running on a PC;
through Selenium in the control analysis module, for a page displayed in a Web browser;
through UIAutomator in the control analysis module, for an application program currently running on an Android device;
or through XCUITest in the control analysis module, for an application program currently running on an iOS device.
Once the screenshot, the text information and the corresponding coordinate information are obtained, training data for the OCR model is generated quickly from them. This avoids the limitations of generating training data manually and makes training data generation much more efficient.
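For the Web case, steps S11 and S12 might look as follows as a minimal Python sketch built on Selenium 4. It uses only standard WebDriver calls: `get_screenshot_as_png`, `find_elements`, `is_displayed`, `text` and `rect`. The string `"xpath"` is the value of Selenium's `By.XPATH`. The XPath filter, the `TextElement` record and `capture_page` are assumptions of this sketch.

The sketch also assumes the page is not scrolled and the device pixel ratio is 1. `rect` reports CSS pixels relative to the document, while the screenshot covers the viewport. Otherwise the scroll offset and scale factor must be applied, and elements outside the viewport filtered out.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Leaf elements whose own text is non-blank; skipping elements that have
# child elements avoids recording the same text for a parent and its child.
LEAF_TEXT_XPATH = "//*[normalize-space(text()) and not(*)]"


@dataclass
class TextElement:
    text: str
    x: int       # left edge in CSS pixels, relative to the document
    y: int       # top edge
    width: int
    height: int


def capture_page(driver) -> Tuple[bytes, List[TextElement]]:
    """Return a PNG screenshot plus the visible text controls of the page.

    `driver` is expected to be a Selenium 4 WebDriver (or any object with the
    same methods).
    """
    png = driver.get_screenshot_as_png()
    elements: List[TextElement] = []
    for el in driver.find_elements("xpath", LEAF_TEXT_XPATH):
        if not el.is_displayed():
            continue  # hidden controls do not appear in the screenshot
        text = el.text.strip()
        if not text:
            continue
        r = el.rect  # dict with "x", "y", "width" and "height"
        elements.append(
            TextElement(text, round(r["x"]), round(r["y"]), round(r["width"]), round(r["height"]))
        )
    return png, elements
```

With a real browser this would be called as `capture_page(webdriver.Chrome())` after navigating to the page under test.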
It can be understood that the control analysis module can only obtain the top-left coordinate of a text, plus text information containing its length and width. The detection model, however, needs to be trained with the top, bottom, left and right coordinates of the text. Therefore, after the text information and coordinate information of the display page are obtained, the coordinate information must be extended using the text length and text width in the text information. The result is coordinate information containing the top, bottom, left and right coordinates of the text. A detection model trained with training data containing this coordinate information achieves higher accuracy.
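The calculation itself is simple: given the top-left corner `(x, y)` and the text's extent, the remaining corners follow by addition. The sketch below makes three assumptions. The reported text length is the horizontal extent and the reported text width is the vertical extent, here named `width` and `height`. Screen coordinates grow rightwards and downwards. `box_to_corners` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def box_to_corners(x: int, y: int, width: int, height: int) -> List[Point]:
    """Expand a control's top-left corner plus size into four corner points.

    Points are ordered clockwise from the top-left, in screen coordinates
    where y grows downwards.
    """
    if width < 0 or height < 0:
        raise ValueError("width and height must be non-negative")
    left, top = x, y
    right, bottom = x + width, y + height
    return [(left, top), (right, top), (right, bottom), (left, bottom)]
```

For example, `box_to_corners(10, 20, 100, 30)` returns `[(10, 20), (110, 20), (110, 50), (10, 50)]`. Whether `right` and `bottom` should be treated as inclusive or exclusive depends on the target training format.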
Step S20: generating training data for the OCR model from the screenshot, the text information and the coordinate information.
Once the screenshot, the text information and the coordinate information of the display page are obtained, the screenshot may first be processed, since the OCR model may include both a character detection model and a character recognition model. Specifically, any part of the screenshot that contains text whose content and coordinates cannot be obtained through the control analysis module may be removed. The training data for the character detection model can then be generated quickly from three inputs: the processed screenshot, the text information for the text regions in the screenshot, and the coordinate information for each character in the text information. This avoids the limitations of generating training data manually and makes training data generation much more efficient.
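One possible in-memory form for a character-detection sample is sketched below in Python. It pairs the image path with one quadrilateral per non-empty text. The sketch assumes each text control is reported as `(text, x, y, width, height)` relative to the processed screenshot. The record layout and `build_detection_sample` are hypothetical.

```python
from typing import Dict, List, Tuple

Element = Tuple[str, int, int, int, int]  # (text, x, y, width, height)


def build_detection_sample(image_path: str, elements: List[Element]) -> Dict:
    """Pair a processed screenshot with one quadrilateral per text control."""
    annotations = []
    for text, x, y, w, h in elements:
        if not text.strip() or w <= 0 or h <= 0:
            continue  # empty or zero-sized controls carry no usable label
        # Corners clockwise from the top-left (y grows downwards).
        points = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
        annotations.append({"text": text, "points": points})
    return {"image": image_path, "annotations": annotations}
```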
It can be understood that this image processing removes the parts of the screenshot that contain text whose content and coordinates cannot be acquired through the control analysis module. After that, the screenshot can be segmented according to the text information and the coordinate information to obtain sub-images. The training data for the character recognition model can then be generated quickly from the sub-images and the text information. This avoids the limitations of generating training data manually and makes training data generation much more efficient.
Further, after the step of generating training data of an optical character recognition OCR model according to the screenshot, the text information, and the coordinate information, the method further includes:
Step S30: determining whether the training data is in the data format required by the OCR model;
Step S40: if the training data is not in the data format required by the OCR model, converting it into the required data format, so that the OCR model can be trained with the training data.
It can be understood that the generated training data should be usable directly for training the OCR model, which speeds up model training. To that end, this embodiment also identifies the data format of the generated training data and compares it with the data format required for OCR model training data. If the formats match, no conversion is needed. The training data is stored in a file and the OCR model is trained with it, which raises the OCR recognition success rate in specific scenarios. If the formats do not match, the conversion relationship between the two formats is obtained and the training data is converted accordingly. The converted data is stored in a file and the OCR model is trained with it, with the same improvement in recognition success rate.
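The format check and conversion can be sketched as follows. The target format is an assumption of this sketch: one line per image holding the image path, a tab, and a JSON list of `{"transcription", "points"}` objects. This resembles the detection label files used by some open-source OCR toolkits such as PaddleOCR, but check the chosen framework's documentation before relying on it. The source format is the hypothetical in-memory record `{"image": ..., "annotations": [{"text": ..., "points": ...}]}`.

```python
import json
from typing import Dict, Union

Record = Union[str, Dict]


def is_target_format(record: Record) -> bool:
    """Step S30: True if the record is already a 'path<TAB>JSON list' label line."""
    if not isinstance(record, str) or "\t" not in record:
        return False
    _, payload = record.split("\t", 1)
    try:
        return isinstance(json.loads(payload), list)
    except json.JSONDecodeError:
        return False


def to_target_format(sample: Dict) -> str:
    """Convert {'image', 'annotations': [{'text', 'points'}]} into one label line."""
    boxes = [
        {"transcription": a["text"], "points": [list(p) for p in a["points"]]}
        for a in sample["annotations"]
    ]
    # ensure_ascii=False keeps Chinese and other non-ASCII labels readable.
    return sample["image"] + "\t" + json.dumps(boxes, ensure_ascii=False)


def ensure_target_format(record: Record) -> str:
    """Step S40: pass through records already in the target format, convert the rest."""
    return record if is_target_format(record) else to_target_format(record)
```

Each returned line can then be appended to the label file that the training job reads.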
This embodiment acquires a screenshot of a page displayed on a terminal device, the text information on the page and the coordinate information corresponding to the text information. It then generates training data for the OCR model from them. Because these inputs are obtained quickly and the training data is generated automatically, the limitations of manual generation are avoided, and training data is generated much more efficiently.
Further, referring to fig. 3, a second embodiment of the OCR training data generation method is proposed on the basis of the first embodiment. In the second embodiment, the step of generating training data of the optical character recognition OCR model according to the screenshot, the text information, and the coordinate information includes:
Step A1: segment the screenshot according to the text information and the coordinate information to obtain sub-images;
Step A2: generate training data of the character recognition model according to the sub-images and the text information.
After the screenshot, text information, and coordinate information of the display page of the application currently running on the terminal device are acquired, note that a character recognition model is trained on images paired with their corresponding text. In this embodiment, the screenshot is therefore segmented into a group of sub-images, each containing text, using the text information and its corresponding coordinate information: specifically, from the coordinates of each text segment in the coordinate information together with the length and width of that text, the region occupied by each text segment in the screenshot is cut out as one sub-image. Each sub-image and its corresponding text then form one group of training data, so that multiple groups of training data for the character recognition model are obtained quickly. This lowers the cost of obtaining training data, avoids the limitations of generating it manually, and effectively improves generation efficiency.
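Steps A1 and A2 can be sketched as below: each text box is cut out of the screenshot and paired with its text. To keep the sketch runnable with the standard library alone, the screenshot is modeled as a plain list of pixel rows, and boxes are assumed to be `(left, top, right, bottom)` with exclusive right and bottom edges. Both choices are assumptions; with Pillow, the slice would be replaced by `Image.crop((left, upper, right, lower))`, which uses the same box order.

```python
from typing import List, Tuple

Image = List[List[int]]          # stand-in for a real image: image[row][col]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive (assumed)


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular region described by box out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def recognition_samples(image: Image, items: List[Tuple[str, Box]]) -> List[Tuple[Image, str]]:
    """Build one (sub-image, text) training pair per text region of the screenshot."""
    return [(crop(image, box), text) for text, box in items]
```

Each returned pair corresponds to one group of training data for the character recognition model; the `items` list can come from whatever control parsing step supplies the text and its coordinates.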
Further, the step of generating training data of an optical character recognition OCR model according to the screenshot, the text information and the coordinate information further includes:
Step B1: generate training data of the character detection model according to the screenshot, the text information, and the coordinate information.
After the screenshot, text information, and coordinate information of the display page of the application currently running on the terminal device are obtained, note that a character detection model is trained on images together with the corresponding text and its coordinates. The screenshot, text information, and coordinate information of the display page are therefore combined and stored as training data for the character detection model. This yields detection training data quickly, reduces the cost of obtaining it, avoids the limitations of generating it manually, and effectively improves generation efficiency.
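Step B1 can be sketched as serializing one whole screenshot together with all of its text boxes. The patent does not specify a storage format; the tab-separated layout with `transcription` and `points` fields below is an assumption, loosely modeled on the detection labels used by some open-source OCR toolkits such as PaddleOCR. Each axis-aligned box is expanded to four corner points because detection labels are commonly expressed as polygons.

```python
import json
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def detection_label(image_path: str, items: List[Tuple[str, Box]]) -> str:
    """Serialize one screenshot's annotations as '<image_path>\t<JSON list>'.

    Each box is written as four corner points, clockwise from the top-left corner.
    """
    annotations = []
    for text, (left, top, right, bottom) in items:
        points = [[left, top], [right, top], [right, bottom], [left, bottom]]
        annotations.append({"transcription": text, "points": points})
    return f"{image_path}\t{json.dumps(annotations, ensure_ascii=False)}"
```

`ensure_ascii=False` keeps Chinese or other non-ASCII page text readable in the label file instead of escaping it.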
Further, before the step of generating training data of an optical character recognition OCR model according to the screenshot, the text information, and the coordinate information, the method further includes:
Step C1: crop the screenshot according to the coordinate information to obtain a cropped screenshot, and then perform the step of generating training data of the OCR model according to the screenshot, the text information, and the coordinate information.
It can be understood that, after the screenshot of the display page in the terminal device is acquired and before the OCR training data is generated from the screenshot, the text information, and the coordinate information, the screenshot may contain other text whose content and coordinates cannot be obtained. For example, a screenshot of the display page of a social application on a smartphone may include, besides the application's own content, other content shown by the phone, such as the time, the battery level in digital form, or even traffic statistics. The content and coordinates of such text either cannot be obtained or require an additional control parsing module. To improve the quality of the generated training data, the regions of this other content are therefore cut from the screenshot. Specifically, the text region covering all obtainable text information and coordinate information is determined from that text information and its coordinates: for example, the outermost upper left, upper right, lower left, and lower right coordinates are identified, and the region they enclose is taken as the text region. The parts of the screenshot outside this text region are then cut away, and the remaining image forms the new screenshot.
Training data of the OCR model can then be generated from this screenshot, the text information, and the coordinate information, which avoids the limitations of generating training data manually and effectively improves generation efficiency.
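Step C1 can be sketched as taking the smallest rectangle that encloses every known text box and cropping the screenshot to it. The sketch reuses the list-of-rows image stand-in and the `(left, top, right, bottom)` box convention, both assumptions. It also shifts every box into the cropped frame; the patent does not state this step, but without it the stored coordinates would no longer line up with the cropped screenshot.

```python
from typing import List, Tuple

Image = List[List[int]]          # stand-in for a real image: image[row][col]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive (assumed)


def text_region(boxes: List[Box]) -> Box:
    """Smallest rectangle enclosing every known text box (outermost corners)."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def crop_to_text_region(image: Image, items: List[Tuple[str, Box]]) -> Tuple[Image, List[Tuple[str, Box]]]:
    """Cut away everything outside the text region and re-express boxes in the new frame."""
    left, top, right, bottom = text_region([box for _, box in items])
    cropped = [row[left:right] for row in image[top:bottom]]
    # Shift coordinates (an added assumption) so each label stays aligned with its pixels.
    shifted = [(text, (l - left, t - top, r - left, b - top)) for text, (l, t, r, b) in items]
    return cropped, shifted
```

Content outside the region, such as a status bar with the time and battery level, is dropped, while every text box whose content is known survives intact.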
Further, before the step of generating training data of an optical character recognition OCR model according to the screenshot, the text information and the coordinate information, the method further includes:
Step D1: determine whether an image exists in the display page;
Step D2: if an image exists, remove it from the screenshot, and then perform the step of generating training data of the optical character recognition OCR model according to the screenshot, the text information, and the coordinate information.
It can be understood that, after the screenshot of the display page in the terminal device is acquired and before the OCR training data is generated from the screenshot, the text information, and the coordinate information, the display interface may include an image that itself contains text. The text content in such an image and its coordinates cannot be obtained through the preset control parsing module, so to improve the quality of the generated training data, the region corresponding to the image is removed from the screenshot. Specifically, the coordinate information of the image is obtained, the coordinates of its upper left, upper right, lower left, and lower right corners are determined, and the region enclosed by those corners is either cut out or painted over; the resulting image forms the new screenshot. Training data of the OCR model can then be generated from this screenshot, the text information, and the coordinate information, which avoids the limitations of generating training data manually and effectively improves generation efficiency.
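The "paint over" option of step D2 can be sketched as filling each image region with a constant value. The list-of-rows image stand-in, the `(left, top, right, bottom)` box convention, and the white fill value of 255 are all assumptions. Masking rather than cutting keeps the screenshot's dimensions unchanged, so the coordinates of the remaining text stay valid without any shifting.

```python
from typing import List, Tuple

Image = List[List[int]]          # stand-in for a real grayscale image: image[row][col]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive (assumed)


def mask_regions(image: Image, boxes: List[Box], fill: int = 255) -> Image:
    """Return a copy of the image with every box painted over with a constant value."""
    out = [row[:] for row in image]  # copy so the original screenshot is untouched
    height = len(out)
    width = len(out[0]) if out else 0
    for left, top, right, bottom in boxes:
        # Clamp each box to the image bounds before painting.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                out[y][x] = fill
    return out
```

Here `boxes` would hold the coordinates of the embedded images found in step D1, whose text cannot be labeled and would otherwise appear in the training data as unannotated text.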
In this embodiment, training data for both the character recognition model and the character detection model of the optical character recognition OCR model can be generated from the screenshot, the text information, and the coordinate information. OCR training data is thus obtained quickly and at lower cost, the limitations of generating training data manually are avoided, and generation efficiency is effectively improved.
Further, the application also provides an OCR training data generation device.
Referring to fig. 4, fig. 4 is a functional module schematic diagram of a first embodiment of the OCR training data generating apparatus of the present application.
The OCR training data generating device includes:
the acquiring module 10 is used for acquiring a screenshot of a display page in the terminal device, text information and coordinate information corresponding to the text information;
and the generating module 20 is configured to generate training data of an optical character recognition OCR model according to the screenshot, the text information, and the coordinate information.
Further, the acquiring module 10 includes:
a first acquisition unit, configured to acquire a screenshot of a display page of a preset application program in the terminal device;
and a second acquisition unit, configured to acquire the text information of the display page and the coordinate information corresponding to the text information through a preset control parsing module.
Further, the generating module 20 includes:
the image segmentation unit is used for carrying out image segmentation on the screenshot according to the text information and the coordinate information to obtain sub-images;
and the first generating unit is used for generating training data of the character recognition model according to the sub-images and the text information.
Further, the generating module 20 further includes:
and the second generating unit is used for generating training data of the character detection model according to the screenshot, the text information and the coordinate information.
Further, the generating module 20 further includes:
and a third generating unit, configured to crop the screenshot according to the coordinate information to obtain a cropped screenshot, and then to perform the step of generating training data of an optical character recognition OCR model according to the screenshot, the text information, and the coordinate information.
Further, the generating module 20 further includes:
a first determination unit, configured to determine whether an image exists in the display page and, if an image exists, to remove the image from the screenshot before the step of generating training data of the optical character recognition OCR model according to the screenshot, the text information, and the coordinate information is performed.
Further, the generating module 20 further includes:
a second determining unit, configured to determine whether a data format of the training data is a data format of training data required by the OCR model;
and the data conversion unit is used for converting the data format of the training data into the data format of the training data required by the OCR model if the data format of the training data is not the data format of the training data required by the OCR model, so as to train the OCR model according to the training data.
Furthermore, the present application also provides a medium, preferably a computer readable storage medium, on which an OCR training data generation program is stored, which when executed by a processor implements the steps of the above-described OCR training data generation method embodiments.
Furthermore, the present application also provides a computer program product comprising a computer program which, when being executed by a processor, realizes the steps of the above-mentioned OCR training data generation method embodiments.
In the embodiments of the OCR training data generating device, the computer readable storage medium, and the computer program product of the present application, all technical features of the embodiments of the OCR training data generating method are included, and the description and explanation contents are substantially the same as those of the embodiments of the OCR training data generating method, and are not repeated herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present application or a part contributing to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g., a ROM/RAM, a magnetic disk, and an optical disk), and includes a plurality of instructions for enabling a terminal device (which may be a fixed terminal, such as an internet of things smart device including smart homes, such as a smart air conditioner, a smart lamp, a smart power supply, and a smart router, or a mobile terminal, including a smart phone, a wearable networked AR/VR device, a smart sound box, and a network device such as an auto-driven automobile) to execute the method according to the embodiments of the present application.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (10)

1. An OCR training data generation method, comprising:
acquiring a screenshot of a display page in terminal equipment, text information and coordinate information corresponding to the text information;
and generating training data of an optical character recognition OCR model according to the screenshot, the text information and the coordinate information.
2. An OCR training data generation method according to claim 1, wherein said OCR model includes a character recognition model, and said step of generating training data for an optical character recognition OCR model from said screenshot, said text information and said coordinate information includes:
according to the text information and the coordinate information, image segmentation is carried out on the screenshot to obtain sub-images;
and generating training data of the character recognition model according to the sub-images and the text information.
3. An OCR training data generation method according to claim 2, wherein said OCR model further includes a character detection model, and said step of generating training data for an optical character recognition OCR model from said screenshot, said text information and said coordinate information further includes:
generating training data of the character detection model according to the screenshot, the text information and the coordinate information.
4. An OCR training data generation method according to claim 1, wherein said step of generating training data for an optical character recognition OCR model based on said screenshot, said text information and said coordinate information is preceded by:
cropping the screenshot according to the coordinate information to obtain a cropped screenshot, and executing the step of generating training data of the optical character recognition (OCR) model according to the screenshot, the text information and the coordinate information.
5. An OCR training data generation method according to claim 4, wherein said step of generating training data for an optical character recognition OCR model based on said screenshot, said text information and said coordinate information further includes:
determining whether an image exists in the display page;
and if the image exists, removing the image in the screenshot so as to execute a step of generating training data of an Optical Character Recognition (OCR) model according to the screenshot, the text information and the coordinate information.
6. The OCR training data generating method of claim 1, wherein the step of obtaining a screenshot of a display page in a terminal device, text information, and coordinate information corresponding to the text information comprises:
acquiring a screenshot of a display page of a preset application program in terminal equipment;
and acquiring text information of the display page and coordinate information corresponding to the text information according to a preset control analysis module.
7. An OCR training data generation method according to claim 1, wherein after said step of generating training data for an optical character recognition OCR model based on said screenshot, said text information and said coordinate information, further comprises:
determining whether the data format of the training data is the data format of the training data required by the OCR model;
and if the data format of the training data is not the data format of the training data required by the OCR model, converting the data format of the training data into the data format of the training data required by the OCR model, and training the OCR model according to the training data.
8. An OCR training data generation apparatus, comprising:
the acquisition module is used for acquiring a screenshot of a display page in the terminal equipment, text information and coordinate information corresponding to the text information;
and the generating module is used for generating training data of the optical character recognition OCR model according to the screenshot, the text information and the coordinate information.
9. An OCR training data generation apparatus comprising a memory, a processor and an OCR training data generation program stored on the memory and executable on the processor, the OCR training data generation program when executed by the processor implementing the steps of the OCR training data generation method of any one of claims 1-7.
10. A medium being a computer readable storage medium, characterized in that the computer readable storage medium has stored thereon an OCR training data generation program, which when executed by a processor, implements the steps of the OCR training data generation method according to any one of claims 1-7.
CN202111168569.5A 2021-09-29 2021-09-29 OCR training data generation method, device, equipment and medium Pending CN113887442A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111168569.5A CN113887442A (en) 2021-09-29 2021-09-29 OCR training data generation method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN113887442A true CN113887442A (en) 2022-01-04

Family

ID=79005540

Country Status (1)

Country Link
CN (1) CN113887442A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023143107A1 (en) * 2022-01-30 2023-08-03 北京有竹居网络技术有限公司 Character recognition method and apparatus, device, and medium
CN116781771A (en) * 2023-08-21 2023-09-19 南京粒聚智能科技有限公司 Automatic screen capturing picture analysis method of station machine by using OCR technology

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023143107A1 (en) * 2022-01-30 2023-08-03 北京有竹居网络技术有限公司 Character recognition method and apparatus, device, and medium
CN116781771A (en) * 2023-08-21 2023-09-19 南京粒聚智能科技有限公司 Automatic screen capturing picture analysis method of station machine by using OCR technology
CN116781771B (en) * 2023-08-21 2023-11-17 南京粒聚智能科技有限公司 Automatic screen capturing picture analysis method of station machine by using OCR technology

Similar Documents

Publication Publication Date Title
CN110990731B (en) Rendering method, device and equipment of static webpage and computer storage medium
US20070016657A1 (en) Multimedia data processing devices, multimedia data processing methods and multimedia data processing programs
CN113887442A (en) OCR training data generation method, device, equipment and medium
CN111240669B (en) Interface generation method and device, electronic equipment and computer storage medium
CN111475390A (en) Log collection system deployment method, device, equipment and storage medium
CN111679886A (en) Heterogeneous computing resource scheduling method, system, electronic device and storage medium
CN113971110A (en) Interface testing method, device, equipment and computer readable storage medium
CN111488186A (en) Data processing method and device, electronic equipment and computer storage medium
CN112416775A (en) Software automation testing method and device based on artificial intelligence and electronic equipment
CN111475237A (en) Menu processing method and device, electronic equipment and storage medium
CN111552463A (en) Page jump method and device, computer equipment and storage medium
CN111488731B (en) File generation method, device, computer equipment and storage medium
CN111143001A (en) Language detection method of terminal, user equipment, storage medium and device
CN107967269B (en) Method and terminal for editing page
CN112231234B (en) Cross-platform user interface automatic testing method, device, equipment and storage medium
CN113590564B (en) Data storage method, device, electronic equipment and storage medium
CN114443022A (en) Method for generating page building block and electronic equipment
CN114969603A (en) 5G message-based picture acquisition and picture generation method and system
CN106371822A (en) Universal cloud platform internationalization method and device
CN109509467B (en) Code generation method and device
CN114444447A (en) Card processing method and device
CN112328940A (en) Method and device for embedding transition page into webpage, computer equipment and storage medium
CN112950167A (en) Design service matching method, device, equipment and storage medium
CN106406888B (en) Application program interface display method and device
CN117093793B (en) Webpage 3D scene two-dimensional display method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination