CN112287936A - Optical character recognition test method and device, readable storage medium and terminal equipment - Google Patents


Info

Publication number
CN112287936A
Authority
CN
China
Prior art keywords
optical character
character recognition
test data
length
recognition test
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011019006.5A
Other languages
Chinese (zh)
Inventor
刘旋
黄国昌
权申文
刘远明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhiying Medical Technology Co ltd
Original Assignee
Shenzhen Zhiying Medical Technology Co ltd
Application filed by Shenzhen Zhiying Medical Technology Co ltd
Priority to CN202011019006.5A
Publication of CN112287936A
Legal status: Pending (current)



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The present application relates to the field of computer technology, and in particular to an optical character recognition test method, an optical character recognition test device, a computer-readable storage medium, and a terminal device. The method comprises the following steps: generating optical character recognition test data according to a preset data template and parameter configuration information, and marking a label corresponding to the optical character recognition test data; inputting the optical character recognition test data into a preset optical character recognition engine for recognition, and acquiring a recognition result output by the optical character recognition engine; and comparing the recognition result with the label, and determining the recognition accuracy of the optical character recognition engine according to the comparison result. Compared with traditional testing methods that rely on manual operation, the method and device reduce the testing workload, shorten the testing time, avoid errors caused by human operation, and greatly improve testing efficiency.

Description

Optical character recognition test method and device, readable storage medium and terminal equipment
Technical Field
The present application relates to the field of computer technology, and in particular to an optical character recognition test method, an optical character recognition test device, a computer-readable storage medium, and a terminal device.
Background
Optical Character Recognition (OCR) refers to the process of analyzing and recognizing an image file of text material to obtain text and layout information, i.e., recognizing the characters in an image and returning them as text. At present, OCR technology is applied in many industries, such as banking, insurance, finance, taxation, customs, public security, border inspection, logistics, telecommunications, industrial and commercial administration, libraries, household registration management, and auditing. OCR technology reduces labor costs and improves working efficiency.
The recognition accuracy of an OCR engine is an important indicator of its performance. Traditional methods for testing the recognition accuracy of an OCR engine are mainly manual, which entails a heavy workload, long testing times, a high likelihood of errors, and low testing efficiency.
Disclosure of Invention
In view of this, embodiments of the present application provide an OCR testing method, an OCR testing apparatus, a computer-readable storage medium, and a terminal device, so as to solve the problems of existing OCR testing methods, such as heavy workload, long time consumption, high error probability, and low testing efficiency.
A first aspect of an embodiment of the present application provides an OCR testing method, which may include:
generating OCR test data according to a preset data template and parameter configuration information, and marking a label corresponding to the OCR test data;
inputting the OCR test data into a preset OCR engine for recognition, and acquiring a recognition result output by the OCR engine;
and comparing the recognition result with the label, and determining the recognition accuracy of the OCR engine according to the comparison result.
Further, the generating OCR test data according to the preset data template and the parameter configuration information may include:
determining a position to be filled and a filling form in the data template according to the parameter configuration information;
generating an ID according to the parameter configuration information;
and filling the ID into the position to be filled in the data template according to the filling form to generate the OCR test data.
Further, the generating an ID according to the parameter configuration information may include:
reading the ID length and the continuous indication parameter in the parameter configuration information;
if the continuous indication parameter is a preset first value, generating consecutive IDs of the ID length;
if the continuous indication parameter is a preset second value, reading the number of IDs in the parameter configuration information, and randomly generating that number of IDs of the ID length.
Further, marking the label corresponding to the OCR test data may include:
marking the ID as a label corresponding to the OCR test data.
Further, the comparing the recognition result with the label may include:
determining a length of the label;
extracting consecutive digit strings from the recognition result, and determining the length of each extracted digit string;
selecting equal-length digit strings according to the length of each digit string, wherein an equal-length digit string is one whose length equals the length of the label;
comparing each selected equal-length digit string with the label, and judging whether an equal-length digit string identical to the label exists;
if an equal-length digit string identical to the label exists, determining that the OCR engine recognizes correctly;
and if no equal-length digit string identical to the label exists, determining that the OCR engine recognizes incorrectly.
Further, the determining the recognition accuracy of the OCR engine according to the comparison result may include:
counting a first number of times and a second number of times according to the comparison result, wherein the first number of times is the number of times the OCR engine recognizes correctly, and the second number of times is the number of times the OCR engine recognizes incorrectly;
and calculating the recognition accuracy of the OCR engine according to the first number of times and the second number of times.
A second aspect of an embodiment of the present application provides an OCR testing apparatus, which may include:
the test data generation module is used for generating OCR test data according to a preset data template and parameter configuration information and marking a label corresponding to the OCR test data;
the recognition module is used for inputting the OCR test data into a preset OCR engine for recognition and acquiring a recognition result output by the OCR engine;
and the recognition result comparison module is used for comparing the recognition result with the label and determining the recognition accuracy of the OCR engine according to the comparison result.
Further, the test data generation module may include:
the position and form determining unit is used for determining the position to be filled and the filling form in the data template according to the parameter configuration information;
an ID generating unit, configured to generate an ID according to the parameter configuration information;
and the test data generating unit is used for filling the ID into the position to be filled in the data template according to the filling form to generate the OCR test data.
Further, the ID generation unit may include:
a parameter reading subunit, configured to read the ID length and the continuous indication parameter in the parameter configuration information;
a first generation subunit, configured to generate consecutive IDs of the ID length if the continuous indication parameter is a preset first value;
and a second generation subunit, configured to, if the continuous indication parameter is a preset second value, read the number of IDs in the parameter configuration information and randomly generate that number of IDs of the ID length.
Further, the test data generation module may further include:
and the label determining unit is used for marking the ID as a label corresponding to the OCR test data.
Further, the recognition result comparison module may include:
a length determination unit, configured to determine the length of the label;
a digit string extraction unit, configured to extract consecutive digit strings from the recognition result and determine the length of each extracted digit string;
an equal-length digit string selection unit, configured to select equal-length digit strings according to the length of each digit string, an equal-length digit string being one whose length equals the length of the label;
a comparison unit, configured to compare each selected equal-length digit string with the label and judge whether an equal-length digit string identical to the label exists;
a first determination unit, configured to determine that the OCR engine recognizes correctly if an equal-length digit string identical to the label exists;
and a second determination unit, configured to determine that the OCR engine recognizes incorrectly if no equal-length digit string identical to the label exists.
Further, the recognition result comparison module may further include:
a result counting unit, configured to count a first number of times and a second number of times according to the comparison result, where the first number of times is the number of times that the OCR engine recognizes correctly, and the second number of times is the number of times that the OCR engine recognizes incorrectly;
and a recognition accuracy calculation unit, configured to calculate the recognition accuracy of the OCR engine according to the first number of times and the second number of times.
A third aspect of embodiments of the present application provides a computer-readable storage medium storing a computer program, which when executed by a processor implements the steps of any of the OCR testing methods described above.
A fourth aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any of the OCR testing methods when executing the computer program.
A fifth aspect of embodiments of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to perform the steps of any of the OCR testing methods described above.
Compared with the prior art, the embodiments of the present application have the following beneficial effects. OCR test data is generated according to a preset data template and parameter configuration information, and a label corresponding to the OCR test data is marked; the OCR test data is input into a preset OCR engine for recognition, and the recognition result output by the OCR engine is obtained; and the recognition result is compared with the label, and the recognition accuracy of the OCR engine is determined according to the comparison result. In this way, test data is generated automatically, the OCR engine is called automatically to recognize it, and the results are compared automatically to determine the recognition accuracy of the OCR engine. Compared with traditional testing methods that rely on manual operation, this reduces the testing workload, shortens the testing time, avoids errors caused by manual operation, and greatly improves testing efficiency.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic view of a scenario of recognizing an ID card number;
FIG. 2 is a schematic view of a scenario of recognizing an express waybill number;
FIG. 3 is a schematic view of a scenario of recognizing a hospital report check number;
FIG. 4 is a flowchart of an embodiment of an OCR testing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a data template;
FIG. 6 is a schematic diagram of determining a location to be filled in a data template;
FIG. 7 is a schematic illustration of OCR test data stored in the form of an image file;
FIG. 8 is a diagram illustrating the content of an image file;
FIG. 9 is a schematic illustration of batch-generated OCR test data;
FIG. 10 is a schematic illustration of a recognition result output by an OCR engine;
FIG. 11 is a block diagram of an embodiment of an OCR testing apparatus according to an embodiment of the present application;
FIG. 12 is a schematic block diagram of a terminal device in an embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present application more apparent and understandable, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Apparently, the embodiments described below are only some, rather than all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In addition, in the description of the present application, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
The embodiments of the present application can be applied to scenarios in which an OCR engine recognizes various numeric identifiers (IDs), including but not limited to recognizing an ID card number (as shown in FIG. 1), an express waybill number (as shown in FIG. 2), and a hospital report check number (as shown in FIG. 3).
In these application scenarios, the recognition can be determined to be correct as long as the OCR engine correctly recognizes the ID. For example, on an ID card, the recognition is judged correct as long as the eighteen consecutive digits of the ID card number "123456198206257890" are correctly recognized, regardless of fields such as name, sex, and nationality. As another example, on a report from a People's Hospital, the recognition is determined to be correct if the check number "02742" is successfully recognized.
Referring to fig. 4, an embodiment of an optical character recognition testing method in an embodiment of the present application may include:
Step S401: generate OCR test data according to a preset data template and parameter configuration information, and mark a label corresponding to the OCR test data.
The data template may be set by the user according to the actual application scenario. For example, in the scenario of recognizing the check number on a hospital report, only the portion of the hospital report that the OCR engine needs to recognize may be cropped, and information such as the check number on it may be masked, forming the data template shown in FIG. 5.
The specific process of generating OCR test data may include:
Step S4011: determine the position to be filled and the filling form in the data template according to the parameter configuration information.
The parameter configuration information may specify the filling form, such as the font and font size. The font may be a common font such as Microsoft YaHei, SimSun (Song typeface), or Times New Roman, and the font size may be, for example, 12, 14, 16, 18, 20, or 24.
The parameter configuration information may further include coordinate information, such as the start coordinate of the ID on a preset X axis and the start coordinate on a preset Y axis, from which the position to be filled in the data template can be determined. In FIG. 6, the rectangular box marks the determined position to be filled.
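The filling form and the position to be filled can be read from the parameter configuration information before any image is drawn. The sketch below assumes, purely for illustration, that this information is stored as JSON with hypothetical keys `font`, `font_size`, `start_x`, and `start_y` (pixel coordinates of the top-left corner of the ID); the patent does not specify a file format or key names.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FillSpec:
    """Hypothetical filling form and position read from the parameter configuration."""
    font: str       # e.g. "Microsoft YaHei", "SimSun", "Times New Roman"
    font_size: int  # e.g. 12, 14, 16, 18, 20, 24
    start_x: int    # start coordinate of the ID on the X axis (pixels, assumed)
    start_y: int    # start coordinate of the ID on the Y axis (pixels, assumed)


def load_fill_spec(text: str) -> FillSpec:
    """Parse a JSON configuration string into a FillSpec, rejecting bad values."""
    raw = json.loads(text)
    spec = FillSpec(
        font=str(raw["font"]),
        font_size=int(raw["font_size"]),
        start_x=int(raw["start_x"]),
        start_y=int(raw["start_y"]),
    )
    if spec.font_size <= 0:
        raise ValueError("font_size must be positive")
    if spec.start_x < 0 or spec.start_y < 0:
        raise ValueError("start coordinates must be non-negative")
    return spec


def position_to_fill(spec: FillSpec) -> Tuple[int, int]:
    """Top-left corner of the position to be filled in the data template."""
    return (spec.start_x, spec.start_y)
```

Validating at load time means a mistyped font size or a negative coordinate is rejected before a whole batch of test images is generated.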
Step S4012: generate an ID according to the parameter configuration information.
Specifically, the ID length and the continuous indication parameter in the parameter configuration information may first be read. The ID length can be set according to actual conditions; for example, the ID length is 18 for an ID card and 5 for the hospital report.
The continuous indication parameter indicates whether the IDs are generated consecutively. If it is a preset first value, consecutive IDs of the ID length are generated. For example, when the ID length is 5, at most 100,000 consecutive IDs can be generated: "00000", "00001", "00002", ..., "99998", "99999".
If the continuous indication parameter is a preset second value, the number of IDs in the parameter configuration information is read, and that number of IDs of the ID length is generated randomly. The number of IDs is a positive integer that can be set according to actual conditions; for example, when the number of IDs is 500, 500 IDs are generated randomly.
The first value and the second value may be set according to actual conditions; for example, the first value may be set to 1 and the second value to 0. Other settings are of course possible, and this is not specifically limited in the embodiments of the present application.
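The two ID-generation modes can be sketched as follows. This is a minimal illustration, assuming the example values 1 (consecutive) and 0 (random) for the continuous indication parameter; the function name `generate_ids` and its parameters are hypothetical. IDs are kept as zero-padded strings so that leading zeros, as in "00001" or "02742", survive. The random mode also assumes the generated IDs should be distinct, which the description does not state explicitly.

```python
import random
from typing import Iterable, Optional

CONSECUTIVE = 1  # assumed "first value" of the continuous indication parameter
RANDOM = 0       # assumed "second value" of the continuous indication parameter


def generate_ids(id_length: int, continuous: int, id_count: int = 0,
                 seed: Optional[int] = None) -> Iterable[str]:
    """Return zero-padded numeric IDs that are exactly id_length digits long."""
    if id_length <= 0:
        raise ValueError("id_length must be positive")
    space = 10 ** id_length  # how many distinct IDs of this length exist
    if continuous == CONSECUTIVE:
        # Lazily yield "00...0", "00...1", ..., "99...9" so that long IDs do
        # not require building a huge list up front.
        return (str(n).zfill(id_length) for n in range(space))
    if continuous == RANDOM:
        if not 0 < id_count <= space:
            raise ValueError("id_count must be between 1 and 10**id_length")
        rng = random.Random(seed)
        chosen = {}  # dict used as an insertion-ordered set of distinct values
        while len(chosen) < id_count:
            chosen[rng.randrange(space)] = None
        return [str(n).zfill(id_length) for n in chosen]
    raise ValueError(f"unknown continuous indication value: {continuous}")
```

The consecutive mode returns a lazy generator because all 10**18 eighteen-digit IDs could never be held in memory; only short lengths, such as the 100,000 five-digit IDs mentioned above, are realistic to enumerate in full.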
Step S4013: write the ID into the position to be filled in the data template according to the filling form, to generate the OCR test data.
The OCR test data may be stored as image files in common formats such as JPG, PNG, TIFF, and BMP. Preferably, the ID is used both as the label corresponding to the OCR test data and as the name of the corresponding image file. FIG. 7 is a schematic diagram of OCR test data stored as an image file, and FIG. 8 shows the content of that file: the ID is 262715, and the image file is likewise named 262715 to facilitate the subsequent comparison. Generally, OCR test data may be generated in batches, as shown schematically in FIG. 9.
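Using the ID as the file name means the label travels with the image and needs no separate index. A minimal sketch of that naming convention, with hypothetical helper names, is shown below. Drawing the ID into the template itself would need an imaging library (for example Pillow's `ImageDraw.Draw(...).text(...)` with a font loaded through `ImageFont.truetype`) and is omitted here.

```python
from pathlib import Path

# Extensions mirroring the formats listed above (an assumption of this sketch).
SUPPORTED_FORMATS = {".jpg", ".png", ".tiff", ".bmp"}


def image_name_for(id_value: str, fmt: str = ".png") -> str:
    """File name for a generated test image: the ID itself plus an extension."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {fmt}")
    # Only ASCII digits are accepted, so the name round-trips as a label.
    if not (id_value.isascii() and id_value.isdigit()):
        raise ValueError("ID must consist of ASCII digits only")
    return id_value + fmt


def label_from_path(path: str) -> str:
    """Recover the label from an image path by dropping directory and extension."""
    return Path(path).stem
```

Because the label is read back as a string, leading zeros such as the one in "02742" are preserved.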
Step S402: input the OCR test data into the preset OCR engine for recognition, and obtain the recognition result output by the OCR engine.
The OCR engine may be any existing OCR engine; its main function is to take an image as input and output the recognition result as text. For example, when the image shown in FIG. 8 is input, the text recognition result output is as shown in FIG. 10.
Step S403: compare the recognition result with the label, and determine the recognition accuracy of the OCR engine according to the comparison result.
The specific process of comparing the recognition result with the label may include:
Step S4031: determine the length of the label.
Step S4032: extract consecutive digit strings from the recognition result, and determine the length of each extracted digit string.
Step S4033: select equal-length digit strings according to the length of each digit string.
An equal-length digit string is one whose length equals the length of the label.
Step S4034: compare each selected equal-length digit string with the label.
Judge whether an equal-length digit string identical to the label exists. If one exists, it is determined that the OCR engine recognized correctly; if none exists, it is determined that the OCR engine recognized incorrectly.
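Steps S4031 to S4034 reduce to a few lines with a regular expression. In the sketch below (the function name is hypothetical), `[0-9]+` is used instead of `\d+` because in Python `\d` also matches non-ASCII digits; `re.findall` returns maximal runs of digits, which correspond to the consecutive digit strings of step S4032.

```python
import re

# Maximal runs of ASCII digits; \d would also match other Unicode digits.
DIGIT_RUN = re.compile(r"[0-9]+")


def is_recognized_correctly(recognition_text: str, label: str) -> bool:
    """Steps S4031-S4034: does an equal-length digit string equal the label?"""
    label_length = len(label)                                    # S4031
    digit_strings = DIGIT_RUN.findall(recognition_text)          # S4032
    equal_length = [s for s in digit_strings
                    if len(s) == label_length]                   # S4033
    return any(s == label for s in equal_length)                 # S4034
```

Under this rule, extra text such as names or ages is simply ignored, consistent with the ID card example earlier in the description, but an ID that the engine splits with a space or merges with neighboring digits counts as an error. A label containing a non-digit character, such as an ID card number whose final check character is X, could never match and would need a wider pattern.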
For batch-generated OCR test data, the image files can be read one by one. Each time an image is read, the OCR engine is called to recognize it, and the recognition result is compared with the image's file name (i.e., the label) to obtain a comparison result. After all the OCR test data has been traversed, a first number of times and a second number of times are counted from the comparison results, where the first number of times is the number of times the OCR engine recognized correctly and the second number of times is the number of times it recognized incorrectly. The recognition accuracy of the OCR engine can then be calculated as:
recognition accuracy = first number of times ÷ (first number of times + second number of times) × 100%.
To sum up, the embodiments of the present application generate OCR test data according to a preset data template and parameter configuration information and mark a label corresponding to the OCR test data; input the OCR test data into a preset OCR engine for recognition and obtain the recognition result output by the OCR engine; and compare the recognition result with the label and determine the recognition accuracy of the OCR engine according to the comparison result. In this way, test data is generated automatically, the OCR engine is called automatically to recognize it, and the results are compared automatically to determine the recognition accuracy of the OCR engine. Compared with traditional testing methods that rely on manual operation, this reduces the testing workload, shortens the testing time, avoids errors caused by manual operation, and greatly improves testing efficiency.
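The batch loop and the accuracy formula can be sketched as below, assuming the engine is exposed as a plain callable that takes an image path and returns the recognized text. The names `evaluate` and `recognize` are hypothetical, and the comparison helper repeats the rule of steps S4031 to S4034 so that the block runs on its own.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, Tuple

DIGIT_RUN = re.compile(r"[0-9]+")  # maximal runs of ASCII digits


def matches_label(text: str, label: str) -> bool:
    """True if some digit run has the label's length and equals the label."""
    # The length check mirrors step S4033; equality alone would also suffice.
    return any(len(s) == len(label) and s == label
               for s in DIGIT_RUN.findall(text))


def evaluate(image_paths: Iterable[str],
             recognize: Callable[[str], str]) -> Tuple[int, int, float]:
    """Run the engine over every test image; return (correct, incorrect, accuracy %)."""
    correct = 0    # first number of times: recognized correctly
    incorrect = 0  # second number of times: recognized incorrectly
    for path in image_paths:
        label = Path(path).stem  # the file name without extension is the label
        text = recognize(path)   # call the OCR engine on this image
        if matches_label(text, label):
            correct += 1
        else:
            incorrect += 1
    total = correct + incorrect
    if total == 0:
        raise ValueError("no OCR test data was provided")
    accuracy = correct / total * 100  # first / (first + second) x 100%
    return correct, incorrect, accuracy
```

With a real engine, `image_paths` could be built with something like `sorted(str(p) for p in Path("ocr_test").glob("*.png"))`, where the directory name is only an example.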
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
FIG. 11 shows a structural diagram of an embodiment of an OCR testing apparatus according to an embodiment of the present application, which corresponds to the OCR testing method of the foregoing embodiment.
In this embodiment, an OCR testing apparatus may include:
the test data generating module 1101 is configured to generate OCR test data according to a preset data template and parameter configuration information, and mark a label corresponding to the OCR test data;
the recognition module 1102 is configured to input the OCR test data into a preset OCR engine for recognition, and obtain a recognition result output by the OCR engine;
and the recognition result comparison module 1103 is configured to compare the recognition result with the label, and determine the recognition accuracy of the OCR engine according to the comparison result.
Further, the test data generation module may include:
the position and form determining unit is used for determining the position to be filled and the filling form in the data template according to the parameter configuration information;
an ID generating unit, configured to generate an ID according to the parameter configuration information;
and the test data generating unit is used for filling the ID into the position to be filled in the data template according to the filling form to generate the OCR test data.
Further, the ID generation unit may include:
a parameter reading subunit, configured to read the ID length and the continuous indication parameter in the parameter configuration information;
a first generation subunit, configured to generate consecutive IDs of the ID length if the continuous indication parameter is a preset first value;
and a second generation subunit, configured to, if the continuous indication parameter is a preset second value, read the number of IDs in the parameter configuration information and randomly generate that number of IDs of the ID length.
Further, the test data generation module may further include:
and the label determining unit is used for marking the ID as a label corresponding to the OCR test data.
Further, the recognition result comparison module may include:
a length determination unit, configured to determine the length of the label;
a digit string extraction unit, configured to extract consecutive digit strings from the recognition result and determine the length of each extracted digit string;
an equal-length digit string selection unit, configured to select equal-length digit strings according to the length of each digit string, an equal-length digit string being one whose length equals the length of the label;
a comparison unit, configured to compare each selected equal-length digit string with the label and judge whether an equal-length digit string identical to the label exists;
a first determination unit, configured to determine that the OCR engine recognizes correctly if an equal-length digit string identical to the label exists;
and a second determination unit, configured to determine that the OCR engine recognizes incorrectly if no equal-length digit string identical to the label exists.
Further, the recognition result comparison module may further include:
a result counting unit, configured to count a first number of times and a second number of times according to the comparison result, where the first number of times is the number of times that the OCR engine recognizes correctly, and the second number of times is the number of times that the OCR engine recognizes incorrectly;
and a recognition accuracy calculation unit, configured to calculate the recognition accuracy of the OCR engine according to the first number of times and the second number of times.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, modules and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
FIG. 12 shows a schematic block diagram of a terminal device provided in an embodiment of the present application; for convenience of explanation, only the parts related to the embodiments of the present application are shown.
As shown in fig. 12, the terminal device 12 of this embodiment includes: a processor 120, a memory 121, and a computer program 122 stored in the memory 121 and executable on the processor 120. The processor 120, when executing the computer program 122, implements the steps in each of the OCR testing method embodiments described above, such as the steps S401 to S403 shown in fig. 4. Alternatively, the processor 120, when executing the computer program 122, implements the functions of each module/unit in each device embodiment described above, for example, the functions of the modules 1101 to 1103 shown in fig. 11.
Illustratively, the computer program 122 may be partitioned into one or more modules/units that are stored in the memory 121 and executed by the processor 120 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 122 in the terminal device 12.
The terminal device 12 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. Those skilled in the art will appreciate that fig. 12 is merely an example of a terminal device 12 and does not constitute a limitation of terminal device 12 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., terminal device 12 may also include input-output devices, network access devices, buses, etc.
The Processor 120 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The storage 121 may be an internal storage unit of the terminal device 12, such as a hard disk or a memory of the terminal device 12. The memory 121 may also be an external storage device of the terminal device 12, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 12. Further, the memory 121 may also include both an internal storage unit and an external storage device of the terminal device 12. The memory 121 is used to store the computer program and other programs and data required by the terminal device 12. The memory 121 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated modules/units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, all or part of the flow of the methods in the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that what a computer-readable storage medium covers may be expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, under legislation and patent practice, computer-readable storage media do not include electrical carrier signals or telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An optical character recognition test method, comprising:
generating optical character recognition test data according to a preset data template and parameter configuration information, and marking a label corresponding to the optical character recognition test data;
inputting the optical character recognition test data into a preset optical character recognition engine for recognition, and acquiring a recognition result output by the optical character recognition engine;
and comparing the recognition result with the label, and determining the recognition accuracy of the optical character recognition engine according to the comparison result.
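The three steps of claim 1 can be sketched as a small test harness. This is a minimal sketch under stated assumptions, not the applicant's implementation: the OCR engine is modeled as a hypothetical callable that takes the test input and returns recognized text (a real harness would first render the filled template to an image), and `run_ocr_test`, `Sample` and the comparison callback are illustrative names.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical sample type: (input handed to the OCR engine, expected label).
# A plain string stands in for the rendered test image.
Sample = Tuple[str, str]


def run_ocr_test(
    samples: Iterable[Sample],
    engine: Callable[[str], str],
    compare: Callable[[str, str], bool],
) -> List[bool]:
    """Recognize each sample and compare the result with its label."""
    results: List[bool] = []
    for ocr_input, label in samples:
        recognition_result = engine(ocr_input)  # run the (hypothetical) engine
        results.append(compare(recognition_result, label))  # compare with label
    return results


if __name__ == "__main__":
    samples = [("ID: 0001", "0001"), ("ID: 0002", "0002")]

    def fake_engine(text: str) -> str:
        # Simulated engine that misreads every "2" as "Z".
        return text.replace("2", "Z")

    print(run_ocr_test(samples, fake_engine, lambda res, lab: lab in res))
    # prints [True, False]
```

Passing the engine and the comparison rule in as callables keeps the harness independent of any particular OCR product; the simple containment check in the usage line is only a placeholder for the stricter rule of claim 6.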
2. The optical character recognition test method according to claim 1, wherein the generating optical character recognition test data according to the preset data template and the parameter configuration information comprises:
determining a position to be filled and a filling form in the data template according to the parameter configuration information;
generating a digital identifier according to the parameter configuration information;
and filling the digital identifier into the position to be filled in the data template according to the filling form, to generate the optical character recognition test data.
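The template filling of claim 2 might look like the sketch below. The patent does not specify a template format, so this assumes a text template in which a named `{placeholder}` marks the position to be filled, and it models the filling form as a Python format string; the configuration keys `fill_position` and `fill_form`, the example template, and the function name are all hypothetical. Following claim 5, each filled identifier is returned as the sample's label.

```python
from typing import Dict, List, Tuple


def fill_template(
    template: str,
    config: Dict[str, str],
    identifiers: List[str],
) -> List[Tuple[str, str]]:
    """Return one (test_text, label) pair per digital identifier."""
    # Hypothetical config keys: which placeholder to fill, and how to format it.
    placeholder = "{" + config["fill_position"] + "}"
    fill_form = config.get("fill_form", "{}")
    if placeholder not in template:
        raise ValueError(f"template has no {placeholder} position to fill")
    samples = []
    for identifier in identifiers:
        text = template.replace(placeholder, fill_form.format(identifier))
        samples.append((text, identifier))  # the identifier doubles as the label
    return samples


if __name__ == "__main__":
    template = "Patient report\nID: {patient_id}\n"
    config = {"fill_position": "patient_id", "fill_form": "No.{}"}
    for text, label in fill_template(template, config, ["1024", "1025"]):
        print(repr(text), label)
```

Using `str.replace` for the placeholder, rather than calling `format` on the whole template, means any other braces in a real template are left untouched.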
3. The optical character recognition test method according to claim 2, wherein the generating a digital identifier according to the parameter configuration information comprises:
reading a digital identifier length and a continuity indication parameter from the parameter configuration information;
and if the continuity indication parameter is a preset first value, generating consecutive digital identifiers that conform to the digital identifier length.
4. The optical character recognition test method according to claim 3, further comprising:
if the continuity indication parameter is a preset second value, reading the number of digital identifiers from the parameter configuration information;
and randomly generating that number of digital identifiers, each conforming to the digital identifier length.
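Claims 3 and 4 branch on the continuity indication parameter. The sketch below is one possible reading, with several assumptions labeled: the numeric encodings of the "first" and "second" values, the configuration keys, and the `start` offset are hypothetical. For the consecutive case it assumes zero-padded sequential values; if no count is given, it covers every value of the configured length, since claim 3 reads only the length.

```python
import random
from typing import Dict, List, Optional

# Hypothetical encodings of the preset first/second values of the
# continuity indication parameter.
CONTINUOUS = 1
RANDOM = 2


def generate_identifiers(
    config: Dict[str, int], rng: Optional[random.Random] = None
) -> List[str]:
    """Generate digital identifiers of the configured length."""
    length = config["id_length"]
    mode = config["continuity"]
    if mode == CONTINUOUS:
        # Assumption: consecutive values starting at `start`, zero-padded.
        start = config.get("start", 0)
        count = config.get("id_count", 10 ** length - start)
        if start < 0 or start + count > 10 ** length:
            raise ValueError("consecutive range does not fit the identifier length")
        return [str(n).zfill(length) for n in range(start, start + count)]
    if mode == RANDOM:
        if rng is None:
            rng = random.Random()
        count = config["id_count"]
        # Each identifier is `length` independently drawn digits; duplicates
        # are possible unless the caller de-duplicates them.
        return ["".join(rng.choices("0123456789", k=length)) for _ in range(count)]
    raise ValueError(f"unknown continuity indication value: {mode}")


if __name__ == "__main__":
    cfg = {"id_length": 4, "continuity": CONTINUOUS, "start": 8, "id_count": 3}
    print(generate_identifiers(cfg))  # prints ['0008', '0009', '0010']
    print(generate_identifiers({"id_length": 6, "continuity": RANDOM, "id_count": 2}))
```

Zero padding keeps every consecutive identifier at exactly the configured length, which matters later because the comparison in claim 6 filters candidate strings by length.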
5. The optical character recognition test method of claim 2, wherein marking the label corresponding to the optical character recognition test data comprises:
marking the digital identifier as the label corresponding to the optical character recognition test data.
6. The optical character recognition test method according to claim 1, wherein the comparing the recognition result with the label comprises:
determining a length of the label;
extracting consecutive numeric strings from the recognition result, and determining the length of each extracted numeric string;
selecting equal-length numeric strings according to the length of each numeric string, wherein an equal-length numeric string is a numeric string whose length is equal to the length of the label;
comparing each selected equal-length numeric string with the label, and judging whether there is an equal-length numeric string identical to the label;
if an equal-length numeric string identical to the label exists, determining that the optical character recognition engine recognizes correctly;
and if no equal-length numeric string identical to the label exists, determining that the optical character recognition engine recognizes incorrectly.
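The comparison of claim 6 maps naturally onto a regular-expression scan. The sketch below is one way to implement it, assuming the label consists of ASCII digits only (hence `[0-9]+` rather than `\d+`, which in Python also matches other Unicode digits); the function name is hypothetical.

```python
import re


def recognized_correctly(recognition_result: str, label: str) -> bool:
    """Check whether any digit run of the label's length equals the label."""
    # Extract maximal runs of consecutive ASCII digits from the OCR output.
    digit_strings = re.findall(r"[0-9]+", recognition_result)
    # Keep only the runs whose length equals the label's length.
    equal_length = [s for s in digit_strings if len(s) == len(label)]
    # Correct if any equal-length run is identical to the label.
    return any(s == label for s in equal_length)


if __name__ == "__main__":
    print(recognized_correctly("Name: Li  ID: 1024  Age: 56", "1024"))  # True
    print(recognized_correctly("ID: 10245", "1024"))  # False
```

The equal-length filter is what separates this rule from a plain substring test: `"1024" in "ID: 10245"` is true, yet an engine that read an extra digit has not recognized the identifier correctly.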
7. The optical character recognition test method according to any one of claims 1 to 6, wherein the determining the recognition accuracy of the optical character recognition engine according to the comparison result comprises:
counting a first number of times and a second number of times according to the comparison result, wherein the first number of times is the number of times the optical character recognition engine recognizes correctly, and the second number of times is the number of times the optical character recognition engine recognizes incorrectly;
and calculating the recognition accuracy of the optical character recognition engine according to the first number of times and the second number of times.
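The claims do not give the accuracy formula. A natural reading, assumed in this sketch, is accuracy = first / (first + second), that is, correct recognitions over all compared samples. The function takes the per-sample boolean comparison results and returns both counts along with the ratio; its name is hypothetical.

```python
from typing import Iterable, Tuple


def recognition_accuracy(comparison_results: Iterable[bool]) -> Tuple[int, int, float]:
    """Return (first count, second count, accuracy) from per-sample results."""
    results = list(comparison_results)
    first = sum(1 for ok in results if ok)  # times recognized correctly
    second = len(results) - first  # times recognized incorrectly
    if first + second == 0:
        raise ValueError("no comparison results to evaluate")
    return first, second, first / (first + second)


if __name__ == "__main__":
    print(recognition_accuracy([True, True, False, True]))  # (3, 1, 0.75)
```

Returning the two counts alongside the ratio keeps the raw numbers that claim 7 counts available for reporting, not just the derived percentage.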
8. An optical character recognition test device, comprising:
the test data generation module is used for generating optical character recognition test data according to a preset data template and parameter configuration information and marking a label corresponding to the optical character recognition test data;
the recognition module is used for inputting the optical character recognition test data into a preset optical character recognition engine for recognition and acquiring a recognition result output by the optical character recognition engine;
and the recognition result comparison module is used for comparing the recognition result with the label and determining the recognition accuracy of the optical character recognition engine according to the comparison result.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the optical character recognition test method according to any one of claims 1 to 7.
10. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the optical character recognition test method according to any one of claims 1 to 7 when executing the computer program.
CN202011019006.5A 2020-09-24 2020-09-24 Optical character recognition test method and device, readable storage medium and terminal equipment Pending CN112287936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011019006.5A CN112287936A (en) 2020-09-24 2020-09-24 Optical character recognition test method and device, readable storage medium and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011019006.5A CN112287936A (en) 2020-09-24 2020-09-24 Optical character recognition test method and device, readable storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
CN112287936A true CN112287936A (en) 2021-01-29

Family

ID=74421264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011019006.5A Pending CN112287936A (en) 2020-09-24 2020-09-24 Optical character recognition test method and device, readable storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN112287936A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113340569A (en) * 2021-05-17 2021-09-03 浪潮金融信息技术有限公司 Method, system and medium for detecting quality of high-speed photographing instrument
CN114637845A (en) * 2022-03-11 2022-06-17 上海弘玑信息技术有限公司 Model testing method, device, equipment and storage medium

Citations (12)

Publication number Priority date Publication date Assignee Title
US5418864A (en) * 1992-09-02 1995-05-23 Motorola, Inc. Method for identifying and resolving erroneous characters output by an optical character recognition system
CN109389109A (en) * 2018-09-11 2019-02-26 厦门商集网络科技有限责任公司 The automated testing method and equipment of a kind of this recognition correct rate of OCR full text
CN109408807A (en) * 2018-09-11 2019-03-01 厦门商集网络科技有限责任公司 The automated testing method and test equipment of OCR recognition correct rate
CN109784339A (en) * 2018-12-13 2019-05-21 平安普惠企业管理有限公司 Picture recognition test method, device, computer equipment and storage medium
CN110245576A (en) * 2019-05-21 2019-09-17 深圳壹账通智能科技有限公司 Detection method, device, equipment and the storage medium of OCR recognition accuracy
CN110287963A (en) * 2019-06-11 2019-09-27 苏州玖物互通智能科技有限公司 OCR recognition method for comprehensive performance test
CN110458184A (en) * 2019-06-26 2019-11-15 平安科技(深圳)有限公司 Optical character identification householder method, device, computer equipment and storage medium
CN110516663A (en) * 2019-07-15 2019-11-29 平安普惠企业管理有限公司 Test method, device, computer equipment and the storage medium of OCR recognition accuracy
CN111144402A (en) * 2019-11-27 2020-05-12 深圳壹账通智能科技有限公司 OCR recognition accuracy calculation method, device, equipment and storage medium
WO2020098250A1 (en) * 2018-11-12 2020-05-22 平安科技(深圳)有限公司 Character recognition method, server, and computer readable storage medium
CN111476067A (en) * 2019-01-23 2020-07-31 腾讯科技(深圳)有限公司 Character recognition method and device for image, electronic equipment and readable storage medium
WO2020155763A1 (en) * 2019-01-28 2020-08-06 平安科技(深圳)有限公司 Ocr recognition method and electronic device thereof

Non-Patent Citations (2)

Title
IYAD ABU DOUSH; FAISAL ALKHATEEB; ANWAAR HAMDI GHARAIBEH: "A novel Arabic OCR post-processing using rule-based and word context techniques", INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS & RECOGNITION, vol. 21, no. 1, pages 77, XP036512603, DOI: 10.1007/s10032-018-0297-y *
胡成军; 衡军; 马旭勃: "数字仪表信息的自动提取方法" [Automatic extraction method for digital instrument information], 计算机与数字工程 [Computer & Digital Engineering], no. 02, pages 157-161 *

Similar Documents

Publication Publication Date Title
CN110705952A (en) Contract auditing method and device
CN109933502B (en) Electronic device, user operation record processing method and storage medium
CN112287936A (en) Optical character recognition test method and device, readable storage medium and terminal equipment
CN109784339A (en) Picture recognition test method, device, computer equipment and storage medium
CN112036145A (en) Financial statement identification method and device, computer equipment and readable storage medium
CN114005126A (en) Table reconstruction method and device, computer equipment and readable storage medium
CN112308046A (en) Method, device, server and readable storage medium for positioning text region of image
WO2023038722A1 (en) Entry detection and recognition for custom forms
CN114550193A (en) Document integrity detection method and system and electronic equipment
CN112632926B (en) Bill data processing method and device, electronic equipment and storage medium
CN112528889B (en) OCR information detection and correction method, device, terminal and storage medium
CN112613367A (en) Bill information text box acquisition method, system, equipment and storage medium
CN109189372B (en) Development script generation method of insurance product and terminal equipment
CN115294593A (en) Image information extraction method and device, computer equipment and storage medium
CN115294586A (en) Invoice identification method and device, storage medium and electronic equipment
CN112149402B (en) Document matching method, device, electronic equipment and computer readable storage medium
US20210065212A1 (en) Date generation apparatus, control method, and program
CN114970490A (en) Text labeling data quality inspection method and device, electronic equipment and storage medium
CN114120347A (en) Form verification method and device, electronic equipment and storage medium
CN111813474A (en) Multi-language display method and device and electronic equipment
CN113791860A (en) Information conversion method, device and storage medium
CN111274369A (en) English word recognition method and device
CN113722208B (en) Project progress verification method and device for software test report
CN111475719B (en) Information pushing method and device based on data mining and storage medium
CN110717483B (en) Network image recognition processing method, computer readable storage medium and mobile terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination