CN111178365A - Picture character recognition method and device, electronic equipment and storage medium - Google Patents

Picture character recognition method and device, electronic equipment and storage medium

Info

Publication number
CN111178365A
Authority
CN
China
Prior art keywords
picture
identification area
template
area
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911422228.9A
Other languages
Chinese (zh)
Inventor
丁晓雪
史忠伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuba Co Ltd
Original Assignee
Wuba Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuba Co Ltd
Priority to CN201911422228.9A
Publication of CN111178365A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Abstract

The application provides a picture character recognition method and device, an electronic device, and a storage medium. In the method, template information corresponding to the features of the picture to be recognized is obtained from a preset database; the target identification area of the picture to be recognized is then determined according to the position information of the identification area in the template picture, and the character information in the target identification area is recognized. Because the template information includes the position information of the identification area in the template picture, the target identification area does not need to be frame-selected manually when a picture is recognized. This removes the influence of the worker's condition and improves the accuracy of picture character recognition. Because the whole recognition process requires no manual participation, it also saves human resources and improves the efficiency of picture character recognition.

Description

Picture character recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a picture character recognition method and device, an electronic device, and a storage medium.
Background
With the progress of science and technology and the development of society, information exchange has become an indispensable part of daily life, and electronic information exchange has become the trend. For example, when public affairs are handled, the text information on an identity card needs to be collected and stored in a terminal to facilitate statistics and archiving. As another example, business cards are exchanged in social settings; for convenient viewing and saving, the text information on a business card can be recognized and stored in a terminal so that the relevant information can be retrieved when needed.
The information in a picture can be divided into important and non-important information, and usually only the important information is stored. In the prior art, important information is usually identified by manual frame selection: for each picture to be recognized, a worker uses a mouse to frame-select the area containing the important information, and an image recognition algorithm then performs character recognition on the selected area. This approach has low recognition efficiency, and its accuracy is unstable because it depends on the worker's condition.
Therefore, a picture character recognition method is needed to solve the low recognition efficiency and unstable accuracy of the prior-art recognition method.
Disclosure of Invention
The application provides a picture character recognition method and device, an electronic device, and a storage medium, which can be used to solve the technical problems of low recognition efficiency and unstable accuracy in the prior-art recognition method.
In a first aspect, an embodiment of the present application provides a method for identifying picture characters, where the method includes:
acquiring template information corresponding to the characteristics of the picture to be identified from a preset database; the template information comprises position information of an identification area in a template picture, and the identification area is an area of a preset field in the template picture;
determining a target identification area of the picture to be identified according to the position information of the identification area in the template picture; the target identification area is an area of the preset field in the picture to be identified;
and identifying the text information in the target identification area.
With reference to the first aspect, in an implementation manner of the first aspect, the template picture is determined by:
acquiring an original picture;
preprocessing the original picture to obtain the template picture; the pre-processing includes at least one of picture scaling processing, picture compression processing, and canvas clearing processing.
With reference to the first aspect, in an implementation manner of the first aspect, the identification area is determined by:
using a mouse to frame-select, in the template picture, the area where the field name of the preset field is located, to obtain a first identification area;
using a mouse to frame-select, in the template picture, the area where the field value of the preset field is located, to obtain a second identification area; the second identification area corresponds to the first identification area;
and taking the first identification area and the second identification area as the identification areas in the template picture.
With reference to the first aspect, in an implementation manner of the first aspect, determining a target identification area of the picture to be identified according to the position information of the identification area in the template picture includes:
determining a first target identification area in the picture to be identified according to the position information of the first identification area in the template picture;
determining a second target identification area in the picture to be identified according to the position information of the second identification area in the template picture;
and determining the first target identification area and the second target identification area as the target identification areas.
With reference to the first aspect, in an implementation manner of the first aspect, the template information further includes a field name and a field type corresponding to the identification area;
identifying the text information in the target identification area, including:
recognizing the character information in the target recognition area by adopting a preset image recognition algorithm;
and correcting the recognized character information according to the field name and the field type corresponding to the recognition area.
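As an illustration of the correction step, the following is a minimal sketch that assumes three hypothetical field types ("text", "date", "number") and a hand-picked table of look-alike characters. The application does not specify the correction rules, so the function name `correct_text` and the substitution table are assumptions made only for this example.

```python
# Hypothetical look-alike substitutions for digit-based fields. The application
# does not list specific rules, so this table is an illustrative assumption.
DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


def correct_text(raw: str, field_type: str) -> str:
    """Clean up recognized text using the field type from the template information."""
    text = raw.strip()
    if field_type == "number":
        # Drop spaces and map letters that are commonly confused with digits.
        return text.replace(" ", "").translate(DIGIT_LOOKALIKES)
    if field_type == "date":
        # Apply the same digit fixes but keep separators such as "-" or "/".
        return text.translate(DIGIT_LOOKALIKES)
    # Text fields are returned unchanged apart from trimming whitespace.
    return text
```

A field name could be used in the same way, for instance to select a stricter rule for one particular field; that refinement is left out of the sketch.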
With reference to the first aspect, in an implementation manner of the first aspect, before determining the target identification area of the picture to be identified, the method further includes:
selecting a plurality of reference points on the template picture;
and correcting the angle of the picture to be recognized according to the reference points.
With reference to the first aspect, in an implementation manner of the first aspect, before determining the target identification area of the picture to be identified, the method further includes:
scaling the picture to be identified according to the size of the template picture.
In a second aspect, an embodiment of the present application provides an apparatus for recognizing a picture text, where the apparatus includes:
the acquisition unit is used for acquiring template information corresponding to the characteristics of the picture to be recognized from a preset database; the template information comprises position information of an identification area in a template picture, and the identification area is an area of a preset field in the template picture;
the processing unit is used for determining a target identification area of the picture to be identified according to the position information of the identification area in the template picture; the target identification area is an area of the preset field in the picture to be identified;
and the identification unit is used for identifying the character information in the target identification area.
With reference to the second aspect, in an implementable manner of the second aspect, the template picture is determined by:
acquiring an original picture;
preprocessing the original picture to obtain the template picture; the pre-processing includes at least one of picture scaling processing, picture compression processing, and canvas clearing processing.
With reference to the second aspect, in an implementable manner of the second aspect, the identification area is determined by:
using a mouse to frame-select, in the template picture, the area where the field name of the preset field is located, to obtain a first identification area;
using a mouse to frame-select, in the template picture, the area where the field value of the preset field is located, to obtain a second identification area; the second identification area corresponds to the first identification area;
and taking the first identification area and the second identification area as the identification areas in the template picture.
With reference to the second aspect, in an implementable manner of the second aspect, the processing unit is specifically configured to:
determining a first target identification area in the picture to be identified according to the position information of the first identification area in the template picture; determining a second target identification area in the picture to be identified according to the position information of the second identification area in the template picture; and determining the first target identification area and the second target identification area as the target identification areas.
With reference to the second aspect, in an implementation manner of the second aspect, the template information further includes a field name and a field type corresponding to the identification area;
the identification unit is specifically configured to:
recognizing the character information in the target recognition area by adopting a preset image recognition algorithm; and correcting the recognized character information according to the field name and the field type corresponding to the recognition area.
With reference to the second aspect, in an implementable manner of the second aspect, before determining the target recognition area of the picture to be recognized, the processing unit is further configured to:
selecting a plurality of reference points on the template picture; and correcting the angle of the picture to be recognized according to the reference points.
With reference to the second aspect, in an implementable manner of the second aspect, before determining the target recognition area of the picture to be recognized, the processing unit is further configured to:
scaling the picture to be identified according to the size of the template picture.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
a memory for storing program instructions;
and the processor is used for calling and executing the program instructions in the memory so as to realize the picture character recognition method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by at least one processor of a device for recognizing picture characters, the device for recognizing picture characters executes the method for recognizing picture characters according to the first aspect.
In the embodiment of the application, template information corresponding to the features of the picture to be recognized is obtained from the preset database; the target identification area of the picture to be recognized is then determined according to the position information of the identification area in the template picture, and the character information in the target identification area is recognized. Because the template information includes the position information of the identification area in the template picture, the target identification area does not need to be frame-selected manually when a picture is recognized. This removes the influence of the worker's condition and improves the accuracy of picture character recognition. Because the whole recognition process requires no manual participation, it also saves human resources and improves the efficiency of picture character recognition.
Drawings
Fig. 1 is a schematic flowchart of a picture character recognition method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of an identity card picture;
Fig. 3 is a schematic flowchart of a method for determining an identification area according to an embodiment of the present application;
Fig. 4a is a schematic diagram of a first identification area in an identity card picture provided in an embodiment of the present application;
Fig. 4b is a schematic diagram of a second identification area in an identity card picture provided in an embodiment of the present application;
Fig. 4c is a schematic diagram of an identification area in an identity card picture provided in an embodiment of the present application;
Fig. 5 is an example of position information of an identification area in a template picture in an embodiment of the present application;
Fig. 6 is a schematic flowchart of a method for determining a target identification area according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a picture character recognition device according to an embodiment of the present application;
Fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Please refer to Fig. 1, which schematically illustrates a flowchart of a picture character recognition method according to an embodiment of the present application. As shown in Fig. 1, the method specifically includes the following steps:
Step 101, obtaining template information corresponding to the features of the picture to be identified from a preset database.
Step 102, determining a target identification area of the picture to be identified according to the position information of the identification area in the template picture.
Step 103, recognizing the character information in the target identification area.
In the above steps 101 to 103, the execution subject of the method may be a terminal device with an image recognition function, such as a notebook computer, a tablet computer, a desktop computer, or a mobile phone, and is not limited specifically.
In the embodiment of the application, template information corresponding to the features of the picture to be recognized is obtained from the preset database; the target identification area of the picture to be recognized is then determined according to the position information of the identification area in the template picture, and the character information in the target identification area is recognized. Because the template information includes the position information of the identification area in the template picture, the target identification area does not need to be frame-selected manually when a picture is recognized. This removes the influence of the worker's condition and improves the accuracy of picture character recognition. Because the whole recognition process requires no manual participation, it also saves human resources and improves the efficiency of picture character recognition.
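The three steps can be sketched end to end as follows. This is a toy illustration under stated assumptions, not the implementation of the application: a picture is modelled as a list of strings, the preset database is a hypothetical in-memory dictionary named `PRESET_DATABASE`, identification areas are stored as (left, top, right, bottom) boxes, and the recognizer `ocr` is passed in as a plain function.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)
Picture = List[str]              # toy picture: each string is one row of "pixels"

# Hypothetical preset database: picture feature -> {field name: identification area}.
PRESET_DATABASE: Dict[str, Dict[str, Box]] = {
    "identity card": {"name": (0, 0, 5, 1), "identity number": (0, 1, 6, 2)},
}


def crop(picture: Picture, box: Box) -> Picture:
    """Cut the target identification area out of the picture."""
    left, top, right, bottom = box
    return [row[left:right] for row in picture[top:bottom]]


def recognize_picture(
    picture: Picture, feature: str, ocr: Callable[[Picture], str]
) -> Dict[str, str]:
    template_info = PRESET_DATABASE[feature]        # step 101: look up template information
    results = {}
    for field_name, box in template_info.items():
        target_area = crop(picture, box)            # step 102: locate the target identification area
        results[field_name] = ocr(target_area)      # step 103: recognize the characters in it
    return results
```

With a stand-in recognizer such as `lambda area: "".join(area)`, calling `recognize_picture(["Alice.....", "110102...."], "identity card", ...)` returns one recognized string per preset field.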
Specifically, in step 101, the picture to be recognized may be a picture with a fixed text layout, such as an identity card, a business card, a bill, a household register, or a passport.
The feature of the picture to be recognized may be feature information that distinguishes the picture. That is, the feature may be the type of the picture or a keyword in the picture, and is not specifically limited.
Taking the type of the picture as an example, pictures can be classified into certificate pictures, bill pictures, and other pictures. Certificate pictures may include identity cards, household registers, passports, campus cards, library cards, and the like; bill pictures may include invoices, cashier receipts, bank receipts, express bills, and the like; other pictures may include business cards, postcards, and the like.
Taking the keywords in the picture as an example, an identity card contains keywords such as name and identity number; a business card contains keywords such as job information and company information; an invoice contains keywords such as invoice code and invoice number. Because different types of pictures contain different keywords, the keywords can serve as the feature of the picture to be identified.
The preset database may store the correspondence between picture features and template information.
Taking the type of the picture as the feature, Table 1 shows an example of the correspondence between picture types and template information. Among certificate pictures, the identity card picture corresponds to template information 1, the household register picture corresponds to template information 2, and the passport picture corresponds to template information 3; among bill pictures, the invoice picture corresponds to template information 4, the cashier receipt picture corresponds to template information 5, and the express bill picture corresponds to template information 6; among other pictures, the business card picture corresponds to template information 7.
Table 1: example of correspondence between picture features and template information
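Picture category | Picture type | Template information
Certificate | Identity card | Template information 1
Certificate | Household register | Template information 2
Certificate | Passport | Template information 3
Bill | Invoice | Template information 4
Bill | Cashier receipt | Template information 5
Bill | Express bill | Template information 6
Other | Business card | Template information 7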
Taking the keywords in the picture as the feature, Table 2 shows an example of the correspondence between keywords in the picture and template information. When the keyword in the picture is "identity number", the picture corresponds to template information 1; when the keyword is "invoice code", it corresponds to template information 4; when the keywords are "company information" and "job information", it corresponds to template information 7.
Table 2: example of correspondence between keywords in the picture and template information
Keywords in picture | Template information
Identity number | Template information 1
Invoice code | Template information 4
Company information and job information | Template information 7
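The keyword correspondence in Table 2 can be sketched as a simple lookup. The matching rule used here (a template matches when all of its keywords appear in the picture) and the names `KEYWORD_TEMPLATES` and `find_template` are assumptions for illustration; the application only states that the correspondence is stored in the preset database.

```python
from typing import Optional, Set

# Mirrors Table 2: each entry pairs a set of keywords with template information.
KEYWORD_TEMPLATES = [
    ({"identity number"}, "template information 1"),
    ({"invoice code"}, "template information 4"),
    ({"company information", "job information"}, "template information 7"),
]


def find_template(keywords_in_picture: Set[str]) -> Optional[str]:
    """Return the first template whose keywords all appear in the picture."""
    for required_keywords, template_info in KEYWORD_TEMPLATES:
        if required_keywords <= keywords_in_picture:  # subset test
            return template_info
    return None  # no template matches the picture's keywords
```

Returning `None` for an unmatched picture leaves the caller free to fall back to another feature, such as the picture type.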
The template information may include several kinds of information, for example the position information of the identification area in the template picture, and may also include the field name and field type corresponding to the identification area.
The identification area may be the area of a preset field in the template picture, where a preset field is a field designated in advance for recognition. For example, if the fields to be recognized in an identity card picture are the name and the identity number, then the name and the identity number are the preset fields of the identity card picture.
The field name corresponding to the identification area may be the field name of the preset field, that is, the name of the field. Fig. 2 shows a schematic diagram of an identity card picture, in which "name", "sex", "ethnicity", "birth", "address" and "identity number" are field names.
The field type corresponding to the identification area may be the type of the field value of the preset field. For example, in the identity card picture shown in Fig. 2, "So-and-so" corresponding to the "name" field is a field value; similarly, "male" or "female" corresponding to the "sex" field, "Han" corresponding to the "ethnicity" field, "XX year, X month and X day" corresponding to the "birth" field, "XX province XX city XX district XX road No. XX" corresponding to the "address" field, and "110102YYYYMMDD888X" corresponding to the "identity number" field are also field values.
Different field values have different field types. For example, the field value of the name is of a text type, the field value of the birth is of a date type, and the field value of the identity number is of a numeric type.
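One possible in-memory layout for the template information described above is sketched below. The class names, the type labels ("text", "date", "number"), the example margin values, and the choice to store the position as four distances to the borders of the template picture are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class IdentificationArea:
    field_name: str                     # e.g. "name"
    field_type: str                     # assumed labels: "text", "date" or "number"
    margins: Tuple[int, int, int, int]  # assumed position format: distances to the picture borders


@dataclass
class TemplateInfo:
    feature: str                        # e.g. "identity card"
    areas: List[IdentificationArea]

    def field_types(self) -> Dict[str, str]:
        """Map each preset field name to its field type."""
        return {area.field_name: area.field_type for area in self.areas}


# Hypothetical template information for an identity card with two preset fields.
ID_CARD_TEMPLATE = TemplateInfo(
    feature="identity card",
    areas=[
        IdentificationArea("name", "text", (100, 40, 340, 320)),
        IdentificationArea("identity number", "number", (100, 300, 140, 60)),
    ],
)
```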
In the embodiment of the application, the position information of the identification area in the template picture needs to be determined in advance. This can be done in two steps: first, the template picture is determined from the preset database; second, the position information of the identification area in the template picture is determined.
Further, the template picture is determined from the preset database as follows: according to the features of the picture to be recognized, a template picture is selected from the preset database. The template picture should be clear and flat, and should fully display the same features as the picture to be recognized. For example, if the picture to be recognized is an identity card, the template picture is also an identity card picture. That is, the picture to be recognized corresponds to the template picture, which allows efficient and accurate character recognition.
The following describes in detail how the template picture and the position information of the identification area in the template picture are created in the preset database.
For the template picture, after the original picture is acquired, the original picture may be preprocessed, so as to obtain the template picture. The preprocessing may include, but is not limited to, picture scaling, picture compression, and canvas clearing.
Further, after the template picture is obtained, the characteristics of the template picture can be recorded. The feature of the template picture may be feature information for distinguishing the picture. That is, the feature of the template picture may be the type of the picture, or may be a keyword in the picture, which is not limited specifically.
Taking the type of the picture as an example, pictures can be classified into certificate pictures, bill pictures, and other pictures. Certificate pictures may include identity cards, household registers, passports, campus cards, library cards, and the like; bill pictures may include invoices, cashier receipts, bank receipts, express bills, and the like; other pictures may include business cards, postcards, and the like.
Taking the keywords in the picture as an example, an identity card contains keywords such as name and identity number; a business card contains keywords such as job information and company information; an invoice contains keywords such as invoice code and invoice number. Because different types of pictures contain different keywords, the keywords can serve as the feature of the template picture.
For example, if the obtained template picture is an identity card picture, then the feature of the template picture may be recorded as an "identity card"; for another example, if the obtained template picture is a business card, the characteristics of the template picture may be recorded as "company information, job information".
For the position information of the identification area in the template picture, firstly, after the template picture is determined, the identification area needs to be determined from the template picture; then, the position of the identification area in the template picture can be determined.
In the embodiment of the present application, there are various ways to determine the identification area. In an example, as shown in fig. 3, a flowchart corresponding to a method for determining an identification area provided in an embodiment of the present application specifically includes the following steps:
Step 301, using a mouse to frame-select, in the template picture, the area where the field name of a preset field is located, to obtain a first identification area.
Taking the identity card picture as an example, if the preset fields include the name and the identity number, a mouse may first be used to frame-select, in the identity card picture, the areas where the two field names "name" and "identity number" are located; these are the first identification areas. See Fig. 4a, which schematically illustrates the first identification area in an identity card picture provided in the embodiment of the present application.
Step 302, using a mouse to frame-select, in the template picture, the area where the field value of the preset field is located, to obtain a second identification area.
The second identification area corresponds to the first identification area.
Still taking the identity card picture as an example, if the preset fields include the name and the identity number, a mouse may also be used to frame-select, in the identity card picture, the areas where the value of the name and the value of the identity number are located; these are the second identification areas. See Fig. 4b, which schematically illustrates the second identification area in an identity card picture provided in the embodiment of the present application.
Step 303, using the first identification area and the second identification area as the identification areas in the template picture.
In another example, a mouse can be used to directly frame-select, in the template picture, the area where the preset field is located. Still taking the identity card picture as an example, if the preset fields include the name and the identity number, a mouse may be used to frame-select, in the identity card picture, the areas where the name (including its field name and field value) and the identity number (including its field name and field value) are located; these are the identification areas. See Fig. 4c, which schematically illustrates an identification area in an identity card picture provided in the embodiment of the present application.
Further, after the identification area is determined, its position information in the template picture also needs to be determined. The position information of the identification area in the template picture may be the distances between the boundary of the identification area and the boundary of the template picture. Fig. 5 shows an example of position information of an identification area in a template picture in the embodiment of the present application.
As can be seen in Fig. 5, the boundary of the identification area includes edge A1, edge B1, edge C1, and edge D1; the boundary of the template picture includes edge A2, edge B2, edge C2, and edge D2. Through image analysis, the distance L1 between edge A1 and edge A2, the distance L2 between edge B1 and edge B2, the distance L3 between edge C1 and edge C2, and the distance L4 between edge D1 and edge D2 may be determined. The position information of the identification area in the template picture can then be determined from L1, L2, L3, and L4.
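The four distances can be computed directly from the frame-selected box and the size of the template picture. The sketch below assumes that edges A, B, C and D are the left, top, right and bottom edges respectively; the text does not state which edge is which, so this mapping and the function name `edge_distances` are illustrative assumptions.

```python
from typing import Tuple


def edge_distances(
    box: Tuple[int, int, int, int], picture_size: Tuple[int, int]
) -> Tuple[int, int, int, int]:
    """Return (L1, L2, L3, L4) for a box given as (left, top, right, bottom).

    Assumed edge mapping: A = left, B = top, C = right, D = bottom.
    """
    left, top, right, bottom = box
    width, height = picture_size
    l1 = left              # distance between the left edges (A1 and A2)
    l2 = top               # distance between the top edges (B1 and B2)
    l3 = width - right     # distance between the right edges (C1 and C2)
    l4 = height - bottom   # distance between the bottom edges (D1 and D2)
    return (l1, l2, l3, l4)
```

For a 640 x 400 template picture and an identification area spanning x from 100 to 300 and y from 40 to 80, the function yields the four margins of that area.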
Before step 102 is performed, the picture to be recognized may be preprocessed. Depending on the preprocessing applied to the template picture, the preprocessing of the picture to be recognized may include, but is not limited to, picture scaling, picture compression, and canvas clearing.
For example, the picture to be recognized can be scaled according to the size of the template picture so that the two pictures have the same size, which improves the accuracy of picture character recognition.
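Scaling to the template size can be sketched with nearest-neighbour sampling on a picture stored as a nested list of pixel values. The interpolation method is an assumption, since the application does not name one, and a real system would more likely rely on an image library.

```python
from typing import List


def scale_to_template(
    picture: List[List[int]], template_width: int, template_height: int
) -> List[List[int]]:
    """Resize a picture to the template size using nearest-neighbour sampling."""
    src_height = len(picture)
    src_width = len(picture[0])
    scaled = []
    for y in range(template_height):
        # Map each output row back to the closest source row.
        src_y = y * src_height // template_height
        row = [
            picture[src_y][x * src_width // template_width]
            for x in range(template_width)
        ]
        scaled.append(row)
    return scaled
```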
For example, in order to facilitate matching between the picture to be recognized and the template picture for efficient text recognition, in the embodiment of the present application, a plurality of reference points need to be framed on the template picture. The reference point is used for providing a reference point so as to correspond the picture to be recognized and the template picture.
Further, several reference points may be framed on the template picture, and then the angle of the picture to be recognized may be corrected according to the reference points.
In order to improve the recognition effect, when the reference point is framed on the template picture, a point located at a common and unchangeable text segment in the template picture and the picture to be recognized may be selected as the reference point. Because the template picture and the picture to be recognized have the same characteristics, and the relative positions between the text segments corresponding to the characteristics are the same and unchanged, when the reference point is selected, in order to ensure the recognition accuracy, the template picture and the picture to be recognized need to have the positions of the text segments which are the same, and the positions of the text segments which do not change are used as the selection positions of the reference point.
In order to avoid misrecognition, a reference point should not be selected at a text segment that appears repeatedly, but at a text segment that appears only once.
The more reference points there are on the template picture, and the more widely they are scattered, the better the recognition effect, which helps ensure the accuracy and efficiency of character recognition.
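The rule of avoiding repeated text segments can be sketched as a simple filter. The input format (a list of text and position pairs) and the function name `unique_segments` are assumptions made for illustration.

```python
from collections import Counter


def unique_segments(segments):
    """Keep only text segments whose text occurs exactly once.

    `segments` is a list of (text, (x, y)) pairs found on the template
    picture; segments with repeated text are ambiguous anchors and are
    dropped.
    """
    counts = Counter(text for text, _ in segments)
    return [(text, pos) for text, pos in segments if counts[text] == 1]
```

For instance, a label such as "Date" that is printed twice on a form would be excluded, while a label printed once remains a candidate reference point.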
In step 102, the target identification area may be an area of a preset field in the picture to be identified.
Further, if the identification area in the template picture includes a first identification area and a second identification area, then correspondingly, FIG. 6 exemplarily shows a flowchart of a method for determining a target identification area provided in an embodiment of the present application, which specifically includes the following steps:
Step 601, determining a first target identification area in the picture to be identified according to the position information of the first identification area in the template picture.
Wherein, the position information of the first identification area in the template picture may be a distance between a boundary of the first identification area and a boundary of the template picture.
Furthermore, when the first target identification region is determined, the distance between the boundary of the first target identification region and the boundary of the picture to be identified can be determined according to the distance between the boundary of the first identification region and the boundary of the template picture; and then, determining the first target identification area from the picture to be identified according to the distance between the boundary of the first target identification area and the boundary of the picture to be identified.
Step 602, determining a second target identification area in the picture to be identified according to the position information of the second identification area in the template picture.
Wherein, the position information of the second identification area in the template picture may be a distance between a boundary of the second identification area and a boundary of the template picture.
Furthermore, when the second target identification area is determined, the distance between the boundary of the second target identification area and the boundary of the picture to be identified can be determined according to the distance between the boundary of the second identification area and the boundary of the template picture; and then, determining a second target identification area from the picture to be identified according to the distance between the boundary of the second target identification area and the boundary of the picture to be identified.
Step 603, determining the first target identification area and the second target identification area as target identification areas.
The target recognition area comprises a first target recognition area and a second target recognition area.
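Steps 601 to 603 can be sketched as follows. The sketch assumes that position information is stored as (top, right, bottom, left) edge distances and that, when the two pictures differ in size, the distances scale proportionally; both points are assumptions, and the function names are hypothetical.

```python
def region_from_distances(distances, template_size, picture_size):
    """Map edge distances measured on the template onto the picture.

    Returns the target area as (left, top, right, bottom) in picture pixels.
    """
    template_w, template_h = template_size
    picture_w, picture_h = picture_size
    sx = picture_w / template_w   # horizontal scale factor
    sy = picture_h / template_h   # vertical scale factor
    top, right, bottom, left = distances
    return (left * sx, top * sy, picture_w - right * sx, picture_h - bottom * sy)


def determine_target_areas(first_distances, second_distances,
                           template_size, picture_size):
    """Return the target identification area as its two component areas."""
    # Step 601: first target area (for example, the field name).
    first = region_from_distances(first_distances, template_size, picture_size)
    # Step 602: second target area (for example, the field value).
    second = region_from_distances(second_distances, template_size, picture_size)
    # Step 603: both areas together form the target identification area.
    return {"first": first, "second": second}
```

If the picture has already been scaled to the template size, both scale factors are 1 and the distances carry over unchanged.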
In step 103, the text information in the target recognition area may be recognized by using a preset image recognition algorithm; the recognized text information may then be corrected according to the field name and the field type corresponding to the recognition area.
Taking recognition of the identity number in an identity card image as an example, after the area of the identity number in the identity card image is determined, text recognition may be performed by using a preset image recognition algorithm, for example, recognizing the information "110102 YYYYMMDD 888X". Then, according to the field name corresponding to the identity number, it can be judged whether the recognized information meets the characteristics of that field (for example, the number has 18 digits); and according to the field type corresponding to the identity number, it can be judged whether the recognized information is of the digit type. Finally, the recognized information may be corrected according to the judgment results.
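The application does not describe the correction rules themselves. The following sketch shows one plausible form, assuming a small table of common OCR letter-for-digit confusions and assuming that an identity number may end in "X", as in the example above. `FieldSpec`, `correct_field`, and `is_valid` are hypothetical names, and the sample number is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    field_type: str              # only "digits" is handled in this sketch
    length: int
    allow_trailing_x: bool = False


# Assumed confusion table: letters that OCR often outputs in place of digits.
_OCR_DIGIT_FIXES = str.maketrans("OoIlSB", "001158")


def correct_field(raw: str, spec: FieldSpec) -> str:
    """Strip whitespace and repair likely digit confusions for digit fields."""
    text = "".join(raw.split())
    if spec.field_type == "digits":
        if spec.allow_trailing_x and text[-1:] in ("X", "x"):
            text = text[:-1].translate(_OCR_DIGIT_FIXES) + "X"
        else:
            text = text.translate(_OCR_DIGIT_FIXES)
    return text


def is_valid(text: str, spec: FieldSpec) -> bool:
    """Check the length and type constraints implied by the field."""
    if len(text) != spec.length:
        return False
    if spec.field_type == "digits":
        body = text[:-1] if spec.allow_trailing_x and text.endswith("X") else text
        return body.isascii() and body.isdigit()
    return True
```

A result that still fails `is_valid` after correction could be flagged for review rather than silently accepted; that policy is likewise an assumption.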
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
FIG. 7 exemplarily shows a schematic structural diagram of a picture character recognition apparatus provided by an embodiment of the present application. As shown in FIG. 7, the apparatus has the function of implementing the picture character recognition method described above, and the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include: an acquisition unit 701, a processing unit 702, and a recognition unit 703.
The acquisition unit 701 is configured to obtain template information corresponding to features of a picture to be identified from a preset database; the template information comprises position information of an identification area in a template picture, and the identification area is an area of a preset field in the template picture;
the processing unit 702 is configured to determine a target identification area of the picture to be identified according to the position information of the identification area in the template picture; the target identification area is an area of the preset field in the picture to be identified;
the recognition unit 703 is configured to recognize the text information in the target identification area.
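The division of work among the three units can be sketched as follows. The class names, the dictionary-based "database", and the injected `ocr` callable are all assumptions for illustration; the toy picture is a list of text rows so that the example runs without an image library.

```python
class AcquisitionUnit:
    """Looks up template information by picture feature (unit 701)."""

    def __init__(self, database):
        self._database = database

    def get_template_info(self, feature):
        return self._database[feature]


class ProcessingUnit:
    """Derives target areas from template areas (unit 702)."""

    def target_areas(self, template_info):
        # Assumes the picture was already scaled and angle-corrected, so
        # template boxes (left, top, right, bottom) carry over unchanged.
        return dict(template_info)


class RecognitionUnit:
    """Runs a recognizer on each target area (unit 703)."""

    def __init__(self, ocr):
        self._ocr = ocr  # callable(picture, box) -> str, supplied by the caller

    def recognize(self, picture, areas):
        return {field: self._ocr(picture, box) for field, box in areas.items()}


def fake_ocr(picture, box):
    """Stand-in recognizer: 'reads' characters from a list of text rows."""
    left, top, right, bottom = box
    return "".join(row[left:right] for row in picture[top:bottom])


database = {"id_card": {"name": (5, 0, 7, 1), "id_number": (5, 1, 8, 2)}}
picture = ["Name Li  ", "ID   123 "]

info = AcquisitionUnit(database).get_template_info("id_card")
areas = ProcessingUnit().target_areas(info)
result = RecognitionUnit(fake_ocr).recognize(picture, areas)
```

Passing the recognizer in as a callable keeps the sketch independent of any particular image recognition algorithm.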
Optionally, the template picture is determined by:
acquiring an original picture;
preprocessing the original picture to obtain the template picture; the pre-processing includes at least one of picture scaling processing, picture compression processing, and canvas clearing processing.
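The application does not define canvas clearing in detail. Assuming it means cropping away a blank margin around the picture content, a minimal sketch on a 2D list of grayscale pixels might look like this (`clear_canvas` and the `background` value are hypothetical):

```python
def clear_canvas(pixels, background=255):
    """Crop rows and columns that contain only the background value."""
    rows = [i for i, row in enumerate(pixels)
            if any(p != background for p in row)]
    if not rows:
        return []  # the picture is entirely background
    cols = [j for j in range(len(pixels[0]))
            if any(row[j] != background for row in pixels)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]
```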
Optionally, the identification area is determined by:
frame-selecting, by using a mouse, the area where the field name of the preset field is located in the template picture to obtain a first identification area;
frame-selecting, by using a mouse, the area where the field value of the preset field is located in the template picture to obtain a second identification area; the second identification area corresponds to the first identification area;
and taking the first identification area and the second identification area as identification areas in the template picture.
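As a hedged illustration of frame selection, a mouse drag can be reduced to a rectangle from its press and release positions. The function names and the (left, top, right, bottom) convention are assumptions.

```python
def box_from_drag(press, release):
    """Convert a mouse drag into (left, top, right, bottom).

    The drag may run in any direction, so the corners are normalized.
    """
    (x0, y0), (x1, y1) = press, release
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def make_identification_area(name_drag, value_drag):
    """Pair the field-name box with the field-value box of one preset field."""
    return {
        "first": box_from_drag(*name_drag),    # area of the field name
        "second": box_from_drag(*value_drag),  # area of the field value
    }
```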
Optionally, the processing unit 702 is specifically configured to:
determining a first target identification area in the picture to be identified according to the position information of the first identification area in the template picture; determining a second target identification area in the picture to be identified according to the position information of the second identification area in the template picture; and determining the target recognition area from the first target recognition area and the second target recognition area.
Optionally, the template information further includes a field name and a field type corresponding to the identification area;
the recognition unit 703 is specifically configured to:
recognizing the character information in the target recognition area by adopting a preset image recognition algorithm; and correcting the recognized character information according to the field name and the field type corresponding to the recognition area.
Optionally, before determining the target identification area of the picture to be identified, the processing unit 702 is further configured to:
framing a plurality of reference points on the template picture; and correcting the angle of the picture to be recognized according to the reference point.
Optionally, before determining the target identification area of the picture to be identified, the processing unit 702 is further configured to:
scaling the picture to be identified according to the size of the template picture.
In the embodiment of the application, after the template information corresponding to the features of the picture to be recognized is acquired from the preset database, the target recognition area of the picture to be recognized can be determined according to the position information of the recognition area in the template picture, and the character information in the target recognition area can then be recognized. Because the template information includes the position information of the identification area in the template picture, the target identification area does not need to be frame-selected manually when the picture is recognized. This avoids the influence of a worker's own state and improves the accuracy of picture character recognition. Since the whole recognition process requires no manual participation, human resources are saved on the one hand, and the efficiency of picture character recognition is improved on the other.
FIG. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention. As shown in FIG. 8, the electronic device includes: a memory 801 for storing program instructions; and a processor 802 configured to call and execute the program instructions in the memory to implement the picture character recognition method according to the above embodiments.
In this embodiment, the processor 802 and the memory 801 may be connected by a bus or other means. The processor may be a general-purpose processor, such as a central processing unit, a digital signal processor, an application specific integrated circuit, or one or more integrated circuits configured to implement embodiments of the present invention. The memory may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk.
The embodiment of the invention also provides a storage medium, wherein a computer program is stored in the storage medium, and when at least one processor of the picture character recognition device executes the computer program, the picture character recognition device executes the picture character recognition method in the embodiment.
The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented by software together with a necessary general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied, in essence or in part, in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method according to the embodiments or some parts of the embodiments.
The same and similar parts of the various embodiments in this specification may be referred to each other. In particular, the embodiments of the picture character recognition apparatus, the electronic device, and the storage medium are substantially similar to the method embodiments, so they are described briefly; for relevant points, refer to the description of the method embodiments.
The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.

Claims (16)

1. A picture character recognition method is characterized by comprising the following steps:
acquiring template information corresponding to the characteristics of the picture to be identified from a preset database; the template information comprises position information of an identification area in a template picture, and the identification area is an area of a preset field in the template picture;
determining a target identification area of the picture to be identified according to the position information of the identification area in the template picture; the target identification area is an area of the preset field in the picture to be identified;
and identifying the text information in the target identification area.
2. The method of claim 1, wherein the template picture is determined by:
acquiring an original picture;
preprocessing the original picture to obtain the template picture; the pre-processing includes at least one of picture scaling processing, picture compression processing, and canvas clearing processing.
3. The method of claim 2, wherein the identification area is determined by:
frame-selecting, by using a mouse, the area where the field name of the preset field is located in the template picture to obtain a first identification area;
frame-selecting, by using a mouse, the area where the field value of the preset field is located in the template picture to obtain a second identification area; the second identification area corresponds to the first identification area;
and taking the first identification area and the second identification area as identification areas in the template picture.
4. The method according to claim 3, wherein determining the target identification area of the picture to be identified according to the position information of the identification area in the template picture comprises:
determining a first target identification area in the picture to be identified according to the position information of the first identification area in the template picture;
determining a second target identification area in the picture to be identified according to the position information of the second identification area in the template picture;
and determining the first target identification area and the second target identification area as the target identification areas.
5. The method according to claim 1, wherein the template information further includes a field name and a field type corresponding to the identification area;
identifying the text information in the target identification area, including:
recognizing the character information in the target recognition area by adopting a preset image recognition algorithm;
and correcting the recognized character information according to the field name and the field type corresponding to the recognition area.
6. The method according to claim 1, wherein before determining the target recognition area of the picture to be recognized, the method further comprises:
framing a plurality of reference points on the template picture;
and correcting the angle of the picture to be recognized according to the reference point.
7. The method according to claim 1, wherein before determining the target recognition area of the picture to be recognized, the method further comprises:
and scaling the picture to be identified according to the size of the template picture.
8. An apparatus for recognizing characters in a picture, the apparatus comprising:
the acquisition unit is used for acquiring template information corresponding to the characteristics of the picture to be recognized from a preset database; the template information comprises position information of an identification area in a template picture, and the identification area is an area of a preset field in the template picture;
the processing unit is used for determining a target identification area of the picture to be identified according to the position information of the identification area in the template picture; the target identification area is an area of the preset field in the picture to be identified;
and the identification unit is used for identifying the character information in the target identification area.
9. The apparatus of claim 8, wherein the template picture is determined by:
acquiring an original picture;
preprocessing the original picture to obtain the template picture; the pre-processing includes at least one of picture scaling processing, picture compression processing, and canvas clearing processing.
10. The apparatus of claim 9, wherein the identification area is determined by:
frame-selecting, by using a mouse, the area where the field name of the preset field is located in the template picture to obtain a first identification area;
frame-selecting, by using a mouse, the area where the field value of the preset field is located in the template picture to obtain a second identification area; the second identification area corresponds to the first identification area;
and taking the first identification area and the second identification area as identification areas in the template picture.
11. The apparatus according to claim 10, wherein the processing unit is specifically configured to:
determining a first target identification area in the picture to be identified according to the position information of the first identification area in the template picture; determining a second target identification area in the picture to be identified according to the position information of the second identification area in the template picture; and determining the first target identification area and the second target identification area as the target identification areas.
12. The apparatus of claim 8, wherein the template information further comprises a field name and a field type corresponding to the identification area;
the identification unit is specifically configured to:
recognizing the character information in the target recognition area by adopting a preset image recognition algorithm; and correcting the recognized character information according to the field name and the field type corresponding to the recognition area.
13. The apparatus of claim 8, wherein the processing unit, prior to determining the target recognition area of the picture to be recognized, is further configured to:
framing a plurality of reference points on the template picture; and correcting the angle of the picture to be recognized according to the reference point.
14. The apparatus of claim 8, wherein the processing unit, prior to determining the target recognition area of the picture to be recognized, is further configured to:
and scaling the picture to be identified according to the size of the template picture.
15. An electronic device, comprising:
a memory for storing program instructions;
a processor, configured to call and execute the program instructions in the memory to implement the picture character recognition method according to any one of claims 1 to 7.
16. A storage medium, characterized in that a computer program is stored in the storage medium, and when at least one processor of a picture character recognition device executes the computer program, the picture character recognition device performs the picture character recognition method according to any one of claims 1 to 7.
CN201911422228.9A 2019-12-31 2019-12-31 Picture character recognition method and device, electronic equipment and storage medium Pending CN111178365A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911422228.9A CN111178365A (en) 2019-12-31 2019-12-31 Picture character recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111178365A (en) 2020-05-19

Family

ID=70656036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911422228.9A Pending CN111178365A (en) 2019-12-31 2019-12-31 Picture character recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111178365A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492643A (en) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 Certificate recognition methods, device, computer equipment and storage medium based on OCR
CN109977935A (en) * 2019-02-27 2019-07-05 平安科技(深圳)有限公司 A kind of text recognition method and device
CN110008944A (en) * 2019-02-20 2019-07-12 平安科技(深圳)有限公司 OCR recognition methods and device, storage medium based on template matching
CN110263616A (en) * 2019-04-29 2019-09-20 五八有限公司 A kind of character recognition method, device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931784A (en) * 2020-09-17 2020-11-13 深圳壹账通智能科技有限公司 Bill recognition method, system, computer device and computer-readable storage medium
WO2022057471A1 (en) * 2020-09-17 2022-03-24 深圳壹账通智能科技有限公司 Bill identification method, system, computer device, and computer-readable storage medium
CN112580618A (en) * 2020-10-30 2021-03-30 中电万维信息技术有限责任公司 Electronic license verification method based on OCR

Similar Documents

Publication Publication Date Title
CN107688772B (en) Policy information entry method and device, computer equipment and storage medium
US9626555B2 (en) Content-based document image classification
US10318804B2 (en) System and method for data extraction and searching
US10339373B1 (en) Optical character recognition utilizing hashed templates
EP4109332A1 (en) Certificate authenticity identification method and apparatus, computer-readable medium, and electronic device
US20150220778A1 (en) Smart optical input/output (i/o) extension for context-dependent workflows
CN111209827B (en) Method and system for OCR (optical character recognition) bill problem based on feature detection
EP3783524A1 (en) Authentication method and apparatus, and electronic device, computer program, and storage medium
WO2021072876A1 (en) Identification image classification method and apparatus, computer device, and readable storage medium
US11727701B2 (en) Techniques to determine document recognition errors
CN110263616A (en) A kind of character recognition method, device, electronic equipment and storage medium
CN111178365A (en) Picture character recognition method and device, electronic equipment and storage medium
CN111290684B (en) Image display method, image display device and terminal equipment
CN108090728B (en) Express information input method and system based on intelligent terminal
CN113591657A (en) OCR (optical character recognition) layout recognition method and device, electronic equipment and medium
US20150030241A1 (en) Method and system for data identification and extraction using pictorial representations in a source document
AU2017301370B2 (en) Identification of duplicate copies of a form in a document
CN110751140A (en) Character batch recognition method and device and computer equipment
CN111178346B (en) Text region positioning method, text region positioning device, text region positioning equipment and storage medium
US20200250414A1 (en) Electronic device and handwriting board system
CN117632964A (en) Item receiving processing method and device, electronic equipment and storage medium
CN117151804A (en) Invoice entering method and device, computer equipment and storage medium
CN115034876A (en) Loan information auditing method and device based on OCR (optical character recognition) technology and computer equipment
CN116597462A (en) Certificate identification method based on OCR
CN117634423A (en) Text processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination