CN112580499A - Text recognition method, device, equipment and storage medium - Google Patents

Publication number
CN112580499A
Authority
CN
China
Prior art keywords
text
picture
text box
candidate
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011495134.7A
Other languages
Chinese (zh)
Inventor
卜德飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd
Priority to CN202011495134.7A
Publication of CN112580499A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images

Abstract

The application discloses a text recognition method, apparatus, device and storage medium, and belongs to the technical field of artificial intelligence. The method comprises: labeling a text picture to be detected of a target type to obtain a candidate text picture, wherein the candidate text picture comprises a plurality of detection text boxes, and each detection text box contains a plurality of characters in the text picture to be detected whose spacing is smaller than a preset distance threshold; acquiring a template picture corresponding to the target type, wherein the template picture comprises a plurality of standard text boxes, and different standard text boxes indicate the areas where texts containing different types of information are located in a sample text picture of the target type; matching the detection text boxes in the candidate text picture with the standard text boxes in the template picture; and extracting the information of the target type from the candidate text picture according to the matching result. The technical solution provided by the embodiments of the application can improve the speed of extracting key information from text.

Description

Text recognition method, device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a text recognition method, apparatus, device, and storage medium.
Background
Text recognition technology is now widely applied; by recognizing text, the different types of information it contains can be obtained.
In the related art, text recognition can be performed by template matching. Specifically, a keyword picture is designed first and slid continuously over the picture to be detected; after each slide, the similarities of the pixels at all corresponding positions of the picture to be detected and the keyword picture are summed; the match succeeds where the summed similarity is largest; and text recognition is finally performed on the matched result.
However, this method matches by comparing at the pixel level, which slows down text recognition.
Disclosure of Invention
Based on this, the embodiment of the application provides a text recognition method, a text recognition device, text recognition equipment and a storage medium, which can improve the extraction speed of text key information.
In a first aspect, a method for text recognition is provided, the method including:
labeling a text picture to be detected of a target type to obtain a candidate text picture, wherein the candidate text picture comprises a plurality of detection text boxes, and each detection text box contains a plurality of characters in the text picture to be detected whose spacing is smaller than a preset distance threshold; acquiring a template picture corresponding to the target type, wherein the template picture comprises a plurality of standard text boxes, and different standard text boxes indicate the areas where texts containing different types of information are located in a sample text picture of the target type; matching the detection text boxes in the candidate text picture with the standard text boxes in the template picture; and extracting the information of the target type from the candidate text picture according to the matching result.
In one embodiment, the matching process of the detected text box in the candidate text picture and the standard text box in the template picture includes:
adjusting the size of the template picture based on the size of the candidate text picture to obtain an adjusted template picture, wherein the adjusted template picture comprises a plurality of adjusting standard text boxes; and matching the detection text box in the candidate text picture with the adjustment standard text box in the adjustment template picture.
In one embodiment, the resizing the template picture based on the size of the candidate text picture includes:
calculating the sum of the lengths of all the detection text boxes in the candidate text picture to obtain a first length sum value; calculating the sum of the widths of all the detection text boxes in the candidate text picture to obtain a first width sum value; calculating the sum of the lengths of the standard text boxes in the template picture to obtain a second length sum value; calculating the sum of the widths of the standard text boxes in the template picture to obtain a second width sum value; and adjusting the length of the template picture based on the ratio of the first length sum value and the second length sum value, and adjusting the width of the template picture based on the ratio of the first width sum value and the second width sum value.
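The ratio-based adjustment above can be illustrated with a minimal sketch. The box layout `(x, y, length, width)`, the function names `scale_factors` and `resize_template`, and the choice to scale box positions together with box sizes are assumptions made for illustration; the source only states that the template's length and width are adjusted by the two ratios.

```python
from typing import List, Tuple

# Hypothetical box layout: (x, y, length, width), i.e. top-left corner,
# horizontal extent and vertical extent, all in pixels.
Box = Tuple[float, float, float, float]


def scale_factors(detected: List[Box], standard: List[Box]) -> Tuple[float, float]:
    """Return (length ratio, width ratio) of detection boxes to standard boxes."""
    first_length_sum = sum(b[2] for b in detected)
    first_width_sum = sum(b[3] for b in detected)
    second_length_sum = sum(b[2] for b in standard)
    second_width_sum = sum(b[3] for b in standard)
    return first_length_sum / second_length_sum, first_width_sum / second_width_sum


def resize_template(standard: List[Box], sx: float, sy: float) -> List[Box]:
    """Scale every standard box, assuming the whole template picture is stretched."""
    return [(x * sx, y * sy, length * sx, width * sy) for x, y, length, width in standard]


# Illustrative data: the candidate picture is twice as large as the template.
detected = [(20.0, 10.0, 200.0, 40.0), (20.0, 80.0, 100.0, 40.0)]
standard = [(10.0, 5.0, 100.0, 20.0), (10.0, 40.0, 50.0, 20.0)]
sx, sy = scale_factors(detected, standard)
adjusted = resize_template(standard, sx, sy)
```

Using sums over all boxes rather than a single box makes the ratio less sensitive to one badly detected box, which may be why the source formulates the adjustment this way.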
In one embodiment, the matching process of the detected text box in the candidate text picture and the standard text box in the template picture includes:
overlaying the template picture on the candidate text picture and sliding it; after each slide, calculating the sum of the intersection ratios (intersection over union, IoU) between the detection text boxes in the candidate text picture and the standard text boxes in the template picture; taking the sliding position at which the calculated sum of intersection ratios is largest as the final relative position; and matching the detection text boxes in the candidate text picture with the standard text boxes in the template picture based on the final relative position.
In one embodiment, calculating the sum of the intersection ratios of each detection text box in the candidate text picture and each standard text box in the template picture comprises:
for each detection text box in the candidate text picture, calculating the intersection ratio of the detection text box and each standard text box in the template picture, and taking the maximum intersection ratio obtained by calculation as the candidate intersection ratio corresponding to the detection text box; and calculating the sum of candidate intersection ratios corresponding to each detection text box in the candidate text pictures.
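The sliding search and the sum of candidate intersection ratios can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: boxes are given as corner coordinates `(x1, y1, x2, y2)`, the intersection ratio is taken to be intersection over union, and the search window `max_shift` and step `step` are hypothetical parameters that the source does not specify.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def candidate_iou_sum(detected: List[Box], standard: List[Box]) -> float:
    """For each detection box keep its best IoU over all standard boxes, then sum."""
    return sum(max(iou(d, s) for s in standard) for d in detected)


def best_offset(detected: List[Box], standard: List[Box],
                max_shift: int, step: int) -> Tuple[Tuple[int, int], float]:
    """Slide the template over the candidate picture and keep the best-scoring offset."""
    best, best_score = (0, 0), -1.0
    for dy in range(-max_shift, max_shift + 1, step):
        for dx in range(-max_shift, max_shift + 1, step):
            moved = [(x1 + dx, y1 + dy, x2 + dx, y2 + dy) for x1, y1, x2, y2 in standard]
            score = candidate_iou_sum(detected, moved)
            if score > best_score:
                best, best_score = (dx, dy), score
    return best, best_score


# Illustrative data: the detection boxes are the template boxes shifted by (6, 4).
standard = [(0, 0, 40, 10), (0, 20, 60, 30)]
detected = [(6, 4, 46, 14), (6, 24, 66, 34)]
offset, score = best_offset(detected, standard, max_shift=10, step=2)
```

Each slide costs one IoU per pair of boxes, so the work depends on the number of text boxes rather than on the number of pixels.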
In one embodiment, matching the detected text box in the candidate text picture and the standard text box in the template picture based on the final relative position includes:
and for each detection text box in the candidate text picture at the final relative position, calculating the intersection ratio of the detection text box and each standard text box in the template picture at the final relative position, and taking the standard text box corresponding to the maximum intersection ratio obtained by calculation as the standard text box matched with the detection text box.
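Once the final relative position is fixed, the assignment step reduces to picking, for each detection box, the standard box with the largest intersection ratio. The sketch below assumes corner-format boxes, IoU as the intersection ratio, and hypothetical field names (`"name"`, `"number"`); returning `None` when a detection box overlaps nothing is an added assumption, since the source does not say how that case is handled.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / ((a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter)


def match_box(detection: Box, standard: Dict[str, Box]) -> Optional[str]:
    """Return the name of the standard box with the largest IoU, or None if no overlap."""
    best_name = max(standard, key=lambda name: iou(detection, standard[name]))
    return best_name if iou(detection, standard[best_name]) > 0 else None


# Illustrative template (already placed at the final relative position).
standard = {"name": (0, 0, 40, 10), "number": (0, 20, 60, 30)}
detected: List[Box] = [(2, 21, 58, 31), (1, 1, 39, 11), (200, 200, 210, 210)]
matches = [match_box(d, standard) for d in detected]
```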
In one embodiment, the matching process of the detected text box in the candidate text picture and the standard text box in the template picture includes:
determining extreme points of the detection text boxes in the candidate text pictures based on the positions of the detection text boxes in the candidate text pictures, establishing a first coordinate system in the candidate text pictures based on the extreme points of the detection text boxes in the candidate text pictures, and acquiring the coordinates of each detection text box in the first coordinate system; determining extreme points of the standard text boxes in the template picture based on the positions of the standard text boxes in the template picture, establishing a second coordinate system in the template picture based on the extreme points of the standard text boxes in the template picture, and acquiring coordinates of each standard text box in the second coordinate system; and matching the detection text box in the candidate text picture and the standard text box in the template picture based on the coordinates of each detection text box in the first coordinate system and the coordinates of each standard text box in the second coordinate system.
In one embodiment, matching the detected text box in the candidate text picture and the standard text box in the template picture based on the coordinates of each detected text box in the first coordinate system and the coordinates of each standard text box in the second coordinate system includes:
and for each detection text box in the candidate text picture, determining the projection position of the detection text box in the second coordinate system based on the coordinates of the detection text box in the first coordinate system, calculating the intersection ratio of the detection text box and each standard text box under the condition that the detection text box is positioned at the projection position, and taking the standard text box corresponding to the maximum intersection ratio obtained by calculation as the standard text box matched with the detection text box.
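One way to read the coordinate-system construction above is as a normalization: the extreme points of all boxes define an origin and an extent, and each box is expressed relative to them. The sketch below follows that reading, which is an assumption; the source does not spell out how the coordinate systems are built from the extreme points. Only the normalization is shown; the matching would then proceed by intersection ratio as the text describes.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def normalize(boxes: List[Box]) -> List[Box]:
    """Express boxes in a coordinate system spanned by their extreme points.

    The left-most/top-most point becomes (0, 0) and the right-most/bottom-most
    point becomes (1, 1), so the result is independent of scale and translation.
    """
    x_min = min(b[0] for b in boxes)
    y_min = min(b[1] for b in boxes)
    x_max = max(b[2] for b in boxes)
    y_max = max(b[3] for b in boxes)
    w, h = x_max - x_min, y_max - y_min
    return [((x1 - x_min) / w, (y1 - y_min) / h, (x2 - x_min) / w, (y2 - y_min) / h)
            for x1, y1, x2, y2 in boxes]


# Illustrative data: the detection boxes are the template boxes scaled by 2
# and translated by (100, 7).
standard = [(10.0, 10.0, 50.0, 20.0), (10.0, 30.0, 90.0, 50.0)]
detected = [(120.0, 27.0, 200.0, 47.0), (120.0, 67.0, 280.0, 107.0)]
norm_standard = normalize(standard)
norm_detected = normalize(detected)
```

Under this reading, a detection box and its corresponding standard box land on nearly the same normalized coordinates, so no sliding search is needed before computing intersection ratios.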
In one embodiment, after performing matching processing on the detected text box in the candidate text picture and the standard text box in the template picture, the method further includes:
detecting whether an empty standard text box exists in the template picture, wherein the empty standard text box is not matched with any detection text box in the candidate text picture; if the template picture has an empty standard text box, intercepting a missed detection area in the candidate text picture based on the position of the empty standard text box; and labeling the missed detection area to obtain a detection text box corresponding to the empty standard text box.
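The missed-detection check can be sketched as a set difference followed by a crop-rectangle computation. The names `find_empty_boxes` and `missed_region`, the `margin` parameter, and the assumption that the standard boxes are already expressed in candidate-picture coordinates are all hypothetical; the re-labeling of the cropped area by the detection model is not shown.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x1, y1, x2, y2)


def find_empty_boxes(standard: Dict[str, Box], matches: List[Optional[str]]) -> List[str]:
    """Return the names of standard boxes that no detection box was matched to."""
    matched = {m for m in matches if m is not None}
    return [name for name in standard if name not in matched]


def missed_region(box: Box, picture_size: Tuple[int, int], margin: int = 0) -> Box:
    """Crop rectangle around an empty standard box, clamped to the picture bounds."""
    width, height = picture_size
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))


# Illustrative data: only the "name" box received a detection.
standard = {"name": (0, 0, 40, 10), "address": (0, 20, 60, 30)}
matches = ["name"]
empty = find_empty_boxes(standard, matches)
regions = [missed_region(standard[n], picture_size=(64, 48), margin=5) for n in empty]
```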
In one embodiment, before obtaining the template picture corresponding to the target type, the method further includes:
obtaining a plurality of sample text pictures of the target type; labeling each sample text picture to obtain a text box set corresponding to each sample text picture, wherein the text box set comprises a plurality of candidate text boxes, and each candidate text box contains a plurality of characters in the corresponding sample text picture whose spacing is smaller than a preset distance threshold; for each information type contained in the sample text pictures, acquiring the candidate text box corresponding to the information type from each text box set, and calculating the size mean and the position mean of the acquired candidate text boxes; generating a standard text box corresponding to each information type based on the size mean and the position mean of the candidate text boxes corresponding to that information type; and obtaining the template picture based on the generated standard text boxes.
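The template construction by averaging can be sketched as follows. The box layout `(x, y, length, width)`, the dictionary representation keyed by information type, and the name `build_template` are assumptions for illustration; how candidate text boxes are assigned to information types is not shown.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical box layout: (x, y, length, width); position first, size second.
Box = Tuple[float, float, float, float]


def build_template(samples: List[Dict[str, Box]]) -> Dict[str, Box]:
    """Average position and size of the boxes of each information type across samples."""
    info_types = sorted({t for sample in samples for t in sample})
    template = {}
    for info_type in info_types:
        boxes = [sample[info_type] for sample in samples if info_type in sample]
        # Component-wise mean: the first two values give the position mean,
        # the last two give the size mean.
        template[info_type] = tuple(mean(b[i] for b in boxes) for i in range(4))
    return template


# Illustrative data: two labeled sample pictures of the same document type.
samples = [
    {"name": (10.0, 10.0, 40.0, 12.0), "number": (10.0, 50.0, 100.0, 12.0)},
    {"name": (14.0, 12.0, 44.0, 12.0), "number": (12.0, 54.0, 104.0, 14.0)},
]
template = build_template(samples)
```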
In one embodiment, extracting information of the target type from the candidate text picture according to the result of the matching process includes:
acquiring a standard text box used for indicating an area where a text containing information of a target type is located from a template picture; determining a detection text box corresponding to the standard text box, and extracting characters from the determined detection text box; and identifying the extracted characters to obtain the information of the target type.
In a second aspect, an apparatus for text recognition is provided, the apparatus for text recognition including:
the marking module is used for marking the text picture to be detected of the target type to obtain a candidate text picture, wherein the candidate text picture comprises a plurality of detection text boxes, and each detection text box comprises a plurality of characters of which the distance between the characters is smaller than a preset distance threshold value;
the acquisition module is used for acquiring a template picture corresponding to the target type, wherein the template picture comprises a plurality of standard text boxes, and each standard text box is used for indicating an area where texts containing different types of information are located in the sample text picture of the target type;
the matching module is used for matching the detection text box in the candidate text picture with the standard text box in the template picture;
and the extraction module is used for extracting the information of the target type from the text picture to be detected according to the matching processing result.
In one embodiment, the device further comprises a first adjustment processing unit and a first matching unit;
the first adjusting and processing unit is used for adjusting the size of the template picture based on the size of the candidate text picture to obtain an adjusted template picture, and the adjusted template picture comprises a plurality of adjusting standard text boxes;
the first matching unit is used for matching the detection text box in the candidate text picture with the adjustment standard text box in the adjustment template picture.
In one embodiment, the apparatus further comprises a first calculation unit and a second adjustment processing unit;
the first calculating unit is used for calculating the sum of the lengths of all the detection text boxes in the candidate text picture to obtain a first length sum value; calculating the sum of the widths of all the detection text boxes in the candidate text picture to obtain a first width sum value; calculating the sum of the lengths of the standard text boxes in the template picture to obtain a second length sum value; calculating the sum of the widths of the standard text boxes in the template picture to obtain a second width sum value;
the second adjustment processing unit is configured to perform adjustment processing on the length of the template picture based on the first length sum value and the ratio of the second length sum value, and perform adjustment processing on the width of the template picture based on the first width sum value and the ratio of the second width sum value.
In one embodiment, the device further comprises a sliding unit, a second calculating unit and a third matching unit;
the sliding unit is used for overlapping the template text picture on the candidate picture for sliding;
the second calculating unit is used for calculating, after each slide, the sum of the intersection ratios between the detection text boxes in the candidate text picture and the standard text boxes in the template picture, and for taking the sliding position at which the calculated sum of intersection ratios is largest as the final relative position;
and the third matching unit is used for matching the detection text box in the candidate text picture with the standard text box in the template picture based on the final relative position.
In one embodiment, the apparatus further comprises a third calculation unit and a fourth calculation unit;
the third calculating unit is used for calculating the intersection ratio of the detection text box and each standard text box in the template picture for each detection text box in the candidate text picture, and taking the maximum intersection ratio obtained by calculation as the candidate intersection ratio corresponding to the detection text box;
the fourth calculating unit is used for calculating the sum of candidate intersection ratios corresponding to each detection text box in the candidate text pictures.
In one embodiment, the apparatus further comprises a fifth calculation unit;
the fifth calculating unit is configured to calculate, for each detected text box in the candidate text picture located at the final relative position, an intersection ratio between the detected text box and each standard text box in the template picture located at the final relative position, and use a standard text box corresponding to the calculated maximum intersection ratio as a standard text box matched with the detected text box.
In one embodiment, the device further comprises a second obtaining unit, a building unit and a fourth matching unit;
the second obtaining unit is used for determining extreme points of the detection text boxes in the candidate text picture based on the positions of the detection text boxes in the candidate text picture, establishing a first coordinate system in the candidate text picture based on the extreme points of the detection text boxes in the candidate text picture, and obtaining the coordinates of each detection text box in the first coordinate system;
the establishing unit is used for determining extreme points of the standard text boxes in the template picture based on the positions of the standard text boxes in the template picture, establishing a second coordinate system in the template picture based on the extreme points of the standard text boxes in the template picture, and acquiring the coordinates of each standard text box in the second coordinate system;
the fourth matching unit is used for matching the detection text box in the candidate text picture with the standard text box in the template picture based on the coordinates of each detection text box in the first coordinate system and the coordinates of each standard text box in the second coordinate system.
In one embodiment, the apparatus further comprises a sixth calculation unit;
the sixth calculating unit is configured to determine, for each detected text box in the candidate text picture, a projection position of the detected text box in the second coordinate system based on the coordinates of the detected text box in the first coordinate system, calculate an intersection ratio between the detected text box and each standard text box when the detected text box is located at the projection position, and use a standard text box corresponding to the calculated maximum intersection ratio as a standard text box matched with the detected text box.
In one embodiment, the device further comprises a detection unit, an interception unit and a first labeling unit;
the detection unit is used for detecting whether an empty standard text box exists in the template picture, and the empty standard text box is not matched with any detection text box in the candidate text picture;
the intercepting unit is used for intercepting a missed detection area in the candidate text picture based on the position of the empty standard text box if the empty standard text box exists in the template picture;
the first labeling unit is used for labeling the missed detection area and obtaining a detection text box corresponding to the empty standard text box through labeling processing.
In one embodiment, the device further comprises a third acquisition unit, a second labeling unit, a seventh calculation unit and a generation unit;
the third acquisition unit is used for acquiring a plurality of sample text pictures of the target type;
the second labeling unit is used for labeling each sample text picture to obtain a text box set corresponding to each sample text picture, wherein the text box set comprises a plurality of candidate text boxes, and each candidate text box contains a plurality of characters in the corresponding sample text picture whose spacing is smaller than a preset distance threshold;
the seventh calculating unit is configured to, for each information type contained in the sample text picture, obtain a candidate text box corresponding to the information type from each text box set, and calculate a size average value and a position average value of the obtained candidate text boxes;
the generating unit is used for generating a standard text box corresponding to each information type based on the size average value and the position average value of the candidate text box corresponding to each information type; and obtaining a template picture based on the generated standard text boxes.
In one embodiment, the device further comprises a fourth acquisition unit, a determination unit and an identification unit;
the fourth obtaining unit is used for obtaining a standard text box of an area where a text containing the information of the target type is located from the template picture;
the determining unit is used for determining a detection text box corresponding to the standard text box and extracting characters from the determined detection text box;
the recognition unit is used for recognizing the extracted characters to obtain the information of the target type.
In a third aspect, there is provided a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements any of the text recognition methods as described above in the first aspect.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the text recognition methods of the first aspect as described above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
First, a text picture to be detected of a target type is labeled to obtain a candidate text picture, wherein the candidate text picture comprises a plurality of detection text boxes, and each detection text box contains a plurality of characters whose spacing is smaller than a preset distance threshold. Then a template picture corresponding to the target type is acquired, wherein the template picture comprises a plurality of standard text boxes, and different standard text boxes indicate the areas where texts containing different types of information are located in a sample text picture of the target type. Next, the detection text boxes in the candidate text picture are matched with the standard text boxes in the template picture, and finally the information of the target type is extracted from the candidate text picture according to the matching result. Because the matching is performed between the two sets of text boxes rather than at the pixel level, the time needed to extract key information from the text is reduced and the extraction speed is improved.
Drawings
Fig. 1 is a block diagram of a server according to an embodiment of the present application;
Fig. 2 is a flowchart of a text recognition method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a driving license original provided in an embodiment of the present application;
Fig. 4 is a schematic diagram illustrating positions of detection text boxes in a driver license candidate text picture according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a driver license template picture provided in an embodiment of the present application;
Fig. 6 is a schematic diagram of a driving license template matching result provided in an embodiment of the present application;
Fig. 7 is a flowchart illustrating a sliding matching process performed on a detected text box in a candidate text picture and a standard text box in a template picture according to an embodiment of the present application;
Fig. 8 is a schematic diagram illustrating a calculation of an intersection ratio between a detected text box and each standard text box in a template picture according to an embodiment of the present application;
Fig. 9 is a flowchart illustrating coordinate normalization matching processing performed on a detection text box in a candidate text picture and a standard text box in a template picture according to an embodiment of the present application;
Fig. 10 is a schematic diagram of an original image of an identity card according to an embodiment of the present disclosure;
Fig. 11 is a schematic diagram illustrating positions of detection text boxes in an identity card candidate text picture according to an embodiment of the present application;
Fig. 12 is a schematic diagram of an identification card template picture provided in an embodiment of the present application;
Fig. 13 is a schematic diagram of an identity card template matching result provided in an embodiment of the present application;
Fig. 14 is a flowchart of a method for extracting target type information according to an embodiment of the present disclosure;
Fig. 15 is a flowchart of a text recognition apparatus according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
At present, the text recognition technology is widely applied to the fields of license plate recognition, identification card recognition, express bill recognition and the like, and different types of information contained in the text can be obtained by recognizing the text.
In the related art, text recognition can be performed by template matching. Specifically, a keyword picture is designed first and slid over the picture to be detected from a designated initial position. After each slide, the similarities of the pixels at all corresponding positions of the picture to be detected and the keyword picture are summed; the match succeeds at the position where the summed similarity is largest, and text recognition is finally performed on the matched region. For example, if the picture to be detected is an identity card picture and the designed keyword picture is a name picture, the name picture is slid over the identity card picture: the upper left corner can be designated as the initial position, and the name picture is slid downwards or rightwards by a certain step. After each slide, the similarities of the pixels at all corresponding positions of the identity card picture and the name picture are summed. The match succeeds where the summed similarity is largest, which determines the area where the name is located on the identity card picture, and the name information is finally extracted.
However, this method compares and matches at the pixel level: the similarity between the picture to be detected and the keyword picture must be calculated for the pixels at all corresponding positions and then summed, and the match succeeds only where the summed similarity is largest. Because the similarity must be calculated for every pixel and a picture contains a very large number of pixels, text recognition is slowed down.
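To make the cost of the pixel-level approach concrete, the following minimal sketch slides a small keyword picture over a picture and counts the pixel comparisons. The similarity measure (negative absolute difference of grey values), the step of one pixel, and the tiny grids are assumptions chosen for illustration; the source does not specify how pixel similarity is computed.

```python
from typing import List, Tuple

Grid = List[List[int]]  # grey values, one inner list per pixel row


def pixel_template_match(image: Grid, keyword: Grid) -> Tuple[Tuple[int, int], int]:
    """Return the best (row, col) position and the number of pixel comparisons made."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(keyword), len(keyword[0])
    best_pos, best_score, comparisons = (0, 0), None, 0
    for r in range(ih - kh + 1):          # slide downwards
        for c in range(iw - kw + 1):      # slide rightwards
            score = 0
            for dr in range(kh):
                for dc in range(kw):
                    # Assumed similarity: 0 for identical pixels, negative otherwise.
                    score -= abs(image[r + dr][c + dc] - keyword[dr][dc])
                    comparisons += 1
            if best_score is None or score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, comparisons


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 6, 0],
    [0, 0, 0, 0, 0],
]
keyword = [[9, 8], [7, 6]]
position, comparisons = pixel_template_match(image, keyword)
```

Even on this 4 by 5 grid the search makes 12 slides of 4 pixel comparisons each; the count grows with the product of the picture area and the keyword area, which is the slowdown the paragraph above describes.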
In view of this, an embodiment of the present application provides a text recognition method. In this method, a text picture to be detected of a target type is labeled to obtain a candidate text picture, wherein the candidate text picture includes a plurality of detection text boxes, and each detection text box contains a plurality of characters in the text picture to be detected whose spacing is smaller than a preset distance threshold. A template picture corresponding to the target type is then obtained, wherein the template picture includes a plurality of standard text boxes, and different standard text boxes indicate the areas where texts containing different types of information are located in a sample text picture of the target type. The detection text boxes in the candidate text picture are then matched with the standard text boxes in the template picture, and finally the information of the target type is extracted from the candidate text picture according to the matching result. Because the candidate text picture contains a plurality of detection text boxes and the template picture contains a plurality of standard text boxes, the matching is performed between these two sets of text boxes rather than at the pixel level, which reduces the time needed to extract key information from the text and improves the extraction speed.
The text recognition method provided in the embodiment of the present application may be applied to a server, where the server may be one server or a server cluster composed of multiple servers, and this is not specifically limited in the embodiment of the present application.
Referring to fig. 1, a block diagram of a server provided by an embodiment of the present application is shown, and as shown in fig. 1, the server may include a processor and a memory connected by a system bus. Wherein the processor of the server is configured to provide computing and control capabilities. The memory of the server comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The computer program is executed by a processor to implement a text recognition method.
Those skilled in the art will appreciate that the architecture shown in fig. 1 is a block diagram of only a portion of the architecture associated with the subject application, and does not constitute a limitation on the servers to which the subject application applies, and in particular that a server may include more or fewer components than shown, or combine certain components, or have a different arrangement of components.
Referring to fig. 2, a flowchart of a text recognition method provided by an embodiment of the present application is shown, where the text recognition method may be applied in a server. As shown in fig. 2, the text recognition method may include the steps of:
Step 201, the server performs labeling processing on the text picture to be detected of the target type to obtain a candidate text picture.
The text picture to be detected is a picture containing a plurality of texts. The target type is a type with a specific format and may include a picture type and a form type, such as a common identity card form type or a driving license form type.
The process of labeling the text picture to be detected of the target type may be as follows: the acquired text picture to be detected is input into a trained text detection model based on deep learning, the model performs its calculation and outputs the coordinates and sizes of the detected text boxes, and the text picture to be detected is labeled based on these coordinates and sizes to finally obtain the candidate text picture.
In the following, the embodiment of the present application will briefly describe the training process of the text detection model based on deep learning as described above:
Specifically, to obtain the text detection model based on deep learning, a large number of sample pictures are first acquired; for example, the sample pictures may be driving license pictures. All characters in all the sample pictures are marked using text boxes, and a plurality of characters whose distances from one another are smaller than a preset distance threshold are marked in one text box. For example, if the preset distance threshold is the length of one character, a plurality of characters whose distances from one another are smaller than the length of one character are marked in the same text box. Finally, the text detection model based on deep learning is trained with the marked sample pictures as input, and the coordinates and sizes of all the detected text boxes are output after each round of training. Once trained, the model can be used to label a text picture to be detected based on the output coordinates and sizes to obtain a candidate text picture.
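As a non-limiting illustration, the grouping rule used when marking the sample pictures can be sketched in Python as follows. The sketch assumes that each character is given as a hypothetical (x, y, w, h) box with (x, y) its top-left corner, that all characters lie on one text line, and that the distance between characters is measured as the horizontal gap between adjacent boxes. The function name group_characters and these conventions are assumptions made for illustration and are not taken from the embodiment.

```python
def group_characters(char_boxes, gap_threshold):
    """Merge character boxes (x, y, w, h) on one text line into text boxes.

    Assumption: two neighbouring characters belong to the same text box when
    the horizontal gap between them is smaller than gap_threshold.
    """
    groups = []
    for x, y, w, h in sorted(char_boxes):  # process characters left to right
        if groups:
            gx, gy, gw, gh = groups[-1]
            if x - (gx + gw) < gap_threshold:
                # Extend the current text box so that it also covers this character.
                right = max(gx + gw, x + w)
                top = min(gy, y)
                bottom = max(gy + gh, y + h)
                groups[-1] = (gx, top, right - gx, bottom - top)
                continue
        # Gap too large (or first character): start a new text box.
        groups.append((x, y, w, h))
    return groups
```

With a threshold equal to one character length, characters separated by less than one character length end up in the same text box, while a wider gap starts a new text box.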
After the text detection model based on deep learning is trained, the text picture to be detected is used as its input, so that the text picture to be detected can be labeled to obtain a candidate text picture that includes a plurality of detection text boxes. Fig. 3 is an original driving license picture, and fig. 4 is a schematic diagram of the positions of the detection text boxes in the driving license candidate text picture.
Step 202, the server obtains a template picture corresponding to the target type, wherein the template picture comprises a plurality of standard text boxes.
Different standard text boxes are used to indicate the areas where texts containing different types of information are located in the sample text picture of the target type.
In this embodiment, the target type is the same target type as in step 201. The template picture is a picture including a plurality of standard text boxes, and each standard text box indicates an area where a text containing a different type of information is located in the sample text picture. Fig. 5 is a driving license template picture; the template picture includes a plurality of standard text boxes, and each standard text box indicates an area where a different type of information, such as name, gender, or native place, is located.
In the following, the embodiment of the present application will briefly describe the above-mentioned process of obtaining a template picture:
The template picture includes a plurality of generated standard text boxes. Specifically, a plurality of sample text pictures of the target type are obtained, and labeling processing is performed on each sample text picture to obtain a text box set corresponding to that sample text picture. Each text box set includes a plurality of candidate text boxes, and each candidate text box includes a plurality of characters whose distances from one another are smaller than the preset distance threshold. For each information type contained in the sample text pictures, the candidate text boxes corresponding to the information type are obtained from the text box sets, and the size mean value and the position mean value of the obtained candidate text boxes are calculated. A standard text box corresponding to each information type is generated based on the size mean value and the position mean value of the candidate text boxes corresponding to that information type, and the template picture is obtained based on the generated standard text boxes.
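As a non-limiting illustration, the averaging described above can be sketched in Python as follows. The sketch assumes that each text box set is a hypothetical dictionary mapping an information type to one (x, y, w, h) candidate text box, with (x, y) taken as the box position and (w, h) as the box size. The function name build_template and this data layout are assumptions made for illustration.

```python
from statistics import mean


def build_template(text_box_sets):
    """Generate one standard text box per information type.

    text_box_sets: list of dicts {info_type: (x, y, w, h)}, one dict per
    labeled sample text picture (assumed layout).
    """
    info_types = set().union(*text_box_sets)
    template = {}
    for info_type in info_types:
        # Collect the candidate text boxes of this information type.
        boxes = [s[info_type] for s in text_box_sets if info_type in s]
        # Position mean (x, y) and size mean (w, h), component by component.
        template[info_type] = tuple(mean(b[i] for b in boxes) for i in range(4))
    return template
```

The resulting dictionary plays the role of the template picture: each entry is the standard text box for one information type.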
Step 203, the server performs matching processing on the detection text box in the candidate text picture and the standard text box in the template picture.
In this embodiment, matching processing is performed on the detected text box in the candidate text picture and the standard text box in the template picture. For example, if there are 32 detected text boxes in the candidate text picture and 32 standard text boxes in the template picture, then it is necessary to match the 32 detected text boxes with the 32 standard text boxes one by one, as shown in fig. 6, where fig. 6 is a result of matching the driver license template.
Because the candidate text picture includes a plurality of detection text boxes and the template picture includes a plurality of standard text boxes, the matching process matches the two sets of text boxes rather than matching at the pixel level, so that the extraction time of the text key information can be saved and the extraction speed of the text key information can be improved.
Step 204, the server extracts the information of the target type from the candidate text picture according to the result of the matching processing.
In this embodiment, after the detection text boxes are successfully matched with the standard text boxes, the information of the target type can be extracted from the candidate text picture. For example, after the 32 detection text boxes on the driving license are successfully matched one by one with the 32 standard text boxes, if the information of the target type to be extracted is name information, the standard text box corresponding to the name information is found in the template picture, the detection text box matched with that standard text box is located in the candidate text picture, and the characters in that detection text box are then recognized to obtain the name information.
The application provides two methods for matching a detection text box in a candidate text picture with a standard text box in a template picture.
The first method is to perform sliding matching processing on a detection text box in a candidate text picture and a standard text box in a template picture.
Referring to fig. 7, a flowchart of a sliding matching process performed on a detected text box in a candidate text picture and a standard text box in a template picture according to an embodiment of the present application is shown, where the method may be applied in a server. As shown in fig. 7, the matching processing method may include the steps of:
Step 701, overlaying the candidate text picture on the template picture for sliding.
In this embodiment, the candidate text picture is overlaid on the template picture for sliding. Specifically, the candidate text picture is superimposed on the template picture and slid upwards, downwards, leftwards, or rightwards on it; the sliding range in each direction is within 10 pixels, and each sliding step is 2 pixels.
The candidate text picture may be blurred or inclined due to factors such as the shooting angle and shooting technique, or the size of the candidate text picture may vary with the shooting distance; for example, when the shooting distance is short, the obtained candidate text picture is large. Therefore, there is a certain difference between the obtained candidate text picture and the template picture, and an offset arises when the candidate text picture is overlaid on the template picture. To eliminate this offset and achieve accurate matching, the candidate text picture is slid upwards, downwards, leftwards, or rightwards on the template picture.
Step 702, after each sliding, calculating the sum of the intersection ratios of the detection text boxes in the candidate text picture with the standard text boxes in the template picture.
Specifically: for each detection text box in the candidate text picture, the intersection ratio of the detection text box with each standard text box in the template picture is calculated, and the maximum intersection ratio obtained is taken as the candidate intersection ratio corresponding to the detection text box; the sum of the candidate intersection ratios corresponding to the detection text boxes in the candidate text picture is then calculated.
In this embodiment, after each sliding, the sum of the candidate intersection ratios of the detection text boxes in the candidate text picture with the standard text boxes in the template picture is calculated. The intersection ratio is the ratio of the intersection to the union of the areas of two text boxes. For example, if the area of a detection text box is S1, the area of a standard text box is S2, and the two text boxes overlap with an overlapping area S3, then the area of their union is S1 + S2 - S3 and the intersection ratio of the two text boxes is S3/(S1 + S2 - S3). The candidate intersection ratio of a detection text box is the maximum of its intersection ratios with the standard text boxes in the template picture after a given sliding.
Fig. 8 is a schematic diagram of calculating the intersection ratio of a detection text box with each standard text box in the template picture. The intersection ratios of detection text box 1 with standard text boxes 4, 5 and 6 are calculated to obtain three intersection ratios, and the largest is selected as the candidate intersection ratio of detection text box 1. The same calculation is then performed for detection text box 2 and for detection text box 3 against standard text boxes 4, 5 and 6 to obtain their candidate intersection ratios. For example, if the intersection ratio between detection text box 1 and standard text box 4 is the largest of the three, it is taken as the candidate intersection ratio of detection text box 1. After the candidate intersection ratios of detection text boxes 1, 2 and 3 are obtained in this way, the three candidate intersection ratios are summed.
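As a non-limiting illustration, the calculation of step 702 can be sketched in Python as follows. The sketch assumes axis-aligned text boxes given as hypothetical (x, y, w, h) tuples and uses the usual intersection-over-union definition, in which the union area equals the sum of the two box areas minus the overlapping area. The function names iou and candidate_iou_sum are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection ratio of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    inter = inter_w * inter_h
    return inter / (aw * ah + bw * bh - inter)


def candidate_iou_sum(detected, standard):
    """Sum, over all detection boxes, of the best intersection ratio
    each one reaches against any standard box."""
    return sum(max((iou(d, s) for s in standard), default=0.0) for d in detected)
```

In the fig. 8 example, detected would hold detection text boxes 1, 2 and 3 and standard would hold standard text boxes 4, 5 and 6.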
Step 703, taking the sliding position of the candidate text picture relative to the template picture that corresponds to the maximum calculated sum of intersection ratios as the final relative position.
In this embodiment, the final relative position is the sliding position of the candidate text picture relative to the template picture at which the sum of intersection ratios is the largest. Specifically, a sum of candidate intersection ratios is obtained in step 702 for each sliding position, and the largest of these sums corresponds to one sliding position of the candidate text picture relative to the template picture. Since the sum of the intersection ratios is the largest at this sliding position, that is, the overlap between the detection text boxes in the candidate text picture and the standard text boxes in the template picture is the largest, this sliding position is taken as the final relative position.
Step 704, matching the detection text boxes in the candidate text picture with the standard text boxes in the template picture based on the final relative position.
Specifically, for each detected text box in the candidate text picture at the final relative position, the intersection ratio of the detected text box and each standard text box in the template picture at the final relative position is calculated, and the standard text box corresponding to the maximum intersection ratio obtained through calculation is used as the standard text box matched with the detected text box.
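As a non-limiting illustration, steps 701 to 704 can be sketched together in Python as follows. The sketch assumes axis-aligned text boxes given as hypothetical (x, y, w, h) tuples, a non-empty list of standard text boxes, and a search over every combination of horizontal and vertical offsets within 10 pixels at a step of 2 pixels. Searching the two directions jointly, and the function names iou, shift and sliding_match, are assumptions made for illustration.

```python
from itertools import product


def iou(a, b):
    """Intersection ratio of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    return inter / (aw * ah + bw * bh - inter)


def shift(box, dx, dy):
    """Slide a box by (dx, dy) pixels."""
    x, y, w, h = box
    return (x + dx, y + dy, w, h)


def sliding_match(detected, standard, max_shift=10, step=2):
    """Return the best sliding offset and, for each detection box index,
    the index of the matched standard box."""
    offsets = range(-max_shift, max_shift + 1, step)

    def iou_sum(offset):
        # Step 702: sum of candidate intersection ratios at this sliding position.
        dx, dy = offset
        return sum(max(iou(shift(d, dx, dy), s) for s in standard) for d in detected)

    # Step 703: the sliding position with the largest sum is the final relative position.
    best = max(product(offsets, offsets), key=iou_sum)

    # Step 704: at that position, match each detection box to its best standard box.
    matches = {}
    for i, d in enumerate(detected):
        moved = shift(d, *best)
        matches[i] = max(range(len(standard)), key=lambda j: iou(moved, standard[j]))
    return best, matches
```

Because only box coordinates are compared at each of the 11 x 11 sliding positions, no pixel data is touched during the search.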
The second method is to perform coordinate normalization matching processing on the detection text boxes in the candidate text picture and the standard text boxes in the template picture.
Referring to fig. 9, a flowchart of coordinate normalization matching processing performed on a detected text box in a candidate text picture and a standard text box in a template picture according to an embodiment of the present application is shown, where the method may be applied in a server. As shown in fig. 9, the matching processing method may include the steps of:
Step 901, determining extreme points of the detection text boxes in the candidate text picture based on the positions of the detection text boxes in the candidate text picture, establishing a first coordinate system in the candidate text picture based on the extreme points of the detection text boxes in the candidate text picture, and acquiring coordinates of each detection text box in the first coordinate system.
Fig. 10 is an original identity card picture, and fig. 11 is a schematic diagram of the positions of the detection text boxes in the identity card candidate text picture, where the identity card candidate text picture includes 13 detection text boxes.
Specifically, the extreme points of the detection text boxes are taken from the coordinates of the four vertices of the detection text boxes. These vertex coordinates include a maximum value x_max1 and a minimum value x_min1 in the X direction, and a maximum value y_max1 and a minimum value y_min1 in the Y direction.
The first coordinate system is established in the candidate text picture. Specifically, the origin of the first coordinate system may be the lower-left vertex coordinate (x_min1, y_min1) of the detection text boxes, so that the detection text boxes are located in the first quadrant of the first coordinate system. (x1, y1) denotes the coordinates of a detection text box; specifically, the coordinates of the center point of the detection text box may be selected as the coordinates of the detection text box.
Step 902, determining extreme points of the standard text boxes in the template picture based on the positions of the standard text boxes in the template picture, establishing a second coordinate system in the template picture based on the extreme points of the standard text boxes in the template picture, and acquiring coordinates of each standard text box in the second coordinate system.
As shown in fig. 12, fig. 12 is an identification card template picture, which includes 13 standard text boxes.
Specifically, the extreme points of the standard text boxes are taken from the coordinates of the four vertices of the standard text boxes. These vertex coordinates include a maximum value x_max2 and a minimum value x_min2 in the X direction, and a maximum value y_max2 and a minimum value y_min2 in the Y direction.
The second coordinate system is established in the template picture. Specifically, the origin of the second coordinate system may be the lower-left vertex coordinate (x_min2, y_min2) of the standard text boxes, so that the standard text boxes are located in the first quadrant of the second coordinate system. (x2, y2) denotes the coordinates of a standard text box; specifically, the coordinates of the center point of the standard text box may be selected as the coordinates of the standard text box.
Step 903, matching the detection text boxes in the candidate text picture with the standard text boxes in the template picture based on the coordinates of each detection text box in the first coordinate system and the coordinates of each standard text box in the second coordinate system. Fig. 13 is a schematic diagram of the matching result of the identity card template.
Specifically, for each detected text box in the candidate text picture, determining a projection position of the detected text box in the second coordinate system based on the coordinates of the detected text box in the first coordinate system, calculating the intersection ratio of the detected text box and each standard text box under the condition that the detected text box is located at the projection position, and taking the standard text box corresponding to the calculated maximum intersection ratio as the standard text box matched with the detected text box.
Specifically, the coordinates of each detection text box in the first coordinate system are normalized based on the coordinates and the extreme points. For example, if (x1, y1) are the coordinates of a detection text box, the normalized coordinates are (x11, y11), obtained as shown in formula 1 and formula 2, where formula 1 normalizes x1 and formula 2 normalizes y1. Likewise, the coordinates of each standard text box in the second coordinate system are normalized based on the coordinates and the extreme points. For example, if (x2, y2) are the coordinates of a standard text box, the normalized coordinates are (x22, y22), obtained as shown in formula 3 and formula 4, where formula 3 normalizes x2 and formula 4 normalizes y2.
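$$x_{11} = \frac{x_1}{x_{\mathrm{max1}} - x_{\mathrm{min1}}} \qquad \text{(formula 1)}$$
$$y_{11} = \frac{y_1}{y_{\mathrm{max1}} - y_{\mathrm{min1}}} \qquad \text{(formula 2)}$$
$$x_{22} = \frac{x_2}{x_{\mathrm{max2}} - x_{\mathrm{min2}}} \qquad \text{(formula 3)}$$
$$y_{22} = \frac{y_2}{y_{\mathrm{max2}} - y_{\mathrm{min2}}} \qquad \text{(formula 4)}$$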
The intersection ratio of each normalized detection text box with each normalized standard text box is then calculated, and the standard text box corresponding to the maximum intersection ratio obtained is taken as the standard text box matched with the detection text box. Because the normalized detection text boxes and standard text boxes share the same coordinate reference, the offset that arises when the detection text boxes are overlaid on the standard text boxes can be eliminated, so that accurate matching can be achieved.
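As a non-limiting illustration, the coordinate normalization matching of steps 901 to 903 can be sketched in Python as follows. The sketch assumes axis-aligned text boxes given as hypothetical (x, y, w, h) tuples and, as a simplification, normalizes the corner position and the size of each box by the extent spanned by all boxes in the same picture, rather than normalizing only the center point. It also assumes that this extent is non-zero in both directions. The function names iou, normalize and normalized_match are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection ratio of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    return inter / (aw * ah + bw * bh - inter)


def normalize(boxes):
    """Express boxes relative to the extreme points of the whole box set."""
    x_min = min(x for x, _, _, _ in boxes)
    y_min = min(y for _, y, _, _ in boxes)
    x_max = max(x + w for x, _, w, _ in boxes)
    y_max = max(y + h for _, y, _, h in boxes)
    span_x, span_y = x_max - x_min, y_max - y_min  # assumed non-zero
    return [((x - x_min) / span_x, (y - y_min) / span_y, w / span_x, h / span_y)
            for x, y, w, h in boxes]


def normalized_match(detected, standard):
    """Map each detection box index to the index of its best standard box."""
    det_n, std_n = normalize(detected), normalize(standard)
    return {i: max(range(len(std_n)), key=lambda j: iou(d, std_n[j]))
            for i, d in enumerate(det_n)}
```

Because both box sets are mapped into the same unit frame, a candidate text picture that is shifted or uniformly scaled relative to the template picture still yields overlapping boxes.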
Optionally, the template picture is resized and then matched with the candidate text picture. The matching processing method may include the steps of:
The size of the template picture is adjusted based on the size of the candidate text picture to obtain an adjustment template picture, where the adjustment template picture includes a plurality of adjustment standard text boxes. Specifically, the sum of the lengths of the detection text boxes in the candidate text picture is calculated to obtain a first length sum value, for example 10; the sum of the widths of the detection text boxes in the candidate text picture is calculated to obtain a first width sum value, for example 5; the sum of the lengths of the standard text boxes in the template picture is calculated to obtain a second length sum value, for example 20; and the sum of the widths of the standard text boxes in the template picture is calculated to obtain a second width sum value, for example 10.
The length of the template picture is adjusted based on the ratio of the first length sum value to the second length sum value. In this example, the ratio of the first length sum value to the second length sum value is 1/2, so the length of the template picture is adjusted to 1/2 of its original length. Likewise, the width of the template picture is adjusted based on the ratio of the first width sum value to the second width sum value; in this example the ratio is 1/2, so the width of the template picture is adjusted to 1/2 of its original width.
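As a non-limiting illustration, the size adjustment can be sketched in Python as follows. The sketch assumes axis-aligned text boxes given as hypothetical (x, y, w, h) tuples, takes the length of a box to be its horizontal extent w and the width to be its vertical extent h, and scales the standard text boxes directly instead of resampling the template picture. The function name resize_template and these conventions are assumptions made for illustration.

```python
def resize_template(detected, standard):
    """Scale the standard boxes so the template matches the candidate picture size.

    Returns the adjusted standard boxes and the (length, width) ratios used.
    """
    # Ratio of the first length sum value to the second length sum value.
    length_ratio = sum(w for _, _, w, _ in detected) / sum(w for _, _, w, _ in standard)
    # Ratio of the first width sum value to the second width sum value.
    width_ratio = sum(h for _, _, _, h in detected) / sum(h for _, _, _, h in standard)
    adjusted = [(x * length_ratio, y * width_ratio, w * length_ratio, h * width_ratio)
                for x, y, w, h in standard]
    return adjusted, (length_ratio, width_ratio)
```

With length sums of 10 and 20 and width sums of 5 and 10, as in the example above, both ratios are 1/2 and every adjustment standard text box is half the size of the original standard text box.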
The process of matching the detection text boxes in the candidate text picture with the adjustment standard text boxes in the adjustment template picture is the same as the matching process described above and is not repeated here. Specifically, since the adjustment template picture and the candidate text picture have the same size, sliding matching processing can be performed on the detection text boxes in the candidate text picture and the adjustment standard text boxes in the adjustment template picture. In this case, only the intersection ratios of one detection text box with the standard text boxes in the template picture need to be calculated; the sliding position corresponding to the maximum intersection ratio is taken as the final relative position, and the detection text boxes in the candidate text picture are matched one by one with the standard text boxes in the template picture based on the final relative position. This simplifies the matching, increases the matching speed, and thus increases the speed of extracting and recognizing the text key information.
Optionally, an empty standard text box may exist in the template picture. Specifically, whether an empty standard text box exists in the template picture is detected, where an empty standard text box is a standard text box that is not matched with any detection text box in the candidate text picture. If an empty standard text box exists in the template picture, a missed detection area is intercepted from the candidate text picture based on the position of the empty standard text box, and labeling processing is performed on the missed detection area to obtain a detection text box corresponding to the empty standard text box.
An empty standard text box is a standard text box in the template picture that cannot be matched with any detection text box; that is, the standard text box exists in the template picture, but the corresponding detection text box is missing from the candidate text picture because of a missed detection. For example, if there are 14 standard text boxes in the template picture and only 13 detection text boxes in the candidate text picture, one detection text box has been missed, and after matching there is one empty standard text box in the template picture. In this case, the missed detection area is intercepted from the candidate text picture based on the position of the empty standard text box. For example, if the name standard text box in the identity card template picture is an empty standard text box, the picture of the name area is intercepted from the identity card candidate text picture based on the position of the name standard text box in the identity card template picture, and the intercepted picture is then labeled again to obtain the corresponding name detection text box.
The matched template picture is further checked, and the missed detection area is re-marked and then detected again to obtain the detection text box, so that the problem that the target type information is lost due to the missed detection of the detection text box can be avoided, and the accuracy of detection and extraction of the target type information is improved.
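As a non-limiting illustration, the check for empty standard text boxes and the interception of the missed detection area can be sketched in Python as follows. The sketch assumes that the matching result is a hypothetical dictionary mapping detection text box indices to standard text box indices, that text boxes are (x, y, w, h) tuples with integer pixel coordinates, and that the candidate text picture is held as a list of pixel rows. The function names find_empty_standard_boxes and crop_region are assumptions made for illustration, and the re-labeling of the cropped area by the detection model is not shown.

```python
def find_empty_standard_boxes(matches, standard_count):
    """Return the indices of standard boxes that no detection box was matched to."""
    matched = set(matches.values())
    return [j for j in range(standard_count) if j not in matched]


def crop_region(image, box):
    """Cut the area covered by box (x, y, w, h) out of an image stored as rows of pixels."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]
```

Each cropped area would then be passed through the labeling processing again to recover the detection text box that was missed.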
Referring to fig. 14, a flowchart of a method for extracting target type information provided by an embodiment of the present application is shown, where the method for extracting target type information may be applied in a server. As shown in fig. 14, the method of extracting target type information may include the steps of:
Step 1401, obtaining, from the template picture, a standard text box for indicating an area where a text containing information of the target type is located.
Step 1402, determining a detection text box corresponding to the standard text box, and extracting characters from the determined detection text box.
Step 1403, recognizing the extracted characters to obtain the information of the target type.
Specifically, the information of the target type may be recognized using a text recognition model based on deep learning. First, the text information in the detection text boxes of all the candidate text pictures obtained in step 201 is used as sample text information, and a "text information: ID" mapping table is established for all the sample text information. For example, the text information may be name, gender, native place, and so on, and the IDs may be the numbers 1, 2, 3, and so on, so that the established mapping table is "name: 1, gender: 2, native place: 3". The mapping table is then used as input to train the text recognition model based on deep learning, and a piece of text information and its corresponding ID are output after each round of training.
After the text recognition model based on deep learning is trained, the standard text box indicating the area where the text containing the information of the target type is located is obtained from the template picture, characters are extracted from the detection text box corresponding to that standard text box, and the characters in the detection text box are used as the input of the text recognition model to finally obtain the information of the target type.
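As a non-limiting illustration, the lookup performed in steps 1401 to 1403 can be sketched in Python as follows. The sketch assumes a hypothetical dictionary template_index that maps each information type to the index of its standard text box, a matching result that maps detection text box indices to standard text box indices, and a caller-supplied recognize function that stands in for the trained text recognition model. All of these names are assumptions made for illustration.

```python
def extract_field(info_type, template_index, matches, detected_boxes, recognize):
    """Return the recognized text for one information type, or None if it was not detected."""
    # Step 1401: standard text box that indicates where this information type is located.
    standard_idx = template_index[info_type]
    # Step 1402: detection text box that was matched to that standard text box.
    for detected_idx, matched_idx in matches.items():
        if matched_idx == standard_idx:
            # Step 1403: recognize the characters inside the detection text box.
            return recognize(detected_boxes[detected_idx])
    return None
```

Returning None corresponds to the case of an empty standard text box, in which no detection text box was matched to the requested information type.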
Referring to fig. 15, a block diagram of a text recognition apparatus 1500 provided in an embodiment of the present application is shown, where the text recognition apparatus 1500 may be configured in a server. As shown in fig. 15, the text recognition apparatus 1500 may include: an annotation module 1501, an acquisition module 1502, a matching module 1503, and an extraction module 1504.
The labeling module 1501 is configured to label a target type text picture to be detected, so as to obtain a candidate text picture, where the candidate text picture includes a plurality of detection text boxes, and each detection text box includes a plurality of characters in the text picture to be detected, where a distance between the characters is smaller than a preset distance threshold.
The obtaining module 1502 is configured to obtain a template picture corresponding to a target type, where the template picture includes a plurality of standard text boxes, and different standard text boxes are used to indicate areas where texts containing different types of information are located in a sample text picture of the target type.
The matching module 1503 is configured to perform matching processing on the detected text box in the candidate text picture and the standard text box in the template picture.
The extracting module 1504 is configured to extract information of a target type from the candidate text picture according to a result of the matching process.
In one embodiment, the apparatus further comprises a first adjustment processing unit and a first matching unit.
The first adjustment processing unit is used for adjusting the size of the template picture based on the size of the candidate text picture to obtain an adjusted template picture, and the adjusted template picture comprises a plurality of adjustment standard text boxes.
The first matching unit is used for matching the detection text box in the candidate text picture with the adjustment standard text box in the adjustment template picture.
In one embodiment, the apparatus further comprises a first calculation unit and a second adjustment processing unit.
The first calculating unit is used for calculating the sum of the lengths of all the detection text boxes in the candidate text picture to obtain a first length sum value; calculating the sum of the widths of all the detection text boxes in the candidate text picture to obtain a first width sum value; calculating the sum of the lengths of the standard text boxes in the template picture to obtain a second length sum value; and calculating the width sum of the standard text boxes in the template picture to obtain a second width sum value.
The second adjustment processing unit is configured to perform adjustment processing on the length of the template picture based on the first length sum value and the ratio of the second length sum value, and perform adjustment processing on the width of the template picture based on the first width sum value and the ratio of the second width sum value.
In one embodiment, the apparatus further comprises a sliding unit, a second calculating unit and a third matching unit.
The sliding unit is used for overlaying the candidate text picture on the template picture for sliding.
The second calculating unit is used for calculating, after each sliding, the sum of the intersection ratios of the detection text boxes in the candidate text picture with the standard text boxes in the template picture, and for taking the sliding position of the candidate text picture relative to the template picture that corresponds to the maximum calculated sum of intersection ratios as the final relative position.
And the third matching unit is used for matching the detection text box in the candidate text picture with the standard text box in the template picture based on the final relative position.
In one embodiment, the apparatus further comprises a third calculation unit and a fourth calculation unit.
The third calculating unit is used for calculating the intersection ratio of the detection text box and each standard text box in the template picture for each detection text box in the candidate text picture, and taking the calculated maximum intersection ratio as the candidate intersection ratio corresponding to the detection text box.
The fourth calculating unit is used for calculating the sum of candidate intersection ratios corresponding to each detection text box in the candidate text pictures.
In one embodiment, the apparatus further comprises a fifth calculation unit.
The fifth calculating unit is configured to calculate, for each detected text box in the candidate text picture located at the final relative position, an intersection ratio between the detected text box and each standard text box in the template picture located at the final relative position, and use a standard text box corresponding to the calculated maximum intersection ratio as a standard text box matched with the detected text box.
In one embodiment, the apparatus further includes a second obtaining unit, an establishing unit, and a fourth matching unit.
The second obtaining unit is configured to determine an extreme point of each detected text box in the candidate text picture based on the position of each detected text box in the candidate text picture, establish a first coordinate system in the candidate text picture based on the extreme point of each detected text box in the candidate text picture, and obtain coordinates of each detected text box in the first coordinate system.
The establishing unit is used for determining extreme points of the standard text boxes in the template picture based on the positions of the standard text boxes in the template picture, establishing a second coordinate system in the template picture based on the extreme points of the standard text boxes in the template picture, and acquiring the coordinates of each standard text box in the second coordinate system.
The fourth matching unit is used for matching the detection text box in the candidate text picture with the standard text box in the template picture based on the coordinates of each detection text box in the first coordinate system and the coordinates of each standard text box in the second coordinate system.
In one embodiment, the apparatus further comprises a sixth calculation unit.
The sixth calculating unit is configured to determine, for each detected text box in the candidate text picture, a projection position of the detected text box in the second coordinate system based on the coordinates of the detected text box in the first coordinate system, calculate an intersection ratio between the detected text box and each standard text box when the detected text box is located at the projection position, and use a standard text box corresponding to the calculated maximum intersection ratio as a standard text box matched with the detected text box.
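One way to read the coordinate-system approach is sketched below. It rests on assumptions not stated in the text: the extreme points are taken to be the minimum and maximum x and y over all boxes in a picture, each coordinate system is anchored at those extremes, and projection is a linear rescaling from one extent to the other. The helper names `extent`, `project` and `match_by_projection` are hypothetical, and the `score > 0` guard is an addition for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def extent(boxes):
    """Extreme points of a box set as (min_x, min_y, max_x, max_y)."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def project(box, src, dst):
    """Map a box from the frame spanned by `src` into the frame spanned by `dst`.
    Assumes `src` has non-zero width and height."""
    sx = (dst[2] - dst[0]) / (src[2] - src[0])
    sy = (dst[3] - dst[1]) / (src[3] - src[1])
    return (dst[0] + (box[0] - src[0]) * sx, dst[1] + (box[1] - src[1]) * sy,
            dst[0] + (box[2] - src[0]) * sx, dst[1] + (box[3] - src[1]) * sy)


def match_by_projection(detected, standard):
    """Return {detected index: standard index}, choosing the largest IoU
    after projecting each detected box into the template frame."""
    src, dst = extent(detected), extent(standard)
    matches = {}
    for i, d in enumerate(detected):
        projected = project(d, src, dst)
        scores = [iou(projected, s) for s in standard]
        j = max(range(len(standard)), key=scores.__getitem__)
        if scores[j] > 0:  # added guard: skip boxes that overlap nothing
            matches[i] = j
    return matches
```

Because both frames are derived from the boxes themselves, this sketch needs no sliding search, but a missed or spurious detection at the edge of the picture would distort the extent and therefore the projection.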
In one embodiment, the device further comprises a detection unit, an intercepting unit and a first labeling unit.
The detection unit is used for detecting whether an empty standard text box exists in the template picture, and the empty standard text box is not matched with any detection text box in the candidate text picture.
The intercepting unit is used for intercepting the missed detection area in the candidate text picture based on the position of the empty standard text box if the empty standard text box exists in the template picture.
The first labeling unit is used for labeling the missed detection area and obtaining a detection text box corresponding to the empty standard text box through labeling processing.
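The missed-detection handling can be illustrated with the short sketch below. It assumes that the matching result is a mapping from detected-box index to standard-box index, that the standard boxes have already been brought into the coordinates of the candidate text picture, and that the picture is a row-major nested list of pixels. The names `empty_standard_boxes`, `crop_region` and the `margin` parameter are hypothetical; the re-labeling of the cropped area is left to whatever text detector is in use.

```python
def empty_standard_boxes(standard, matches):
    """Indices of standard boxes that no detected box was matched to."""
    used = set(matches.values())
    return [j for j in range(len(standard)) if j not in used]


def crop_region(image, box, margin=0):
    """Cut the area of `box` (optionally grown by `margin` pixels) out of a
    row-major nested-list image, clipping to the picture bounds."""
    height, width = len(image), len(image[0])
    x1 = max(0, int(box[0]) - margin)
    y1 = max(0, int(box[1]) - margin)
    x2 = min(width, int(box[2]) + margin)
    y2 = min(height, int(box[3]) + margin)
    return [row[x1:x2] for row in image[y1:y2]]
```

Each cropped area would then be passed through labeling again so that the empty standard text box receives a corresponding detection text box.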
In one embodiment, the device further comprises a third obtaining unit, a second labeling unit, a seventh calculating unit and a generating unit.
The third acquiring unit is used for acquiring a plurality of sample text pictures of the target type.
The second labeling unit is configured to perform labeling processing on each sample text picture to obtain a text box set corresponding to each sample text picture, where the text box set includes a plurality of candidate text boxes, and each candidate text box contains a plurality of characters in the corresponding sample text picture whose spacing from one another is smaller than a preset distance threshold.
The seventh calculating unit is configured to, for each information type contained in the sample text picture, obtain a candidate text box corresponding to the information type from each text box set, and calculate a size average value and a position average value of the obtained candidate text boxes.
The generating unit is used for generating a standard text box corresponding to each information type based on the size average value and the position average value of the candidate text box corresponding to each information type; and obtaining a template picture based on the generated standard text boxes.
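Template generation from sample pictures can be sketched as below. The sketch assumes that every sample picture has the same size, that each labelled candidate box is already tagged with its information type, and that "position" means the box centre and "size" means its width and height. The function name `build_template` and the example information type `"name"` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_template(samples):
    """samples: list of dicts mapping information type -> (x1, y1, x2, y2).
    Returns one averaged standard box per information type."""
    grouped = defaultdict(list)
    for boxes in samples:
        for info_type, box in boxes.items():
            grouped[info_type].append(box)

    template = {}
    for info_type, boxes in grouped.items():
        # Position mean: average of the box centres.
        cx = mean((b[0] + b[2]) / 2 for b in boxes)
        cy = mean((b[1] + b[3]) / 2 for b in boxes)
        # Size mean: average width and height.
        w = mean(b[2] - b[0] for b in boxes)
        h = mean(b[3] - b[1] for b in boxes)
        template[info_type] = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return template


# Two samples of the same document type with slightly different layouts.
print(build_template([{"name": (10, 10, 50, 30)}, {"name": (14, 12, 58, 36)}]))
```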
In one embodiment, the apparatus further comprises a fourth acquiring unit, a determining unit and a recognition unit.
The fourth obtaining unit is used for obtaining a standard text box which is used for indicating an area where a text containing the information of the target type is located from the template picture.
The determining unit is used for determining the detection text box corresponding to the standard text box and extracting characters from the determined detection text box.
The recognition unit is used for recognizing the extracted characters to obtain the information of the target type.
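The final extraction step amounts to a lookup followed by character recognition, as in the sketch below. The data layout is an assumption: `field_to_standard` maps an information type to the index of its standard text box, `matches` maps detected-box indices to standard-box indices, and `recognize` stands in for whatever character recognizer is used. All of these names are hypothetical.

```python
def extract_field(info_type, field_to_standard, matches, detected, recognize):
    """Return the recognized text for `info_type`, or None when no detected
    box was matched to the corresponding standard box."""
    target = field_to_standard[info_type]
    for det_idx, std_idx in matches.items():
        if std_idx == target:
            return recognize(detected[det_idx])
    return None
```

A caller would pass a real recognizer as `recognize`; a `None` result signals that the missed-detection handling should run before extraction is retried.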
The text recognition device provided by the embodiment of the application can realize the method embodiment, the realization principle and the technical effect are similar, and the details are not repeated herein.
For the specific definition of the text recognition device, reference may be made to the above definition of the text recognition method, which is not described herein again. The various modules in the text recognition device described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment of the present application, there is provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the following steps when executing the computer program:
labeling a text picture to be detected of a target type to obtain a candidate text picture, wherein the candidate text picture comprises a plurality of detection text boxes, and each detection text box contains a plurality of characters in the text picture to be detected whose spacing from one another is smaller than a preset distance threshold; acquiring a template picture corresponding to the target type, wherein the template picture comprises a plurality of standard text boxes, and different standard text boxes indicate the areas where texts containing different types of information are located in a sample text picture of the target type; matching the detection text boxes in the candidate text picture with the standard text boxes in the template picture; and extracting the information of the target type from the candidate text picture according to the result of the matching.
In one embodiment of the application, when the processor executes the computer program, the matching of the detection text box in the candidate text picture with the standard text box in the template picture comprises: adjusting the size of the template picture based on the size of the candidate text picture to obtain an adjusted template picture, wherein the adjusted template picture comprises a plurality of adjusted standard text boxes; and matching the detection text box in the candidate text picture with the adjusted standard text box in the adjusted template picture.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: calculating the sum of the lengths of all the detection text boxes in the candidate text picture to obtain a first length sum value; calculating the sum of the widths of all the detection text boxes in the candidate text picture to obtain a first width sum value; calculating the sum of the lengths of the standard text boxes in the template picture to obtain a second length sum value; calculating the sum of the widths of the standard text boxes in the template picture to obtain a second width sum value; and adjusting the length of the template picture based on the ratio of the first length sum value and the second length sum value, and adjusting the width of the template picture based on the ratio of the first width sum value and the second width sum value.
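The size adjustment by length and width sums can be sketched as follows. The sketch assumes that "length" is the horizontal extent and "width" the vertical extent of an `(x1, y1, x2, y2)` box, and that scaling the template picture is equivalent to scaling the coordinates of its standard boxes about the picture origin. The function name `scale_template` is hypothetical.

```python
def scale_template(detected, standard):
    """Scale the standard boxes so that their summed lengths and widths
    match those of the detected boxes. Returns (scaled boxes, (rx, ry))."""
    first_length_sum = sum(b[2] - b[0] for b in detected)
    first_width_sum = sum(b[3] - b[1] for b in detected)
    second_length_sum = sum(b[2] - b[0] for b in standard)
    second_width_sum = sum(b[3] - b[1] for b in standard)

    rx = first_length_sum / second_length_sum  # horizontal scale factor
    ry = first_width_sum / second_width_sum    # vertical scale factor
    scaled = [(b[0] * rx, b[1] * ry, b[2] * rx, b[3] * ry) for b in standard]
    return scaled, (rx, ry)
```

Note that the two sums are only directly comparable when both pictures contain corresponding boxes; a missed detection lowers the first sums and so biases the ratios.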
In one embodiment of the application, the processor when executing the computer program further performs the steps of: overlaying the template picture on the candidate text picture and sliding it; after each slide, calculating the sum of the intersection ratios between each detection text box in the candidate text picture and each standard text box in the template picture; taking the sliding position of the candidate text picture relative to the template picture that yields the maximum calculated sum as the final relative position; and matching the detection text box in the candidate text picture with the standard text box in the template picture based on the final relative position.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: for each detection text box in the candidate text picture, calculating the intersection ratio of the detection text box and each standard text box in the template picture, and taking the maximum intersection ratio obtained by calculation as the candidate intersection ratio corresponding to the detection text box; and calculating the sum of candidate intersection ratios corresponding to each detection text box in the candidate text pictures.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: for each detection text box in the candidate text picture at the final relative position, calculating the intersection ratio of the detection text box and each standard text box in the template picture at the final relative position, and taking the standard text box corresponding to the maximum intersection ratio obtained by calculation as the standard text box matched with the detection text box.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: determining extreme points of the detection text boxes in the candidate text pictures based on the positions of the detection text boxes in the candidate text pictures, establishing a first coordinate system in the candidate text pictures based on the extreme points of the detection text boxes in the candidate text pictures, and acquiring the coordinates of each detection text box in the first coordinate system; determining extreme points of the standard text boxes in the template picture based on the positions of the standard text boxes in the template picture, establishing a second coordinate system in the template picture based on the extreme points of the standard text boxes in the template picture, and acquiring coordinates of each standard text box in the second coordinate system; and matching the detection text box in the candidate text picture and the standard text box in the template picture based on the coordinates of each detection text box in the first coordinate system and the coordinates of each standard text box in the second coordinate system.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: for each detection text box in the candidate text picture, determining the projection position of the detection text box in the second coordinate system based on the coordinates of the detection text box in the first coordinate system, calculating the intersection ratio of the detection text box and each standard text box under the condition that the detection text box is positioned at the projection position, and taking the standard text box corresponding to the maximum intersection ratio obtained by calculation as the standard text box matched with the detection text box.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: detecting whether an empty standard text box exists in the template picture, wherein the empty standard text box is not matched with any detection text box in the candidate text picture; if the template picture has an empty standard text box, intercepting a missed detection area in the candidate text picture based on the position of the empty standard text box; and labeling the missed detection area to obtain a detection text box corresponding to the empty standard text box.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: obtaining a plurality of sample text pictures of a target type; labeling each sample text picture to obtain a text box set corresponding to each sample text picture, wherein the text box set comprises a plurality of candidate text boxes, and each candidate text box contains a plurality of characters in the corresponding sample text picture whose spacing from one another is smaller than a preset distance threshold; for each information type contained in the sample text picture, acquiring a candidate text box corresponding to the information type from each text box set, and calculating a size mean value and a position mean value of the acquired candidate text boxes; generating a standard text box corresponding to each information type based on the size mean value and the position mean value of the candidate text boxes corresponding to each information type; and obtaining a template picture based on the generated standard text boxes.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: acquiring a standard text box used for indicating an area where a text containing information of a target type is located from a template picture; determining a detection text box corresponding to the standard text box, and extracting characters from the determined detection text box; and identifying the extracted characters to obtain the information of the target type.
The implementation principle and technical effect of the computer device provided by the embodiment of the present application are similar to those of the method embodiment described above, and are not described herein again.
In an embodiment of the application, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of:
labeling a text picture to be detected of a target type to obtain a candidate text picture, wherein the candidate text picture comprises a plurality of detection text boxes, and each detection text box contains a plurality of characters in the text picture to be detected whose spacing from one another is smaller than a preset distance threshold; acquiring a template picture corresponding to the target type, wherein the template picture comprises a plurality of standard text boxes, and different standard text boxes indicate the areas where texts containing different types of information are located in a sample text picture of the target type; matching the detection text boxes in the candidate text picture with the standard text boxes in the template picture; and extracting the information of the target type from the candidate text picture according to the result of the matching.
In one embodiment of the application, when the computer program is executed by the processor, the matching of the detection text box in the candidate text picture with the standard text box in the template picture comprises: adjusting the size of the template picture based on the size of the candidate text picture to obtain an adjusted template picture, wherein the adjusted template picture comprises a plurality of adjusted standard text boxes; and matching the detection text box in the candidate text picture with the adjusted standard text box in the adjusted template picture.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: calculating the sum of the lengths of all the detection text boxes in the candidate text picture to obtain a first length sum value; calculating the sum of the widths of all the detection text boxes in the candidate text picture to obtain a first width sum value; calculating the sum of the lengths of the standard text boxes in the template picture to obtain a second length sum value; calculating the sum of the widths of the standard text boxes in the template picture to obtain a second width sum value; and adjusting the length of the template picture based on the ratio of the first length sum value and the second length sum value, and adjusting the width of the template picture based on the ratio of the first width sum value and the second width sum value.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: overlaying the template picture on the candidate text picture and sliding it; after each slide, calculating the sum of the intersection ratios between each detection text box in the candidate text picture and each standard text box in the template picture; taking the sliding position of the candidate text picture relative to the template picture that yields the maximum calculated sum as the final relative position; and matching the detection text box in the candidate text picture with the standard text box in the template picture based on the final relative position.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: for each detection text box in the candidate text picture, calculating the intersection ratio of the detection text box and each standard text box in the template picture, and taking the maximum intersection ratio obtained by calculation as the candidate intersection ratio corresponding to the detection text box; and calculating the sum of candidate intersection ratios corresponding to each detection text box in the candidate text pictures.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: for each detection text box in the candidate text picture at the final relative position, calculating the intersection ratio of the detection text box and each standard text box in the template picture at the final relative position, and taking the standard text box corresponding to the maximum intersection ratio obtained by calculation as the standard text box matched with the detection text box.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: determining extreme points of the detection text boxes in the candidate text pictures based on the positions of the detection text boxes in the candidate text pictures, establishing a first coordinate system in the candidate text pictures based on the extreme points of the detection text boxes in the candidate text pictures, and acquiring the coordinates of each detection text box in the first coordinate system; determining extreme points of the standard text boxes in the template picture based on the positions of the standard text boxes in the template picture, establishing a second coordinate system in the template picture based on the extreme points of the standard text boxes in the template picture, and acquiring coordinates of each standard text box in the second coordinate system; and matching the detection text box in the candidate text picture and the standard text box in the template picture based on the coordinates of each detection text box in the first coordinate system and the coordinates of each standard text box in the second coordinate system.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: for each detection text box in the candidate text picture, determining the projection position of the detection text box in the second coordinate system based on the coordinates of the detection text box in the first coordinate system, calculating the intersection ratio of the detection text box and each standard text box under the condition that the detection text box is positioned at the projection position, and taking the standard text box corresponding to the maximum intersection ratio obtained by calculation as the standard text box matched with the detection text box.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: detecting whether an empty standard text box exists in the template picture, wherein the empty standard text box is not matched with any detection text box in the candidate text picture; if the template picture has an empty standard text box, intercepting a missed detection area in the candidate text picture based on the position of the empty standard text box; and labeling the missed detection area to obtain a detection text box corresponding to the empty standard text box.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: obtaining a plurality of sample text pictures of a target type; labeling each sample text picture to obtain a text box set corresponding to each sample text picture, wherein the text box set comprises a plurality of candidate text boxes, and each candidate text box contains a plurality of characters in the corresponding sample text picture whose spacing from one another is smaller than a preset distance threshold; for each information type contained in the sample text picture, acquiring a candidate text box corresponding to the information type from each text box set, and calculating a size mean value and a position mean value of the acquired candidate text boxes; generating a standard text box corresponding to each information type based on the size mean value and the position mean value of the candidate text boxes corresponding to each information type; and obtaining a template picture based on the generated standard text boxes.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: acquiring a standard text box used for indicating an area where a text containing information of a target type is located from a template picture; determining a detection text box corresponding to the standard text box, and extracting characters from the determined detection text box; and identifying the extracted characters to obtain the information of the target type.
The implementation principle and technical effect of the computer-readable storage medium provided by this embodiment are similar to those of the above-described method embodiment, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The above examples express only several embodiments of the present application; their description is relatively specific and detailed, but should not be construed as limiting the scope of the claims. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method of text recognition, the method comprising:
labeling a target type text picture to be detected to obtain a candidate text picture, wherein the candidate text picture comprises a plurality of detection text boxes, and each detection text box comprises a plurality of characters of which the distance between the characters is smaller than a preset distance threshold value;
acquiring a template picture corresponding to the target type, wherein the template picture comprises a plurality of standard text boxes, and different standard text boxes are used for indicating areas where texts containing different types of information in the sample text picture of the target type are located;
matching the detection text box in the candidate text picture with the standard text box in the template picture;
and extracting information of the target type from the candidate text picture according to the matching processing result.
2. The method according to claim 1, wherein the matching the detected text box in the candidate text picture and the standard text box in the template picture comprises:
overlaying the template picture on the candidate text picture and sliding it;
after each sliding, calculating the sum of the intersection ratio of each detection text box in the candidate text picture and each standard text box in the template picture;
taking the sliding position of the candidate text picture corresponding to the maximum intersection ratio obtained by calculation relative to the template picture as a final relative position;
and matching the detection text box in the candidate text picture with the standard text box in the template picture based on the final relative position.
3. The method according to claim 2, wherein the calculating a sum of intersection ratios of each detected text box in the candidate text picture and each standard text box in the template picture comprises:
for each detection text box in the candidate text picture, calculating the intersection ratio of the detection text box and each standard text box in the template picture, and taking the maximum intersection ratio obtained by calculation as the candidate intersection ratio corresponding to the detection text box;
and calculating the sum of candidate intersection ratios corresponding to each detection text box in the candidate text pictures.
4. The method according to claim 2, wherein the matching the detected text box in the candidate text picture and the standard text box in the template picture based on the final relative position comprises:
and for each detection text box in the candidate text picture at the final relative position, calculating the intersection ratio of the detection text box and each standard text box in the template picture at the final relative position, and taking the standard text box corresponding to the maximum intersection ratio obtained by calculation as the standard text box matched with the detection text box.
5. The method according to claim 1, wherein the matching the detected text box in the candidate text picture and the standard text box in the template picture comprises:
determining extreme points of the detected text boxes in the candidate text picture based on the positions of the detected text boxes in the candidate text picture, establishing a first coordinate system in the candidate text picture based on the extreme points of the detected text boxes in the candidate text picture, and acquiring coordinates of each detected text box in the first coordinate system;
determining extreme points of standard text boxes in the template picture based on the positions of the standard text boxes in the template picture, establishing a second coordinate system in the template picture based on the extreme points of the standard text boxes in the template picture, and acquiring coordinates of each standard text box in the second coordinate system;
and matching the detection text box in the candidate text picture and the standard text box in the template picture based on the coordinates of each detection text box in the first coordinate system and the coordinates of each standard text box in the second coordinate system.
6. The method according to claim 5, wherein the matching the detected text box in the candidate text picture and the standard text box in the template picture based on the coordinates of each detected text box in the first coordinate system and the coordinates of each standard text box in the second coordinate system comprises:
and for each detection text box in the candidate text picture, determining the projection position of the detection text box in the second coordinate system based on the coordinates of the detection text box in the first coordinate system, calculating the intersection ratio of the detection text box and each standard text box under the condition that the detection text box is located at the projection position, and taking the standard text box corresponding to the maximum intersection ratio obtained by calculation as the standard text box matched with the detection text box.
7. The method according to claim 1, wherein after the matching process is performed on the detected text box in the candidate text picture and the standard text box in the template picture, the method further comprises:
detecting whether an empty standard text box exists in the template picture, wherein the empty standard text box is not matched with any detection text box in the candidate text picture;
if the empty standard text box exists in the template picture, intercepting a missed detection area in the candidate text picture based on the position of the empty standard text box;
and labeling the missed detection area, and obtaining a detection text box corresponding to the empty standard text box through labeling.
8. A text recognition apparatus, characterized in that the apparatus comprises:
the marking module is used for marking the text picture to be detected of the target type to obtain a candidate text picture, wherein the candidate text picture comprises a plurality of detection text boxes, and each detection text box comprises a plurality of characters of which the distance between the characters is smaller than a preset distance threshold value;
the acquisition module is used for acquiring a template picture corresponding to the target type, wherein the template picture comprises a plurality of standard text boxes, and different standard text boxes are used for indicating areas where texts containing different types of information are located in the sample text picture of the target type;
the matching module is used for matching the detection text box in the candidate text picture with the standard text box in the template picture;
and the extraction module is used for extracting the information of the target type from the candidate text picture according to the matching processing result.
9. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements a text recognition method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a text recognition method according to any one of claims 1 to 7.
CN202011495134.7A 2020-12-17 2020-12-17 Text recognition method, device, equipment and storage medium Pending CN112580499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011495134.7A CN112580499A (en) 2020-12-17 2020-12-17 Text recognition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011495134.7A CN112580499A (en) 2020-12-17 2020-12-17 Text recognition method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112580499A (en) 2021-03-30

Family

ID=75135794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011495134.7A Pending CN112580499A (en) 2020-12-17 2020-12-17 Text recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112580499A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723069A (en) * 2021-09-03 2021-11-30 北京房江湖科技有限公司 File detection method and system, machine-readable storage medium and electronic device
CN115063814A (en) * 2022-08-22 2022-09-16 深圳爱莫科技有限公司 Universal commodity price tag image identification method and processing equipment
CN115063814B (en) * 2022-08-22 2022-12-23 深圳爱莫科技有限公司 Universal commodity price tag image identification method and processing equipment

Similar Documents

Publication Publication Date Title
CN110390269B (en) PDF document table extraction method, device, equipment and computer readable storage medium
CN110414507B (en) License plate recognition method and device, computer equipment and storage medium
CN110619330A (en) Recognition model training method and device, computer equipment and recognition method
WO2021012382A1 (en) Method and apparatus for configuring chat robot, computer device and storage medium
CN110796082B (en) Nameplate text detection method and device, computer equipment and storage medium
CN108961315B (en) Target tracking method and device, computer equipment and storage medium
CN111242126A (en) Irregular text correction method and device, computer equipment and storage medium
CN112613506A (en) Method and device for recognizing text in image, computer equipment and storage medium
CN112418278A (en) Multi-class object detection method, terminal device and storage medium
CN110728687B (en) File image segmentation method and device, computer equipment and storage medium
CN111368638A (en) Spreadsheet creation method and device, computer equipment and storage medium
CN112580499A (en) Text recognition method, device, equipment and storage medium
CN110807454A (en) Character positioning method, device and equipment based on image segmentation and storage medium
CN111291741B (en) Receipt identification method and device, computer equipment and storage medium
CN112926421A (en) Image processing method and apparatus, electronic device, and storage medium
CN112766275B (en) Seal character recognition method and device, computer equipment and storage medium
CN112906532B (en) Image processing method and device, electronic equipment and storage medium
CN114511865A (en) Method and device for generating structured information and computer readable storage medium
CN111325106B (en) Method and device for generating training data
CN112926564A (en) Picture analysis method, system, computer device and computer-readable storage medium
CN117115823A (en) Tamper identification method and device, computer equipment and storage medium
CN113221897A (en) Image correction method, image text recognition method, identity verification method and device
CN110895849A (en) Method and device for cutting and positioning crown word number, computer equipment and storage medium
CN115984185A (en) Paper towel package defect detection method, device and system and storage medium
CN112836682A (en) Method and device for identifying object in video, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination