CN112001331B - Image recognition method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112001331B
CN112001331B (application CN202010873733.1A)
Authority
CN
China
Prior art keywords
image
recognition
anchor
text
identification
Prior art date
Legal status
Active
Application number
CN202010873733.1A
Other languages
Chinese (zh)
Other versions
CN112001331A (en)
Inventor
高万顺
Current Assignee
Shanghai Goldway Intelligent Transportation System Co Ltd
Original Assignee
Shanghai Goldway Intelligent Transportation System Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Goldway Intelligent Transportation System Co Ltd
Priority claimed from application CN202010873733.1A
Publication of CN112001331A
Application granted
Publication of CN112001331B
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiments of this application disclose an image recognition method, apparatus, device, and storage medium, belonging to the field of image processing. The method comprises the following steps: acquiring a template image corresponding to a first image, wherein the template image comprises recognition boxes; performing character recognition on the first image to determine first text boxes in the first image; performing point set registration between the first text boxes in the first image and the recognition boxes in the template image to obtain a recognition-box pairing relationship between the first image and the template image; and determining a structured recognition result of the first image based on the recognition-box pairing relationship. This recognition approach requires no anchor boxes in the template image, which simplifies template setup and reduces the template configuration workload. Moreover, when several recognition boxes share the same text content, or when no anchor boxes exist, the pairing relationship between recognition boxes can still be determined accurately and a structured recognition result obtained, extending the scenarios to which image recognition applies.

Description

Image recognition method, device, equipment and storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to an image recognition method, apparatus, device, and storage medium.
Background
In many scenarios, it is necessary to recognize ticket images, card images, custom template images, or the like, in order to extract the text at relevant positions in these images. For example, an identity card image needs to be recognized automatically to obtain the identity information it contains.
At present, the user is generally required to prepare, in advance, a template image corresponding to the image to be recognized, to mark anchor boxes and recognition boxes in the template image, and to assign corresponding text attributes to them; the image to be recognized is then recognized against the template image. An anchor box is a text box containing fixed text content, such as the caption text boxes for "name" or "gender". A recognition box is a text box whose text content is to be extracted, such as the box containing a specific name or gender. A text attribute indicates the type of the text content in a box, such as name or gender. When a first image to be recognized is recognized against the template image, character recognition is first performed on the first image to obtain its text boxes. Then, based on the text content of the boxes, the text boxes in the first image are matched against the anchor boxes in the template image: a text box in the first image whose text is the same as that of an anchor box in the template image is taken as an anchor box, yielding the anchor-box pairing relationship between the first image and the template image. Likewise, the text boxes in the first image are matched against the recognition boxes in the template image based on text content, and a text box in the first image whose text is the same as that of a recognition box in the template image is taken as a recognition box, yielding the recognition-box pairing relationship between the first image and the template image. A perspective transformation is then applied to the first image according to the anchor-box pairing relationship to obtain a second image aligned with the template image.
Finally, a structured recognition result of the second image is determined based on the anchor-box pairing relationship and the recognition-box pairing relationship, where the structured recognition result comprises the text content and text attributes of the recognition boxes in the second image.
Because the user must set the anchor boxes and recognition boxes in the template image in advance, this approach involves non-trivial operation difficulty and template-setup workload. Moreover, it has clear limitations. For example, when no anchor box exists in the image to be recognized or in the template image, recognition cannot proceed at all. And if several recognition boxes with identical text content appear in the image to be recognized or in the template image, the boxes cannot be paired correctly from text content alone; that is, the recognition-box pairing relationship between the image to be recognized and the template image cannot be determined accurately, which degrades the recognition result and lowers recognition accuracy.
Disclosure of Invention
The embodiments of this application provide an image recognition method, apparatus, device, and storage medium, which address the limitations and low recognition accuracy of the image recognition method in the related art. The technical solution is as follows:
In one aspect, an image recognition method is provided, the method comprising:
acquiring a template image corresponding to a first image, wherein the template image comprises recognition boxes, each recognition box having a corresponding text attribute;
performing character recognition on the first image to determine first text boxes in the first image;
performing point set registration between the first text boxes in the first image and the recognition boxes in the template image, and taking each first text box in the first image that is registered with a recognition box in the template image as a recognition box, to obtain a recognition-box pairing relationship, wherein the recognition-box pairing relationship comprises a one-to-one pairing between the recognition boxes in the first image and the recognition boxes in the template image;
and determining a structured recognition result of the first image based on the recognition-box pairing relationship, wherein the structured recognition result comprises the text content and text attributes of the recognition boxes in the first image.
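The claimed steps can be sketched end to end with toy data. Below is an illustrative, non-normative sketch only: the box coordinates are hypothetical, and nearest-centre matching stands in for the full point set registration described above.

```python
import math

def center(box):
    # box: (x, y, w, h) -> centre point of the text box
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def pair_by_registration(text_boxes, template_boxes):
    """Pair each template recognition box with the nearest first text box.
    A simplified stand-in for point set registration (toy illustration)."""
    pairing = {}
    for attr, tbox in template_boxes.items():
        tc = center(tbox)
        best = min(text_boxes, key=lambda b: math.dist(center(b[0]), tc))
        pairing[attr] = best
    return pairing

# Template image: recognition boxes with text attributes (hypothetical layout).
template = {"name": (100, 40, 80, 20), "gender": (100, 80, 80, 20)}
# First image after character recognition: (text box, recognized text).
first_image_boxes = [((102, 42, 60, 18), "Zhang San"),
                     ((101, 83, 30, 18), "female")]

pairing = pair_by_registration(first_image_boxes, template)
# Structured recognition result: text attribute -> text content.
structured = {attr: text for attr, (_box, text) in pairing.items()}
print(structured)  # {'name': 'Zhang San', 'gender': 'female'}
```

A real implementation would enforce a one-to-one pairing via a registration algorithm such as CPD or ICP rather than independent nearest-centre lookups.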
Optionally, determining the structured recognition result of the first image based on the recognition-box pairing relationship comprises:
performing a perspective transformation on the first image based on the recognition-box pairing relationship to obtain a second image aligned with the template image;
and determining a structured recognition result of the second image based on the recognition-box pairing relationship.
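The perspective transformation above maps the first image into the template's coordinate frame via a 3x3 homography estimated from the paired box positions (in practice, e.g. OpenCV's `cv2.findHomography` followed by `cv2.warpPerspective`). A minimal dependency-free sketch of applying such a homography to a point:

```python
def apply_homography(H, p):
    """Map point p = (x, y) through the 3x3 homography H (row-major, 9 entries)."""
    x, y = p
    d = H[6] * x + H[7] * y + H[8]          # projective denominator
    return ((H[0] * x + H[1] * y + H[2]) / d,
            (H[3] * x + H[4] * y + H[5]) / d)

# Identity homography leaves points unchanged; a translation homography shifts them.
I = [1, 0, 0,  0, 1, 0,  0, 0, 1]
T = [1, 0, 5,  0, 1, -3,  0, 0, 1]  # translate by (+5, -3)
assert apply_homography(I, (10, 20)) == (10.0, 20.0)
print(apply_homography(T, (10, 20)))  # (15.0, 17.0)
```

Estimating the homography itself requires at least four paired box positions; the pairing relationship obtained by point set registration supplies exactly those correspondences.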
Optionally, determining the structured recognition result of the second image based on the recognition-box pairing relationship comprises:
performing character recognition on the second image to obtain a character recognition result;
and matching the character recognition result against the recognition boxes in the second image based on the recognition-box pairing relationship to obtain the structured recognition result of the second image.
Optionally, before the point set registration between the first text boxes in the first image and the recognition boxes in the template image, the method further comprises:
performing character recognition on the template image to determine second text boxes in the template image;
matching the first text boxes in the first image against the second text boxes in the template image based on the text content and position of the text boxes, and taking the text boxes that match each other as anchor boxes, to obtain an anchor-box pairing relationship, wherein the anchor-box pairing relationship comprises a one-to-one pairing between the anchor boxes in the first image and the anchor boxes in the template image;
and determining the number of one-to-one matched anchor boxes in the anchor-box pairing relationship;
the point set registration between the first text boxes in the first image and the recognition boxes in the template image comprises:
performing point set registration between the first text boxes in the first image and the recognition boxes in the template image based on the number of anchor boxes.
Optionally, matching the first text boxes in the first image against the second text boxes in the template image based on the text content and position of the text boxes, and taking the text boxes that match each other as anchor boxes to obtain the anchor-box pairing relationship, comprises:
matching the first text boxes in the first image against the second text boxes in the template image based on the text content of the text boxes, and taking the text boxes that match each other as anchor boxes, to obtain an initial anchor-box matching relationship between the anchor boxes in the first image and the anchor boxes in the template image, wherein text boxes that match each other are text boxes with identical text content;
and if it is determined, based on the initial anchor-box matching relationship, that the anchor boxes in the first image and the anchor boxes in the template image are not matched one to one, performing point set registration between the anchor boxes in the first image and the anchor boxes in the template image to obtain the anchor-box pairing relationship.
Optionally, the method further comprises:
if it is determined, based on the initial anchor-box matching relationship, that the anchor boxes in the first image are matched one to one with the anchor boxes in the template image, determining the initial anchor-box matching relationship as the anchor-box pairing relationship.
Optionally, performing point set registration between the first text boxes in the first image and the recognition boxes in the template image based on the number of anchor boxes comprises:
if the number of anchor boxes does not meet a number requirement, performing point set registration between all the first text boxes in the first image and the recognition boxes in the template image;
and if the number of anchor boxes meets the number requirement, performing point set registration between the remaining text boxes in the first image and the recognition boxes in the template image, wherein the remaining text boxes are the first text boxes in the first image other than the anchor boxes.
Optionally, determining the structured recognition result of the first image based on the recognition-box pairing relationship, the structured recognition result comprising the text content and text attributes of the recognition boxes in the first image, comprises:
if the number of anchor boxes does not meet the number requirement, performing a perspective transformation on the first image based on the recognition-box pairing relationship to obtain a second image aligned with the template image, and determining a structured recognition result of the second image based on the recognition-box pairing relationship;
and if the number of anchor boxes meets the number requirement, performing a perspective transformation on the first image based on the anchor-box pairing relationship and the recognition-box pairing relationship to obtain a second image aligned with the template image, and determining a structured recognition result of the second image based on the anchor-box pairing relationship and the recognition-box pairing relationship.
Optionally, before determining the structured recognition result of the second image based on the anchor-box pairing relationship and the recognition-box pairing relationship, the method further comprises:
determining offset information between the anchor boxes and the recognition boxes in the first image based on the anchor-box pairing relationship and the recognition-box pairing relationship;
and correcting the positions of the recognition boxes in the first image based on the offset information to obtain a recognition-box correction result;
determining the structured recognition result of the second image based on the anchor-box pairing relationship and the recognition-box pairing relationship comprises:
determining the structured recognition result of the second image based on the anchor-box pairing relationship, the recognition-box pairing relationship and the recognition-box correction result.
Optionally, determining the structured recognition result of the second image based on the anchor-box pairing relationship, the recognition-box pairing relationship and the recognition-box correction result comprises:
performing character recognition on the second image to obtain a character recognition result;
and matching the character recognition result against the recognition boxes in the second image based on the anchor-box pairing relationship, the recognition-box pairing relationship and the recognition-box correction result to obtain the structured recognition result of the second image.
In another aspect, an image recognition apparatus is provided, the apparatus comprising:
an acquisition module, configured to acquire a template image corresponding to a first image, wherein the template image comprises recognition boxes, each recognition box having a corresponding text attribute;
a first recognition module, configured to perform character recognition on the first image to determine first text boxes in the first image;
a registration module, configured to perform point set registration between the first text boxes in the first image and the recognition boxes in the template image, and take each first text box in the first image that is registered with a recognition box in the template image as a recognition box, to obtain a recognition-box pairing relationship, wherein the recognition-box pairing relationship comprises a one-to-one pairing between the recognition boxes in the first image and the recognition boxes in the template image;
and a first determining module, configured to determine a structured recognition result of the first image based on the recognition-box pairing relationship, wherein the structured recognition result comprises the text content and text attributes of the recognition boxes in the first image.
Optionally, the first determining module is configured to:
perform a perspective transformation on the first image based on the recognition-box pairing relationship to obtain a second image aligned with the template image;
and determine a structured recognition result of the second image based on the recognition-box pairing relationship.
Optionally, the first determining module is configured to:
perform character recognition on the second image to obtain a character recognition result;
and match the character recognition result against the recognition boxes in the second image based on the recognition-box pairing relationship to obtain the structured recognition result of the second image.
Optionally, the apparatus further comprises:
a second recognition module, configured to perform character recognition on the template image to determine second text boxes in the template image;
a matching module, configured to match the first text boxes in the first image against the second text boxes in the template image based on the text content and position of the text boxes, and take the text boxes that match each other as anchor boxes, to obtain an anchor-box pairing relationship, wherein the anchor-box pairing relationship comprises a one-to-one pairing between the anchor boxes in the first image and the anchor boxes in the template image;
and a second determining module, configured to determine the number of one-to-one matched anchor boxes in the anchor-box pairing relationship;
the registration module being configured to perform point set registration between the first text boxes in the first image and the recognition boxes in the template image based on the number of anchor boxes.
Optionally, the matching module comprises:
a matching unit, configured to match the first text boxes in the first image against the second text boxes in the template image based on the text content of the text boxes, and take the text boxes that match each other as anchor boxes, to obtain an initial anchor-box matching relationship between the anchor boxes in the first image and the anchor boxes in the template image, wherein text boxes that match each other are text boxes with identical text content;
and a registration unit, configured to, if it is determined based on the initial anchor-box matching relationship that the anchor boxes in the first image and the anchor boxes in the template image are not matched one to one, perform point set registration between the anchor boxes in the first image and the anchor boxes in the template image to obtain the anchor-box pairing relationship.
Optionally, the matching module further comprises:
a first determining unit, configured to, if it is determined based on the initial anchor-box matching relationship that the anchor boxes in the first image are matched one to one with the anchor boxes in the template image, determine the initial anchor-box matching relationship as the anchor-box pairing relationship.
Optionally, the registration unit is configured to:
if the number of anchor boxes does not meet a number requirement, perform point set registration between all the first text boxes in the first image and the recognition boxes in the template image;
and if the number of anchor boxes meets the number requirement, perform point set registration between the remaining text boxes in the first image and the recognition boxes in the template image, wherein the remaining text boxes are the first text boxes in the first image other than the anchor boxes.
Optionally, the first determining module comprises:
a second determining unit, configured to, if the number of anchor boxes does not meet the number requirement, perform a perspective transformation on the first image based on the recognition-box pairing relationship to obtain a second image aligned with the template image, and determine a structured recognition result of the second image based on the recognition-box pairing relationship;
and a third determining unit, configured to, if the number of anchor boxes meets the number requirement, perform a perspective transformation on the first image based on the anchor-box pairing relationship and the recognition-box pairing relationship to obtain a second image aligned with the template image, and determine a structured recognition result of the second image based on the anchor-box pairing relationship and the recognition-box pairing relationship.
Optionally, the third determining unit is further configured to:
determine offset information between the anchor boxes and the recognition boxes in the first image based on the anchor-box pairing relationship and the recognition-box pairing relationship;
correct the positions of the recognition boxes in the first image based on the offset information to obtain a recognition-box correction result;
and determine the structured recognition result of the second image based on the anchor-box pairing relationship, the recognition-box pairing relationship and the recognition-box correction result.
Optionally, the third determining unit is further configured to:
perform character recognition on the second image to obtain a character recognition result;
and match the character recognition result against the recognition boxes in the second image based on the anchor-box pairing relationship, the recognition-box pairing relationship and the recognition-box correction result to obtain the structured recognition result of the second image.
In another aspect, there is also provided a computer device, the device comprising:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to perform the steps of any of the image recognition methods described above.
In another aspect, there is also provided a computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the steps of any of the image recognition methods described above.
In another aspect, a computer program product is also provided which, when executed, implements the steps of any of the image recognition methods described above.
The technical solutions provided by the embodiments of this application have the following beneficial effects:
In the embodiments of this application, for a template image that contains only recognition boxes and no anchor boxes, the text boxes recognized in the first image can be treated as one point set and the recognition boxes in the template image as another. By registering these two point sets, the recognition boxes are identified among the text boxes of the first image, a one-to-one pairing between the recognition boxes in the first image and those in the template image is determined, and the structured recognition result of the first image is then determined from this pairing. With this recognition approach, no anchor box needs to be set in the template image in advance, which simplifies template setup and reduces the template configuration workload. Moreover, because the recognition-box pairing relationship between the first image and the template image is determined by point set registration, it can be determined accurately even when several recognition boxes with identical text content appear in the first image or the template image, or when neither contains an anchor box. A final structured recognition result is thus obtained, the applicable recognition scenarios are extended, and recognition accuracy is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. Clearly, the drawings described below show only some embodiments of the present application; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flowchart of an image recognition method according to an embodiment of the present application;
FIG. 2 is a flowchart of another image recognition method according to an embodiment of the present application;
FIG. 3 is a block diagram of an image recognition apparatus according to an embodiment of the present application;
FIG. 4 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Before explaining the embodiment of the present application in detail, an application scenario of the embodiment of the present application is described.
The image recognition method provided by the embodiment of the application is mainly applied to recognition of bill images, card images or custom template images and the like so as to recognize texts at relevant positions in the images. The image recognition method can be applied to computer equipment, and the computer equipment can be a mobile phone, a tablet computer, a computer or a server and the like.
Fig. 1 is a flowchart of an image recognition method according to an embodiment of the present application, where the method is used in a computer device. Referring to fig. 1, the method includes:
Step 101: acquire a template image corresponding to the first image, wherein the template image comprises recognition boxes, each recognition box having a corresponding text attribute.
The first image is the image to be recognized, and the template image corresponding to the first image is a typical image of the same type, prepared for recognizing the first image. Both the first image and the template image may be ticket images, card images, custom template images, and so on. For example, if the first image is a ticket image, the template image corresponding to the first image is a typical ticket image.
It should be noted that, in the embodiments of this application, the template image may contain only recognition boxes and no anchor boxes. That is, when setting up the template image, the user only needs to mark the recognition boxes, without marking any anchor box, which simplifies the user's template setup and reduces the template configuration workload.
In addition, the template image may contain at least one recognition box, i.e. one or more recognition boxes. Each recognition box has a corresponding text attribute, i.e. an attribute of the text in the box that indicates the type of that text. For example, if the text in a recognition box is the name "Zhang San", the text attribute of that box may be name; if the text is the gender "female", the text attribute may be gender.
Step 102: perform character recognition on the first image to determine first text boxes in the first image.
The first image may contain at least one first text box, i.e. one or more first text boxes. Each first text box contains recognized text; that is, the text content of each first text box may consist of one or more characters recognized from the first image.
By performing character recognition on the first image, the characters in the first image are recognized; one or more characters that are close in position form a text, and the box in which that text lies is a first text box.
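The grouping of nearby characters into text boxes can be sketched as follows. This is an illustrative sketch only: the per-character boxes and the gap/row thresholds are assumptions, and a real system would use the layout output of the OCR engine.

```python
def group_characters(chars, max_gap=8):
    """Merge per-character boxes (x, y, w, h, ch) into text boxes.
    Characters belong to the same text box when they are vertically
    aligned and horizontally close (toy heuristic)."""
    chars = sorted(chars, key=lambda c: (round(c[1] / 10), c[0]))  # by row, then x
    boxes = []
    for x, y, w, h, ch in chars:
        if boxes:
            bx, by, bw, bh, text = boxes[-1]
            same_row = abs(y - by) < h                 # vertically aligned
            if same_row and x - (bx + bw) <= max_gap:  # small horizontal gap
                boxes[-1] = (bx, by, x + w - bx, max(bh, h), text + ch)
                continue
        boxes.append((x, y, w, h, ch))
    return boxes

# Three close characters on one line, one character on another line (toy data).
chars = [(10, 20, 9, 12, "Z"), (21, 20, 9, 12, "h"), (32, 21, 9, 12, "a"),
         (10, 60, 9, 12, "F")]
boxes = group_characters(chars)
print(boxes)
```

Running this merges "Z", "h", "a" into one text box with text "Zha" and leaves "F" as a separate box, mirroring how characters with similar positions form a text.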
Note that the character recognition algorithm used here may be an OCR (Optical Character Recognition) algorithm or the like; the embodiments of this application do not limit this.
Step 103: perform point set registration between the first text boxes in the first image and the recognition boxes in the template image, and take each first text box in the first image that is registered with a recognition box in the template image as a recognition box, to obtain a recognition-box pairing relationship, which comprises a one-to-one pairing between the recognition boxes in the first image and the recognition boxes in the template image.
It should be noted that point set registration means that, given two point sets, the two sets are registered (i.e. aligned) to find the points that correspond to each other, yielding a one-to-one pairing between the two sets.
In the embodiments of this application, in order to find the one-to-one pairing between the first text boxes in the first image and the recognition boxes in the template image, the first text boxes in the first image are taken as one point set and the recognition boxes in the template image as the other; the two point sets are registered, the mutually corresponding recognition boxes in the first image and the template image are found, and the one-to-one pairing between the recognition boxes in the first image and those in the template image is obtained.
Registering the first text boxes in the first image with the recognition boxes in the template image means registering based on the position of the first text boxes in the first image and the position of the recognition boxes in the template image. Because registration relies on the positions of the text boxes and recognition boxes, the recognition-box pairing relationship can be determined accurately even when several recognition boxes with identical text content appear in the image to be recognized or in the template image. This avoids the problem of the related art, where matching by text content cannot determine the pairing accurately once several recognition boxes share the same text content.
The point set registration method adopted may be the CPD (Coherent Point Drift) algorithm, the ICP (Iterative Closest Point) algorithm, the RPM (Robust Point Matching) algorithm, the KC (Kernel Correlation) algorithm, or the like.
If the correspondence between points of the two point sets is regarded as a probability, a true corresponding point ideally has probability 1 and a false corresponding point has probability 0. On this basis, the correspondence of the point sets can be described with a probability value: the larger the value, the greater the certainty of the correspondence. Since probabilities are involved, a mathematical model may be chosen to describe the correspondence. In the CPD algorithm, a GMM (Gaussian Mixture Model) is typically used to describe the correspondence of the point sets, so that the registration of the two point sets is converted into a probability density estimation problem, i.e., solving for the parameters of the Gaussian mixture model.
The process of performing point set registration on the first text box in the first image and the identification box in the template image using the CPD algorithm may include the following. First, the first text boxes in the first image are taken as the origin point set to be registered and the identification boxes in the template image are taken as the target point set; the origin point set needs to be transformed to register with the target point set. Then, an assumption is made about the transformation relationship from the origin set to the target set; the currently supported transformations include rigid, elastic, and affine transformations, and affine transformation may be used in the embodiment of the present application. The transformation relationship is an implicit parameter of the Gaussian mixture model. After that, the transformation matrix corresponding to the transformation relationship is initialized; taking affine transformation as an example, the affine transformation matrix is initialized. Then, the probability distribution and the loss are calculated according to the distances between the transformed origin set and the points of the target set, and the transformation matrix is continuously optimized with the EM (Expectation Maximization) algorithm until the loss is small enough or the number of iterations reaches its maximum. After optimization, the optimized affine transformation matrix is obtained, the Gaussian mixture model describing the correspondence between the two point sets can be determined from it, and the true correspondence between the two point sets can then be determined from the Gaussian mixture model.
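The EM loop described above can be sketched as a simplified, CPD-style affine registration. This is an illustrative reimplementation, not the published CPD algorithm: the uniform outlier term is omitted, the stopping rule is a fixed iteration count, and all names are ours.

```python
import numpy as np

# Simplified sketch of CPD-style affine point set registration: alternate soft
# correspondences (E-step) with a weighted affine fit (M-step).

def affine_cpd(source, target, iters=50, sigma2=None):
    X, Y = np.asarray(target, float), np.asarray(source, float)
    N, M = len(X), len(Y)
    A, t = np.eye(2), np.zeros(2)
    if sigma2 is None:  # initialize the Gaussian variance from average distances
        sigma2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum() / (2 * M * N)
    for _ in range(iters):
        # E-step: responsibilities of each transformed source point for each target point.
        TY = Y @ A.T + t
        d2 = ((X[:, None, :] - TY[None, :, :]) ** 2).sum(-1)
        P = np.exp(-d2 / (2 * sigma2))
        P /= P.sum(axis=1, keepdims=True) + 1e-12      # N x M, rows sum to 1
        # M-step: weighted least-squares affine fit (weighted Procrustes-style).
        Np = P.sum()
        mu_x = (P.sum(1) @ X) / Np
        mu_y = (P.sum(0) @ Y) / Np
        Xh, Yh = X - mu_x, Y - mu_y
        B = Xh.T @ P @ Yh
        C = Yh.T @ np.diag(P.sum(0)) @ Yh
        A = B @ np.linalg.inv(C)
        t = mu_x - A @ mu_y
        TY = Y @ A.T + t
        resid = (P * ((X[:, None, :] - TY[None, :, :]) ** 2).sum(-1)).sum()
        sigma2 = max(resid / (2 * Np), 1e-6)
    # Hard pairing: each source point is assigned its most responsible target point.
    return A, t, P.argmax(axis=0)
```

For well-separated text box centers and a mild distortion, the final hard assignment recovers the one-to-one pairing relationship between the two box sets.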
Step 104: based on the identification frame correspondence, a structured identification result of the first image is determined, the structured identification result including text content and text attributes of the identification frame in the first image.
As one example, the text attribute of each recognition frame in the first image may be determined based on the recognition frame correspondence, with the text attribute of the recognition frame in the template image being the text attribute of the recognition frame in the first image corresponding to the recognition frame in the template image. Then, the structured recognition result of the first image is determined in combination with the text content in the recognition frame of the first image recognized previously and the text attribute of the recognition frame in the first image determined currently. This way the recognition efficiency is high.
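A minimal sketch of this example, assuming the pairing relation, the template's text attributes, and the recognized text are available as plain dictionaries (all identifiers are illustrative):

```python
# Illustrative sketch: propagate text attributes from template recognition boxes
# to the paired boxes in the first image, then assemble the structured result.

def build_structured_result(pairing, template_attrs, image_texts):
    """
    pairing:        {image_box_id: template_box_id}, the one-to-one pairing relation
    template_attrs: {template_box_id: text attribute}, e.g. {"t1": "name"}
    image_texts:    {image_box_id: recognized text}, e.g. {"b1": "Zhang San"}
    """
    return {
        template_attrs[tpl_id]: image_texts[img_id]
        for img_id, tpl_id in pairing.items()
    }
```

For example, pairing `{"b1": "t1", "b2": "t2"}` with attributes `{"t1": "name", "t2": "gender"}` and texts `{"b1": "Zhang San", "b2": "Female"}` yields the structured result `{"name": "Zhang San", "gender": "Female"}`.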
As another example, the first image may be subjected to perspective transformation based on the identification frame pairing relationship, so as to obtain a second image matched with the template image. And determining a structural recognition result of the second image based on the corresponding relation of the recognition frames. Wherein the structured recognition result of the second image comprises text content and text attributes of the recognition box in the second image.
Since the first image may be distorted relative to the template image, for example tilted or zoomed, the recognition result obtained by performing character recognition on the first image may contain recognition errors, and determining the structured recognition result of the first image directly from that earlier result may affect recognition accuracy. In this example, the first image is subjected to perspective transformation based on the recognition frame pairing relationship, so that the first image is corrected and a second image matching the template image is obtained. Then, the structured recognition result of the corrected second image is determined based on the correspondence of the recognition frames, so that the accuracy of image recognition can be improved.
As one example, performing perspective transformation on the first image based on the recognition frame pairing relationship to obtain a second image that matches the template image may include: based on the identification frame pairing relation, constructing a perspective transformation matrix of the first image, and performing perspective transformation on the first image based on the perspective transformation matrix to obtain a second image matched with the template image.
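The application does not specify how the perspective transformation matrix is constructed; one plausible sketch estimates a 3x3 homography from paired box centers with the direct linear transform (DLT), which needs at least 4 pairs. All names below are ours.

```python
import numpy as np

# Hedged sketch: estimate a perspective (homography) matrix from paired box
# centers via DLT, then warp points from the first image into the template frame.

def perspective_matrix(src_pts, dst_pts):
    """src_pts/dst_pts: paired (x, y) centers; needs >= 4 pairs."""
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(rows, float)
    _, _, Vt = np.linalg.svd(A)          # null-space vector is the homography
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, pt):
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w
```

In practice a library routine (for example OpenCV's `cv2.findHomography`, which adds robust outlier rejection) would typically be used instead of hand-rolled DLT.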
As one example, determining the structured recognition result of the second image based on the recognition frame correspondence may include: and performing character recognition on the second image to obtain a character recognition result, and performing character matching on the character recognition result and a recognition frame in the second image based on the recognition frame pairing relationship to obtain a structured recognition result of the second image.
After perspective transformation is carried out on the first image to obtain a second image, the identification frame in the first image is correspondingly transformed into the identification frame in the second image, and correspondingly, the identification frame pairing relation between the first image and the template image can be also transformed into the identification frame pairing relation between the second image and the template image. Because the second image is the corrected image, the character recognition is carried out on the second image again, a more accurate character recognition result can be obtained, and the accuracy of the image recognition can be further improved. Then, according to the matching relation of the recognition frames, different character recognition results can be matched into corresponding recognition frames in the second image to obtain text content of each recognition frame, and then the text attribute of each recognition frame is combined to obtain a final structured recognition result.
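The character matching step described above can be sketched as assigning each character recognition result to the recognition box that contains its center point. The center-containment rule and all identifiers are our assumptions; an overlap-ratio rule would also work.

```python
# Illustrative sketch: match character-recognition results to recognition boxes
# in the corrected (second) image by center-point containment.

def match_text_to_boxes(ocr_results, boxes):
    """
    ocr_results: list of (text, (x_min, y_min, x_max, y_max))
    boxes:       {box_id: (x_min, y_min, x_max, y_max)}
    returns:     {box_id: concatenated text content}
    """
    matched = {box_id: "" for box_id in boxes}
    for text, (x0, y0, x1, y1) in ocr_results:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        for box_id, (bx0, by0, bx1, by1) in boxes.items():
            if bx0 <= cx <= bx1 and by0 <= cy <= by1:
                matched[box_id] += text
                break
    return matched
```

Combining each box's matched text content with its text attribute (obtained through the pairing relation) then gives the final structured recognition result.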
In the embodiment of the application, for the template image with no anchor point frame and only identification frames, the text frames identified in the first image can be used as one point set, the identification frames in the template image are used as another point set, the identification frames are identified from the text frames in the first image by carrying out point set registration on the text frames in the first image and the identification frames in the template image, the one-to-one pairing relation between the identification frames in the first image and the identification frames in the template image is determined, and then the structural identification result of the first image is determined based on the one-to-one pairing relation between the identification frames. By the image recognition mode, an anchor point frame is not required to be set in the template image in advance, so that the setting operation of the template image can be simplified, the template setting workload is reduced, the use threshold of a user is reduced, and the method has higher use value. Moreover, the matching relation of the recognition frames between the first image and the template image is determined through point set registration, and the matching relation of the recognition frames can be accurately determined under the condition that a plurality of recognition frames with the same text content appear in the first image or the template image or under the condition that anchor frames do not exist in the first image or the template image, so that a final structured recognition result is obtained, the image recognition scene is expanded, and the accuracy of image recognition is improved.
Fig. 2 is a flowchart of another image recognition method according to an embodiment of the present application, which is used in a computer device. Referring to fig. 2, the method includes:
Step 201: and acquiring a template image corresponding to the first image, wherein the template image comprises an identification frame, and the identification frame has corresponding text attributes.
It should be noted that, for the specific implementation of step 201, reference may be made to the description of step 101, which is not repeated here in the embodiment of the present application.
Step 202: character recognition is performed on the first image and the template image to determine a first text box in the first image and a second text box in the template image, respectively.
Wherein at least one first text box, i.e. 1 or more first text boxes, may be included in the first image. Each first text box comprises recognized text, i.e. the text content in each first text box may consist of 1 or more characters recognized from the first image.
Wherein at least one second text box, i.e. 1 or more second text boxes, may be included in the template image. Each second text box comprises recognized text, i.e. the text content in each second text box may consist of 1 or more characters recognized from the template image.
By performing character recognition on the first image, characters in the first image can be recognized; 1 or more characters with similar positions can form a text, and the text box where that text is located is a first text box. Likewise, by performing character recognition on the template image, characters in the template image can be recognized; 1 or more characters with similar positions can form a text, and the text box where that text is located is a second text box.
The character recognition algorithm used for character recognition may be an OCR algorithm or the like, which is not limited in the embodiment of the present application.
Step 203: and based on the text content and the position of the text box, matching the first text box in the first image with the second text box in the template image, and taking the text boxes which are matched with each other as anchor boxes to obtain an anchor box pairing relation, wherein the anchor box pairing relation comprises a one-to-one pairing relation between the anchor boxes in the first image and the anchor boxes in the template image.
The text boxes matched with each other refer to text boxes whose text content is the same and whose positions match; in the embodiment of the application, such text boxes can be used as anchor boxes. In this way, even when no anchor box is set in the template image, anchor boxes can be identified from the first image and the template image by combining the text content and the positions of the text boxes.
It should be noted that, in the related art, when anchor blocks are matched only according to text content of a text box, if multiple anchor blocks with the same text content appear in a first image or a template image, a one-to-one matching relationship between anchor blocks cannot be determined. In the embodiment of the application, the matching of the text box positions is further combined on the basis of the text content matching of the text boxes, so that the matching relationship of anchor boxes between the first images or the template images can be accurately determined under the condition that a plurality of anchor boxes with the same text content appear in the first images or the template images.
The matching based on the text box position may be performed by using a point set registration method, or may be performed by using other manners, which is not limited by the embodiment of the present application.
As one example, based on the text content and the position of the text box, matching a first text box in the first image with a second text box in the template image, and using the matched text box as an anchor box, the operation of obtaining the anchor box pairing relationship includes the following steps:
1) And based on the text content of the text boxes, matching the first text boxes in the first image with the second text boxes in the template image, and taking the text boxes which are matched with each other as anchor boxes to obtain an initial anchor box matching relationship between the anchor boxes in the first image and the anchor boxes in the template image.
The text boxes matched with each other here refer to text boxes with the same text content. The initial anchor box matching relationship may include anchor boxes that are not matched one to one, i.e., anchor boxes with repeated matches may occur. For example, one anchor box in the first image may match multiple anchor boxes in the template image, i.e., have the same text content as multiple anchor boxes in the template image.
Therefore, after determining the initial anchor frame matching relationship, it may be determined whether the anchor frame in the first image is a one-to-one match with the anchor frame in the template image based on the initial anchor frame matching relationship.
2) If it is determined, based on the initial anchor box matching relationship, that the anchor boxes in the first image do not match the anchor boxes in the template image one to one, point set registration is performed on the anchor boxes in the first image and the anchor boxes in the template image to obtain the anchor box pairing relationship.
That is, if it is determined that the anchor frame in the first image and the anchor frame in the template image are not in one-to-one matching based on the initial anchor frame matching relationship, the point set registration may be further performed on the anchor frame in the first image and the anchor frame in the template image based on the position of the anchor frame, so as to obtain the one-to-one matching relationship between the anchor frame in the first image and the anchor frame in the template image.
3) If it is determined, based on the initial anchor box matching relationship, that the anchor boxes in the first image match the anchor boxes in the template image one to one, the initial anchor box matching relationship is determined to be the anchor box pairing relationship.
That is, if the anchor boxes in the first image and the anchor boxes in the template image are already matched one to one under the initial anchor box matching relationship, there is no need to perform position-based point set registration on them, and the initial anchor box matching relationship is directly determined as the anchor box pairing relationship.
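The three steps above can be sketched as follows, with the position-based point set registration of step 2) abstracted behind a placeholder callable (for example a CPD call); all names are illustrative assumptions.

```python
from collections import defaultdict

# Sketch of the anchor matching steps: match by text content first, and fall
# back to position-based point set registration only when matches repeat.

def match_anchors(first_boxes, template_boxes, register_by_position):
    """
    first_boxes / template_boxes: lists of (text, (cx, cy)) per image.
    register_by_position: callable implementing position-based registration.
    Returns a one-to-one pairing {first_index: template_index}.
    """
    # Step 1: group text boxes by identical text content.
    tpl_by_text = defaultdict(list)
    for j, (text, _) in enumerate(template_boxes):
        tpl_by_text[text].append(j)
    first_by_text = defaultdict(list)
    for i, (text, _) in enumerate(first_boxes):
        first_by_text[text].append(i)
    pairing, ambiguous = {}, False
    for text, idxs in first_by_text.items():
        tpl_idxs = tpl_by_text.get(text, [])
        if len(idxs) == 1 and len(tpl_idxs) == 1:
            pairing[idxs[0]] = tpl_idxs[0]     # unique text: direct one-to-one match
        elif tpl_idxs:
            ambiguous = True                   # repeated text content: not one-to-one
    # Steps 2/3: position-based registration only when the content match repeats.
    if ambiguous:
        pairing = register_by_position(first_boxes, template_boxes)
    return pairing
```

When every anchor text occurs once per image, the content-based match is kept as-is (step 3); otherwise the position-based fallback resolves the repetitions (step 2).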
Step 204: and determining the number of anchor blocks matched one by one in the anchor block pairing relation.
After determining the number of anchor blocks matched one by one in the anchor block matching relationship, performing point set registration on a first text box in a first image and an identification frame in a template image based on the number of anchor blocks, and taking the first text box in the first image and the identification frame in the template image as the identification frame to obtain the identification frame matching relationship. The identification frame pairing relation comprises a one-to-one pairing relation between the identification frames in the first image and the identification frames in the template image.
As one example, the identification box pairing relationship may be determined through steps 205 and 207 described below. That is, it is first determined whether the number of one-to-one matched anchor boxes meets the number requirement; if it does not, the following step 205 is executed; if it does, the process jumps to step 207 described below.
The number requirement may be that the number of anchor blocks is greater than or equal to a preset number threshold, which may be preset, for example, the preset number threshold may be 4 or 5, etc.
Because the number of anchor frames matched one by one needs to meet a certain number requirement when the first image is subjected to perspective transformation by using the anchor frames, after the anchor frame pairing relation is determined, whether the number of anchor frames matched one by one in the anchor frame pairing relation meets the number requirement can be judged first, so that the perspective transformation can be carried out in different modes according to the judgment result.
Step 205: if the number of anchor blocks does not meet the number requirement, performing point set registration on all the first text blocks in the first image and the recognition frames in the template image, and taking the first text blocks in the first image and the recognition frames in the template image which are registered with each other as the recognition frames to obtain a recognition frame pairing relation.
In the embodiment of the application, if the number of the anchor blocks does not meet the number requirement, the anchor blocks can be treated as the condition that the anchor blocks do not exist in the first image and the template image. That is, all the first text boxes in the first image are used as candidate boxes of the identification boxes, then the identification boxes in the first image are identified by carrying out point set registration on all the first text boxes in the first image and the identification boxes in the template image, and the one-to-one pairing relation between the identification boxes in the first image and the identification boxes in the template image is determined.
Step 206: and performing perspective transformation on the first image based on the identification frame pairing relation to obtain a second image matched with the template image, and determining a structural identification result of the second image based on the identification frame correspondence relation.
It should be noted that, for the specific implementation of steps 205-206, reference may be made to the descriptions of steps 103-104, which is not repeated here in the embodiment of the present application.
Step 207: if the number of anchor blocks meets the number requirement, carrying out point set registration on the rest text blocks in the first image and the recognition frames in the template image, and taking the first text blocks in the first image and the recognition frames in the template image which are registered with each other as the recognition frames to obtain a recognition frame pairing relation.
Wherein the remaining text boxes refer to the first text box in the first image except for the anchor box.
In the embodiment of the application, if the number of the anchor blocks meets the number requirement, the anchor blocks can be treated as the condition that the anchor blocks exist in the first image and the template image. That is, the remaining text boxes except the anchor point boxes in the first image are used as candidate boxes of the identification boxes, then the identification boxes in the first image are identified by carrying out point set registration on the remaining text boxes in the first image and the identification boxes in the template image, and the one-to-one pairing relation between the identification boxes in the first image and the identification boxes in the template image is determined.
Step 208: and performing perspective transformation on the first image based on the anchor point frame pairing relation and the identification frame pairing relation to obtain a second image matched with the template image, and determining a structural identification result of the second image based on the anchor point frame pairing relation and the identification frame pairing relation.
As an example, character recognition may be performed on the second image to obtain a character recognition result, and then, based on the anchor frame pairing relationship and the recognition frame pairing relationship, the character recognition result is matched with the recognition frame in the second image to obtain a structured recognition result of the second image.
After perspective transformation is carried out on the first image to obtain a second image, the identification frame in the first image is correspondingly transformed into the identification frame in the second image, and correspondingly, the identification frame pairing relation between the first image and the template image can be also transformed into the identification frame pairing relation between the second image and the template image. Because the second image is the corrected image, the character recognition is carried out on the second image again, a more accurate character recognition result can be obtained, and the accuracy of the image recognition can be further improved. Then, according to the anchor point frame pairing relation and the identification frame pairing relation, different character identification results can be matched into corresponding identification frames in the second image, the text content of each identification frame is obtained, and then the text attribute of each identification frame is combined, so that a final structured identification result can be obtained.
It should be noted that, in the case where both anchor boxes and recognition boxes exist in the first image to be recognized, in many actual scenarios the anchor boxes and recognition boxes in the first image may be misaligned by a line. For example, the first line in the template image is name-"Zhang San" and the second line is gender-"Female"; but in the first image the first line may be name-(blank), the second line gender-"Li Si", and the third line age-"Female". That is, the recognition box content in the first image is delayed by one line relative to the anchor boxes, resulting in a misline between the two. In this case, if the structured recognition result of the second image is determined directly based on the anchor box pairing relationship and the recognition box pairing relationship, the accuracy of the recognition result may be affected.
In the embodiment of the application, after the identification frame pairing relationship is determined, the error information between the anchor frame and the identification frame in the first image can be determined based on the anchor frame pairing relationship and the identification frame pairing relationship, and then the identification frame in the first image is subjected to position correction based on the error information, so that the identification frame correction result is obtained. And then determining a structural recognition result of the second image based on the anchor point frame pairing relation, the recognition frame pairing relation and the recognition frame correction result.
Therefore, under the condition that the anchor point frame and the identification frame of the first image are in wrong rows, an accurate identification result can be determined, the image identification scene is further expanded, and the accuracy of image identification is improved.
The misline information is used for indicating the misline degree between the anchor point frame and the identification frame. The recognition frame correction result is a recognition frame after position correction, for example, coordinates of the recognition frame are corrected, and the recognition frame after coordinate correction is obtained. By performing position correction on the recognition frame in the first image, the position of the recognition frame can be corrected to a correct position corresponding to the template image.
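The application does not give a concrete correction formula. One plausible sketch, under our own uniform-offset assumption, estimates a vertical offset from the paired recognition box centers (expressed in the template's coordinate frame, i.e., after perspective correction) and shifts the recognition boxes back by it:

```python
# Hedged sketch of misline correction: estimate a uniform vertical offset from
# paired recognition box centers and shift the recognition boxes back by it.
# The uniform-offset assumption and all identifiers are ours.

def correct_misline(paired_centers, recognition_boxes):
    """
    paired_centers:    list of ((cx, cy) image box center, (cx, cy) paired template
                       box center), both in the template's coordinate frame.
    recognition_boxes: {box_id: (x_min, y_min, x_max, y_max)} in the image.
    returns:           position-corrected boxes (the recognition frame correction result).
    """
    if not paired_centers:
        return dict(recognition_boxes)
    # Average vertical displacement: image position minus template position.
    dy = sum(img[1] - tpl[1] for img, tpl in paired_centers) / len(paired_centers)
    return {
        box_id: (x0, y0 - dy, x1, y1 - dy)
        for box_id, (x0, y0, x1, y1) in recognition_boxes.items()
    }
```

After this shift, a recognition box that had slipped one line below its anchor lands back at the position the template expects, so text attributes are assigned to the right rows.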
As an example, character recognition may be performed on the second image to obtain a character recognition result, and then, based on the anchor frame pairing relationship, the recognition frame pairing relationship, and the recognition frame correction result, the character recognition result is subjected to character matching with the recognition frame in the second image to obtain a structured recognition result of the second image.
According to the anchor point frame pairing relation, the identification frame pairing relation and the identification frame correction result, the identification frame corresponding to the corrected identification frame in the second image in the template image can be determined, the character recognition result can be matched with the identification frame in the second image according to the identification frame corresponding to the template image, the text attribute of the corrected identification frame in the second image is determined, and the structural identification result of the second image is obtained.
In the embodiment of the application, for the template image with no anchor point frame and only the identification frame, the text frame identified in the first image can be used as one point set, the identification frame in the template image is used as another point set, the identification frame is identified from the text frame of the first image by carrying out point set registration on the text frame in the first image and the identification frame in the template image, the one-to-one pairing relation between the identification frame in the first image and the identification frame in the template image is determined, and then the structured identification result of the first image is determined based on the one-to-one pairing relation between the identification frames. By the image recognition mode, an anchor point frame is not required to be set in the template image in advance, so that the setting operation of the template image can be simplified, the template setting workload is reduced, the use threshold of a user is reduced, and the method has higher use value. Moreover, the matching relation of the recognition frames between the first image and the template image is determined through point set registration, and the matching relation of the recognition frames can be accurately determined under the condition that a plurality of recognition frames with the same text content appear in the first image or the template image or under the condition that anchor frames do not exist in the first image or the template image, so that a final structured recognition result is obtained, the image recognition scene is expanded, and the accuracy of image recognition is improved.
Moreover, the image recognition method provided by the embodiment of the application can be applied to various template scenes with or without anchor points, and has higher flexibility. In addition, by firstly determining the wrong line information between the anchor frame and the identification frame in the first image based on the anchor frame pairing relation and the identification frame pairing relation, then carrying out position correction on the identification frame in the first image based on the wrong line information to obtain an identification frame correction result, and then determining the structural identification result of the second image based on the anchor frame pairing relation, the identification frame pairing relation and the identification frame correction result, the accurate identification result can be obtained under the condition that the anchor frame and the identification frame of the first image are wrong, the image identification scene is further expanded, and the accuracy of image identification is improved. In addition, the scheme gives consideration to the point set matching scheme of the anchor point frame and the identification frame, and has stronger matching robustness.
Fig. 3 is a block diagram of an image recognition apparatus according to an embodiment of the present application, which is integrated into a computer device, as shown in fig. 3, and includes:
An obtaining module 301, configured to obtain a template image corresponding to the first image, where the template image includes an identification frame, and the identification frame has a corresponding text attribute;
A first recognition module 302, configured to perform character recognition on the first image to determine a first text box in the first image;
the registration module 303 is configured to perform point set registration on a first text box in the first image and an identification frame in the template image, and take the first text box in the first image and the identification frame in the template image that are registered with each other as the identification frame, so as to obtain an identification frame pairing relationship, where the identification frame pairing relationship includes a one-to-one pairing relationship between the identification frame in the first image and the identification frame in the template image;
A first determining module 304, configured to determine a structured recognition result of the first image based on the correspondence of the recognition frames, where the structured recognition result includes text content and text attributes of the recognition frames in the first image.
Optionally, the first determining module 304 is configured to:
Performing perspective transformation on the first image based on the identification frame pairing relation to obtain a second image matched with the template image;
And determining a structural recognition result of the second image based on the corresponding relation of the recognition frames.
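The perspective transformation described above can be realized with a homography estimated from four or more paired box centers. The sketch below solves for the homography with a plain-NumPy direct linear transform; in practice a library routine such as OpenCV's `cv2.findHomography` followed by `cv2.warpPerspective` would typically be used. Function names are illustrative:

```python
import numpy as np

def fit_homography(src_pts, dst_pts):
    """Estimate the 3x3 homography H mapping src_pts -> dst_pts
    (direct linear transform; needs at least 4 point pairs)."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # H is the null vector of A (right-singular vector of the
    # smallest singular value)
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pt):
    # map a 2-D point through H in homogeneous coordinates
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)
```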
Optionally, the first determining module 304 is configured to:
Performing character recognition on the second image to obtain a character recognition result;
and carrying out character matching on the character recognition result and the recognition frames in the second image based on the recognition frame pairing relation to obtain a structural recognition result of the second image.
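The character-matching step — attaching each OCR result in the second image to the recognition box it falls in — can be done by box overlap. A minimal sketch, with illustrative names and intersection-over-union as an assumed overlap measure (the patent does not specify one):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def attach_text(ocr_results, recognition_boxes, min_iou=0.3):
    """ocr_results: list of (box, text) from character recognition.
    Returns a dict: recognition-box index -> matched text content."""
    out = {}
    for box, text in ocr_results:
        score, j = max((iou(box, rb), j)
                       for j, rb in enumerate(recognition_boxes))
        if score >= min_iou:
            # concatenate when several OCR lines fall in one box
            out[j] = out.get(j, '') + text
    return out
```

The structured recognition result then follows by joining each matched text with the text attribute the template stores for that recognition box.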
Optionally, the apparatus further comprises:
The second recognition module is also used for carrying out character recognition on the template image so as to determine a second text box in the template image;
The matching module is used for matching the first text box in the first image with the second text box in the template image based on the text content and the position of the text box, and taking the text boxes which are matched with each other as anchor boxes to obtain anchor box pairing relations, wherein the anchor box pairing relations comprise one-to-one pairing relations between the anchor boxes in the first image and the anchor boxes in the template image;
The second determining module is used for determining the number of anchor blocks matched one by one in the anchor block matching relationship;
and the registration module is used for carrying out point set registration on the first text box in the first image and the identification box in the template image based on the anchor point box quantity.
Optionally, the matching module includes:
The matching unit is used for matching a first text box in the first image with a second text box in the template image based on the text content of the text box, and using the text boxes matched with each other as anchor boxes to obtain an initial anchor box matching relationship between the anchor boxes in the first image and the anchor boxes in the template image, wherein the text boxes matched with each other are text boxes with the same text content;
And the registration unit is configured to, if it is determined based on the initial anchor box matching relationship that the anchor boxes in the first image and the anchor boxes in the template image are not matched one to one, perform point set registration on the anchor boxes in the first image and the anchor boxes in the template image to obtain the anchor box pairing relationship.
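The initial anchor matching by text content can be sketched as follows: only strings that occur exactly once in each image yield unambiguous anchors, and any repeated shared string is the "not one-to-one" case that falls through to point-set registration. Names are illustrative:

```python
from collections import Counter

def initial_anchor_match(first_texts, template_texts):
    """first_texts / template_texts: recognized strings per image.
    Returns (pairs, one_to_one): pairs maps first-image index ->
    template index for each unambiguous shared string; one_to_one is
    False when any shared string repeats in either image."""
    c1, c2 = Counter(first_texts), Counter(template_texts)
    shared = set(c1) & set(c2)
    one_to_one = all(c1[s] == 1 and c2[s] == 1 for s in shared)
    pairs = {}
    for s in shared:
        if c1[s] == 1 and c2[s] == 1:
            # unique in both images -> safe anchor pair
            pairs[first_texts.index(s)] = template_texts.index(s)
    return pairs, one_to_one
```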
Optionally, the matching module further includes:
And the first determining unit is configured to determine the initial anchor box matching relationship as the anchor box pairing relationship if it is determined based on the initial anchor box matching relationship that the anchor boxes in the first image and the anchor boxes in the template image are matched one to one.
Optionally, the registration unit is configured to:
If the number of anchor blocks does not meet the number requirement, performing point set registration on all the first text blocks in the first image and the identification blocks in the template image;
and if the number of the anchor blocks meets the number requirement, carrying out point set registration on the rest text blocks in the first image and the recognition frames in the template image, wherein the rest text blocks refer to the first text blocks except the anchor blocks in the first image.
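The branch on the anchor-box count amounts to selecting which first-image text boxes enter point-set registration. A minimal sketch (the threshold value and all names are assumptions for illustration):

```python
def boxes_to_register(text_boxes, anchor_indices, min_anchors=4):
    """If too few anchors were found, register every text box;
    otherwise register only the remaining (non-anchor) text boxes."""
    if len(anchor_indices) < min_anchors:
        return list(range(len(text_boxes)))
    anchors = set(anchor_indices)
    return [i for i in range(len(text_boxes)) if i not in anchors]
```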
Optionally, the first determining module 304 includes:
The second determining unit is used for performing perspective transformation on the first image based on the identification frame pairing relation to obtain a second image matched with the template image if the number of the anchor frames does not meet the number requirement, and determining a structural identification result of the second image based on the identification frame correspondence relation;
And the third determining unit is used for performing perspective transformation on the first image based on the anchor frame pairing relation and the identification frame pairing relation if the number of the anchor frames meets the number requirement to obtain a second image matched with the template image, and determining a structural identification result of the second image based on the anchor frame pairing relation and the identification frame pairing relation.
Optionally, the third determining unit is further configured to:
Determining staggered information between an anchor block and an identification block in the first image based on the anchor block pairing relationship and the identification block pairing relationship;
based on the staggered information, carrying out position correction on the identification frame in the first image to obtain an identification frame correction result;
And determining a structural identification result of the second image based on the anchor block matching relationship, the identification frame matching relationship and the identification frame correction result.
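One possible reading of the staggered-line correction above: estimate the consensus offset between matched anchor-box positions in the first image and the template, then shift any recognition box whose own offset deviates from that consensus back onto it. This is a simplified, hypothetical interpretation — the patent does not fix a concrete correction algorithm — and all names are illustrative:

```python
import statistics

def correct_misaligned_boxes(anchor_offsets, rec_boxes, rec_offsets, tol=5.0):
    """anchor_offsets: per-anchor (dx, dy) between first image and template.
    rec_offsets: the same measurement per recognition box.
    Recognition boxes whose vertical offset disagrees with the anchor
    consensus by more than tol are shifted back onto it."""
    ref_dy = statistics.median(dy for _, dy in anchor_offsets)
    corrected = []
    for (x1, y1, x2, y2), (_, dy) in zip(rec_boxes, rec_offsets):
        # shift only the outliers; boxes near the consensus are kept
        shift = ref_dy - dy if abs(dy - ref_dy) > tol else 0.0
        corrected.append((x1, y1 + shift, x2, y2 + shift))
    return corrected
```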
Optionally, the third determining unit is further configured to:
Performing character recognition on the second image to obtain a character recognition result;
And carrying out character matching on the character recognition result and the recognition frame in the second image based on the anchor point frame pairing relation, the recognition frame pairing relation and the recognition frame correction result to obtain a structured recognition result of the second image.
In the embodiment of the application, for a template image that contains no anchor boxes but only recognition boxes, the text boxes recognized in the first image can be treated as one point set and the recognition boxes in the template image as another. By registering these two point sets, the recognition boxes are identified among the text boxes of the first image, a one-to-one pairing relation between recognition boxes in the first image and those in the template image is determined, and the structured recognition result of the first image is then determined from that pairing relation. With this approach, no anchor boxes need to be set in the template image in advance, which simplifies template setup, reduces the template-configuration workload, lowers the user's barrier to use, and gives the method high practical value. Moreover, because the recognition-box pairing between the first image and the template image is determined by point-set registration, the pairing can be determined accurately even when several recognition boxes with identical text content appear in the first image or the template image, or when no anchor boxes exist in either image, so that a final structured recognition result is still obtained; this expands the range of image recognition scenes and improves the accuracy of image recognition.
It should be noted that: in the image recognition device provided in the above embodiment, only the division of the above functional modules is used for illustration, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the image recognition device and the image recognition method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 4 is a block diagram of a computer device 400 according to an embodiment of the present application. The computer device 400 may be an electronic device such as a mobile phone, a tablet computer, a smart television, a multimedia playing device, a wearable device, a desktop computer, or a server, and may be used to implement the image recognition method provided in the above-described embodiments.
In general, the computer device 400 includes: a processor 401 and a memory 402.
Processor 401 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 401 may be implemented in at least one hardware form among a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). Processor 401 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 401 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 401 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 402 may include one or more computer-readable storage media, which may be non-transitory. Memory 402 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 402 is used to store at least one instruction for execution by processor 401 to implement the image recognition method provided by the method embodiments of the present application.
In some embodiments, the computer device 400 may optionally further include: a peripheral interface 403 and at least one peripheral. The processor 401, memory 402, and peripheral interface 403 may be connected by a bus or signal line. The individual peripheral devices may be connected to the peripheral device interface 403 via buses, signal lines or a circuit board. Specifically, the peripheral device may include: at least one of a display 404, an audio circuit 405, a communication interface 406, and a power supply 407.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is not limiting of the computer device 400, and may include more or fewer components than shown, or may combine certain components, or employ a different arrangement of components.
In an exemplary embodiment, a computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the above-described image recognition method, is also provided.
In an exemplary embodiment, a computer program product is also provided; when the program product is executed, it implements the above-described image recognition method.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing is a description of preferred embodiments of the present application and is not intended to limit it; the scope of protection of the application is defined by the appended claims.

Claims (13)

1. An image recognition method, the method comprising:
Acquiring a template image corresponding to a first image, wherein the template image comprises an identification frame, and the identification frame has corresponding text attributes;
character recognition is carried out on the first image so as to determine a first text box in the first image;
Performing point set registration on the first text boxes in the first image and the identification frames in the template image, and taking the first text boxes that are registered with identification frames in the template image as identification frames, to obtain an identification frame pairing relation, wherein the identification frame pairing relation comprises a one-to-one pairing relation between the identification frames in the first image and the identification frames in the template image;
And determining a structural recognition result of the first image based on the corresponding relation of the recognition frames, wherein the structural recognition result comprises text content and text attributes of the recognition frames in the first image.
2. The method of claim 1, wherein the determining the structured recognition result of the first image based on the recognition frame correspondence comprises:
Performing perspective transformation on the first image based on the identification frame pairing relation to obtain a second image matched with the template image;
And determining a structural recognition result of the second image based on the corresponding relation of the recognition frames.
3. The method of claim 2, wherein the determining the structured recognition result of the second image based on the recognition frame correspondence comprises:
Performing character recognition on the second image to obtain a character recognition result;
and carrying out character matching on the character recognition result and the recognition frames in the second image based on the recognition frame pairing relation to obtain a structural recognition result of the second image.
4. The method of claim 1, wherein prior to the point set registration of the first text box in the first image with the recognition box in the template image, further comprising:
performing character recognition on the template image to determine a second text box in the template image;
Based on text content and position of text boxes, matching a first text box in the first image with a second text box in the template image, and taking the text boxes matched with each other as anchor boxes to obtain anchor box pairing relations, wherein the anchor box pairing relations comprise one-to-one pairing relations between the anchor boxes in the first image and the anchor boxes in the template image;
determining the number of anchor blocks matched one by one in the anchor block matching relationship;
the performing point set registration on the first text box in the first image and the recognition box in the template image includes:
and carrying out point set registration on the first text box in the first image and the identification box in the template image based on the anchor point box number.
5. The method of claim 4, wherein the matching the first text box in the first image with the second text box in the template image based on the text content and the location of the text box, and using the text boxes that are matched with each other as anchor boxes, to obtain the anchor box pairing relationship, comprises:
Based on text content of the text boxes, matching a first text box in the first image with a second text box in the template image, and taking the text boxes matched with each other as anchor boxes to obtain an initial anchor box matching relationship between the anchor boxes in the first image and the anchor boxes in the template image, wherein the text boxes matched with each other are text boxes with the same text content;
and if the anchor block in the first image is determined not to be matched with the anchor block in the template image one by one based on the initial anchor block matching relationship, carrying out point set registration on the anchor block in the first image and the anchor block in the template image to obtain the anchor block matching relationship.
6. The method of claim 5, wherein the method further comprises:
and if the anchor block in the first image is determined to be matched with the anchor block in the template image one by one based on the initial anchor block matching relationship, determining the initial anchor block matching relationship as the anchor block matching relationship.
7. The method of claim 4, wherein the performing point set registration of the first text box in the first image with the recognition box in the template image based on the number of anchor boxes comprises:
If the number of anchor blocks does not meet the number requirement, performing point set registration on all the first text blocks in the first image and the identification blocks in the template image;
and if the number of the anchor blocks meets the number requirement, carrying out point set registration on the rest text blocks in the first image and the recognition frames in the template image, wherein the rest text blocks refer to the first text blocks except the anchor blocks in the first image.
8. The method of claim 4, wherein the determining the structured recognition result of the first image based on the recognition frame correspondence, the structured recognition result including text content and text attributes of the recognition frame in the first image, comprises:
If the number of the anchor blocks does not meet the number requirement, performing perspective transformation on the first image based on the identification frame pairing relation to obtain a second image matched with the template image, and determining a structural identification result of the second image based on the identification frame correspondence relation;
if the number of the anchor blocks meets the number requirement, performing perspective transformation on the first image based on the anchor block matching relation and the identification block matching relation to obtain a second image matched with the template image, and determining a structural identification result of the second image based on the anchor block matching relation and the identification block matching relation.
9. The method of claim 8, wherein prior to determining the structured recognition result for the second image based on the anchor box pairing relationship and the recognition box pairing relationship, further comprising:
Determining staggered information between an anchor block and an identification block in the first image based on the anchor block pairing relationship and the identification block pairing relationship;
based on the staggered information, carrying out position correction on the identification frame in the first image to obtain an identification frame correction result;
The determining the structural recognition result of the second image based on the anchor point frame pairing relationship and the recognition frame pairing relationship comprises the following steps:
And determining a structural identification result of the second image based on the anchor block matching relationship, the identification frame matching relationship and the identification frame correction result.
10. The method of claim 9, wherein the determining the structured recognition result of the second image based on the anchor frame pairing relationship, the recognition frame pairing relationship, and the recognition frame correction result comprises:
Performing character recognition on the second image to obtain a character recognition result;
And carrying out character matching on the character recognition result and the recognition frame in the second image based on the anchor point frame pairing relation, the recognition frame pairing relation and the recognition frame correction result to obtain a structured recognition result of the second image.
11. An image recognition apparatus, the apparatus comprising:
The system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a template image corresponding to a first image, the template image comprises an identification frame, and the identification frame has a corresponding text attribute;
the first recognition module is used for carrying out character recognition on the first image so as to determine a first text box in the first image;
The registration module is used for carrying out point set registration on a first text box in the first image and an identification frame in the template image, and taking the first text box which is mutually registered with the identification frame in the template image as the identification frame to obtain an identification frame pairing relation, wherein the identification frame pairing relation comprises a one-to-one pairing relation between the identification frame in the first image and the identification frame in the template image;
And the first determining module is used for determining a structural recognition result of the first image based on the corresponding relation of the recognition frames, wherein the structural recognition result comprises text content and text attributes of the recognition frames in the first image.
12. A computer device, the device comprising:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to perform the steps of any of the methods of claims 1-10.
13. A computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the steps of the method of any of claims 1-10.