CN111160395A - Image recognition method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111160395A
CN111160395A
Authority
CN
China
Prior art keywords
image
key block
key
detected
blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911237612.1A
Other languages
Chinese (zh)
Inventor
周锴
王雷
宋祺
张睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN201911237612.1A priority Critical patent/CN111160395A/en
Publication of CN111160395A publication Critical patent/CN111160395A/en
Priority to PCT/CN2020/134332 priority patent/WO2021110174A1/en
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words

Abstract

The application discloses an image recognition method, an image recognition apparatus, an electronic device, and a storage medium. The method comprises the following steps: acquiring an image to be recognized; selecting a key block detection model matching the category of the image to be recognized, and performing key block detection on the image according to the selected model; if multiple key blocks are detected, clustering the detected key blocks and segmenting multiple sub-images from the image to be recognized according to the clustering result, so that each sub-image contains several key blocks; and performing character recognition on each sub-image. The key point lies in the two strongly associated steps of key block detection and intelligent image segmentation based on the detected key blocks, which solve the difficult problem of producing formatted output from fixed-format certificate and document images and greatly reduce development labor and time costs.

Description

Image recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image recognition, and in particular, to an image recognition method, an image recognition apparatus, an electronic device, and a storage medium.
Background
Image recognition is widely applied in fields such as identity authentication and word processing. An important application scenario is the recognition of business licenses, identity cards, and other certificates to verify identity or qualifications.
Apart from the manual approach, which is gradually being phased out, current solutions include the following. One approach is to design a dedicated pipeline for each specific certificate; because this requires several steps such as prior-information summarization and service development, it consumes substantial manpower and time, typically at least two months. Another approach exploits the fixed format of licenses: the image to be recognized is matched against a sample image of the corresponding format and then recognized. However, this approach performs well only under ideal conditions where the image is clear and undistorted; once the image exhibits text line drift, deformation, affine transformation, or similar issues, the recognition results are far from satisfactory.
Disclosure of Invention
In view of the above, the present application is made to provide an image recognition method, apparatus, electronic device, and storage medium that overcome or at least partially solve the above-mentioned problems.
According to an aspect of the present application, there is provided an image recognition method including: acquiring an image to be recognized; selecting a key block detection model matching the category of the image to be recognized, and performing key block detection on the image according to the selected model; if multiple key blocks are detected, clustering the detected key blocks and segmenting multiple sub-images from the image to be recognized according to the clustering result, so that each sub-image contains several key blocks; and performing character recognition on each sub-image.
Optionally, the method further comprises: if no key block can be detected, determining that the image to be recognized does not match the category.
Optionally, the key block detection model is obtained by training as follows: acquiring sample images of a specified category as training data, wherein each sample image is annotated with multiple key blocks; and performing iterative training with the training data to obtain a key block detection model matching the specified category, wherein the key block detection model is implemented based on a target detection algorithm.
Optionally, clustering the detected key blocks includes clustering based on vector representations of the key blocks, where the clustering result satisfies the following conditions: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than a first threshold, and the ratio of the total area of the key blocks in each sub-image to the area of that sub-image is not less than a second threshold.
Optionally, the vector representation comprises: the coordinates of the center point of the key block, the width of the key block, and the height of the key block.
Optionally, performing character recognition on each sub-image includes: performing text line detection on each sub-image to obtain detected text lines; and matching the detected text lines with the key blocks, and determining the attributes of the matched text lines according to the attributes of the key blocks.
Optionally, performing character recognition on each sub-image further includes: recognizing the text content of the detected text lines.
According to another aspect of the present application, there is provided an image recognition apparatus including: an image acquisition unit configured to acquire an image to be recognized; a key block detection unit configured to select a key block detection model matching the category of the image to be recognized and to perform key block detection on the image according to the selected model; a clustering unit configured to, if multiple key blocks are detected, cluster the detected key blocks and segment multiple sub-images from the image to be recognized according to the clustering result, so that each sub-image contains several key blocks; and a recognition unit configured to perform character recognition on each sub-image.
Optionally, the recognition unit is further configured to determine that the image to be recognized does not match the category if no key block can be detected.
Optionally, the key block detection model is obtained by training as follows: acquiring sample images of a specified category as training data, wherein each sample image is annotated with multiple key blocks; and performing iterative training with the training data to obtain a key block detection model matching the specified category, wherein the key block detection model is implemented based on a target detection algorithm.
Optionally, the clustering unit is configured to cluster based on vector representations of the key blocks, where the clustering result satisfies the following conditions: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than a first threshold, and the ratio of the total area of the key blocks in each sub-image to the area of that sub-image is not less than a second threshold.
Optionally, the vector representation comprises: the coordinates of the center point of the key block, the width of the key block, and the height of the key block.
Optionally, the recognition unit is configured to perform text line detection on each sub-image to obtain detected text lines, match the detected text lines with the key blocks, and determine the attributes of the matched text lines according to the attributes of the key blocks.
Optionally, the recognition unit is configured to recognize the text content of the detected text lines.
In accordance with yet another aspect of the present application, there is provided an electronic device including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method as any one of the above.
According to a further aspect of the application, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement a method as in any above.
According to the technical solution of the present application, after the image to be recognized is obtained, a matching key block detection model is selected according to the category of the image; the detected key blocks are then clustered, and multiple sub-images are segmented from the image to be recognized according to the clustering result, each sub-image containing several key blocks, thereby completing the intelligent segmentation of the image; finally, character recognition is performed on each sub-image. The key point lies in the two strongly associated steps of key block detection and intelligent image segmentation based on the detected key blocks, which solve the difficult problem of producing formatted output from fixed-format certificate and document images and greatly reduce development labor and time costs.
The foregoing is merely an overview of the technical solutions of the present application. To make the technical means of the application clearer so that it can be implemented according to the description, and to make the above and other objects, features, and advantages more readily understandable, detailed embodiments of the application are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic flow diagram of an image recognition method according to an embodiment of the present application;
FIG. 2 illustrates a plurality of key blocks detected in a food service license image;
FIG. 3 shows an invoice image containing smaller textual content;
FIG. 4 is a schematic diagram illustrating the sub-graph segmentation of FIG. 2;
FIG. 5 shows the result of text line detection for the left half sub-graph of FIG. 4;
FIG. 6 is a schematic diagram of an image recognition apparatus according to an embodiment of the present application;
FIG. 7 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 8 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The design idea of the application is as follows: exploiting the fixed format of licenses, a key block detection model is selected according to the category of the image to be recognized (such as identity card, business license, etc.), and key blocks are determined from the image to be recognized; the key blocks are then clustered, the image is segmented according to the clustering result, and the segmented sub-images may be appropriately enlarged so that character recognition on them is more accurate.
Through the two strongly associated steps of key block detection and intelligent image segmentation based on the detected key blocks, this approach breaks the conventional assumption in the prior art that character recognition should be performed directly on the image without determining key regions or segmenting the image, and achieves unexpected effects: recognition accuracy and recall are significantly improved over existing schemes, the method is practical, and recognition development for a new format can be accomplished by only about 3 people, greatly reducing resource costs.
The technical solution of the application can be applied to scenarios requiring recognition of fixed-format license images, including but not limited to identity verification and qualification verification, and to business fields such as take-out and financial services. Detailed descriptions are given below with reference to various embodiments.
Fig. 1 shows a schematic flow diagram of an image recognition method according to an embodiment of the present application. As shown in fig. 1, the method includes:
In step S110, an image to be recognized is acquired. The image to be recognized may be an image uploaded by a user and should be understood broadly; for example, photos, screenshots, and video frames extracted from videos all fall within the scope of images.
As for the content and uploading scenarios of the image to be recognized: for example, before purchasing a financial product, a user may be required to upload an identity card photo to verify identity information; when a take-out merchant registers, a photo of its license is required; and so on.
The image to be recognized needs to carry category information, which is not necessarily accurate. For example, in a scenario that requests a driving license photo, a user may upload some other photo, which obviously would not match the category information.
Step S120, selecting a key block detection model matched with the type of the image to be recognized, and performing key block detection on the image to be recognized according to the selected key block detection model.
For example, in an identity card recognition scenario, a key block detection model matching identity cards is selected; in an invoice recognition scenario, a key block detection model matching invoices is selected. The key block detection model can be obtained through deep learning training. Preferably, the image to be recognized can be preprocessed before being fed into the key block detection model: for example, cropping away parts irrelevant to the certificate or document; beautifying and correcting the image to make the characters clearer and the outline of the license closer to the ideal shape; or first adjusting the orientation of the image to improve accuracy. For instance, if a driving license photo taken by a user is upside down, it can be rotated 180 degrees before detection.
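As a concrete illustration of the orientation step above, the following sketch (an illustrative assumption, not code from the patent) restores an upside-down image with a pure-NumPy rotation; the upstream orientation classifier that produces `detected_angle` is assumed to exist elsewhere:

```python
import numpy as np

def correct_orientation(image: np.ndarray, detected_angle: int) -> np.ndarray:
    # detected_angle is the clockwise rotation (0/90/180/270) estimated by
    # some upstream orientation classifier (not part of this sketch).
    # np.rot90 rotates counter-clockwise, so k quarter-turns undo it.
    k = (detected_angle // 90) % 4
    return np.rot90(image, k)

# A 180-degree upside-down 2x3 "image" is restored to its original layout.
img = np.array([[1, 2, 3],
                [4, 5, 6]])
restored = correct_orientation(np.rot90(img, 2), 180)
```

The same helper handles 90- and 270-degree cases, since any multiple of 90 degrees reduces to a whole number of quarter-turns.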
The main purpose of key block detection is to determine the positions and attributes of the key blocks. The key blocks may be obtained by a detection method (the detected key block is marked by a bounding box) or a segmentation method (the detected key block is marked by a mask).
In step S130, if multiple key blocks are detected, the detected key blocks are clustered, and multiple sub-images are segmented from the image to be recognized according to the clustering result, so that each sub-image contains several key blocks.
The detected key blocks may carry attributes. Fig. 2 illustrates a plurality of key blocks detected in a food service license image. As can be seen from Fig. 2, the detected key blocks correspond respectively to the operator name label (key_0), the operator name content (key_0_content), the social credit code label (key_1), the social credit code content (key_1_content), and so on, which may be stored or computed as corresponding keys and contents, such as key_2, key_2_content, key_3, key_3_content, and so on. It can be seen that a key block actually corresponds to a text block, such as an information area necessarily contained in a fixed-format image.
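The key/content pairing described above can be sketched as a plain mapping; the field names and coordinates below are hypothetical, chosen only to mirror the key_0 / key_0_content naming scheme:

```python
# Hypothetical detection output: each license field yields a label block
# ("key_i") and a value block ("key_i_content"), each with an attribute
# and an (x, y, w, h) bounding box.
detected_blocks = {
    "key_0":         {"attr": "operator name",             "bbox": (40, 120, 180, 36)},
    "key_0_content": {"attr": "operator name content",     "bbox": (240, 120, 420, 36)},
    "key_1":         {"attr": "social credit code",        "bbox": (40, 180, 180, 36)},
    "key_1_content": {"attr": "social credit code content", "bbox": (240, 180, 420, 36)},
}

# Pair every label block with its value block via the shared index.
pairs = {label: detected_blocks[label + "_content"]
         for label in detected_blocks if not label.endswith("_content")}
```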
For an image such as that shown in Fig. 2, the prior art provides text line detection schemes, but with a drawback: because existing text line detection algorithms can detect text lines directly from an image like Fig. 2, the image itself is not further processed. Steps such as the key block detection and key-block-based sub-image segmentation of the embodiments of the present application are considered meaningless by the prior art.
However, when the text area is small relative to the whole image, as in Fig. 3, prior-art text line detection algorithms based on neural networks such as convolutional neural networks cannot attend to the features of small text (the small text is excessively compressed during convolution, so effective features cannot be extracted), causing missed detections. The present application therefore proposes the two strongly associated steps of key block detection and intelligent image segmentation based on the detected key blocks: the segmented sub-images are obtained first, and text line recognition is then performed on them, greatly improving recognition accuracy and recall. The sub-images can also be appropriately enlarged to make the character features more salient.
That is, the technical solution of the present application overcomes a technical prejudice in the prior art and thereby achieves unexpected effects, obtaining good recognition results even for images such as that shown in Fig. 3.
In step S140, character recognition is performed on each sub-image. The character recognition here can be implemented using the prior art, which the present application does not limit.
It can be seen that in the method shown in Fig. 1, after the image to be recognized is obtained, a matching key block detection model is selected according to the category of the image; the detected key blocks are then clustered, and multiple sub-images are segmented from the image to be recognized according to the clustering result, each containing several key blocks, thereby completing the intelligent segmentation of the image; finally, character recognition is performed on each sub-image. The key point lies in the two strongly associated steps of key block detection and intelligent image segmentation based on the detected key blocks, which solve the difficult problem of producing formatted output from fixed-format certificate and document images and greatly reduce development labor and time costs.
In an embodiment of the present application, the method further includes: if no key block can be detected, determining that the image to be recognized does not match the category.
For example, in a driving license recognition scenario, if a user uploads the wrong document by mistake, it will be difficult to detect key blocks with the driving license key block detection model, and it can be concluded that the image is wrong, specifically that the image to be recognized does not match the category. This embodiment offers good practicability in business scenarios; for example, based on this conclusion, the user can be prompted that the image is wrong and needs to be re-uploaded.
This approach has a further advantage over the prior art: key block detection works at a coarser granularity than text line recognition and is therefore more efficient. If an image contains no key blocks, it very likely contains no valid information of the corresponding category, so a mismatch between the image to be recognized and the category can be discovered more quickly and at lower cost.
In an embodiment of the present application, the key block detection model is obtained by training as follows: acquiring sample images of a specified category as training data, wherein each sample image is annotated with multiple key blocks; and performing iterative training with the training data to obtain a key block detection model matching the specified category. The key block detection model is implemented based on a target detection algorithm.
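One plausible shape for such annotated training data is sketched below; all field names and values are illustrative assumptions, not a format specified by the patent:

```python
import json

# One annotation record: a sample image of the specified category, labeled
# with the key blocks it contains (bounding boxes as [x, y, w, h]).
annotation = {
    "image": "business_license_0001.jpg",
    "category": "business_license",
    "key_blocks": [
        {"label": "key_0", "attr": "operator name", "bbox": [40, 120, 180, 36]},
        {"label": "key_0_content", "attr": "operator name content", "bbox": [240, 120, 420, 36]},
    ],
}

# Records like this, one per sample image, would be serialized and fed to
# whatever target detection framework is chosen for iterative training.
record = json.dumps(annotation)
```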
The network architecture of the key block detection model can be implemented with reference to the prior art; in other words, it is feasible to directly use an existing target detection framework and simply train it on annotated training data to obtain a key block detection model matching the category. A general basic target detection framework can be built and trained separately on different training data to obtain different key block detection models.
Because key blocks have a relatively large granularity, the performance of a target detection framework is sufficient for detecting them, whereas the requirement of directly locating text lines with a target detection framework is hard to satisfy. The balanced design is therefore to use the target detection framework only to detect key blocks rather than text lines. In other words, a detected key block may contain several text lines; only its position and attribute are needed, without separating the text lines.
In an embodiment of the application, clustering the detected key blocks includes clustering based on vector representations of the key blocks, where the clustering result satisfies the following conditions: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than a first threshold, and the ratio of the total area of the key blocks in each sub-image to the area of that sub-image is not less than a second threshold.
Specifically, in an embodiment of the present application, in the above method, the vector representation includes: the coordinates of the center point of the key block, the width of the key block, and the height of the key block. For example, the vector of a key block is represented as (x, y, w, h), where x and y are the horizontal and vertical coordinates of the center point of the key block, w is the width of the key block, and h is the height of the key block.
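Building this (x, y, w, h) representation from a detected bounding box is direct; the sketch below assumes the box is given by its corner coordinates:

```python
def block_vector(x0: float, y0: float, x1: float, y1: float):
    # Convert corner coordinates (x0, y0)-(x1, y1) into the (x, y, w, h)
    # vector used for clustering: center point plus width and height.
    w, h = x1 - x0, y1 - y0
    return (x0 + w / 2, y0 + h / 2, w, h)

vec = block_vector(40, 120, 220, 156)  # a 180x36 block
```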
In order to avoid over-segmentation of the sub-images or too coarse a segmentation granularity, the embodiment of the application uses an evaluation function for control. The evaluation function may be the following:

S_i / S ≤ threshold_1 and Sbox_i / S_i ≥ threshold_2

where S_i (0 ≤ i < k) is the area of sub-image i, Sbox_i (0 ≤ i < k) is the sum of the areas of the text boxes contained in sub-image i, S is the area of the image to be recognized, and threshold_1 and threshold_2 are the first and second thresholds respectively, which may be equal.
Specifically, clustering and sub-image segmentation may be performed dynamically: for example, k is initialized to 1, the sub-images obtained by segmentation are checked against the evaluation function, and if they do not satisfy it, k is incremented by one and clustering and segmentation are repeated until the evaluation function is satisfied.
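The dynamic loop can be sketched as follows. The grouping step is a deliberately crude stand-in (sorting by x center and splitting into k contiguous runs) rather than a real clustering algorithm, so this only illustrates the evaluation-function control flow:

```python
def bbox_area(group):
    # Area of the bounding box enclosing a group of (x, y, w, h) blocks.
    x0 = min(x - w / 2 for x, y, w, h in group)
    x1 = max(x + w / 2 for x, y, w, h in group)
    y0 = min(y - h / 2 for x, y, w, h in group)
    y1 = max(y + h / 2 for x, y, w, h in group)
    return (x1 - x0) * (y1 - y0)

def split_until_valid(blocks, image_area, t1, t2, max_k=10):
    # Increase k until every subgraph i satisfies the evaluation function
    # S_i / S <= t1 and Sbox_i / S_i >= t2.
    blocks = sorted(blocks)  # crude grouping: sort by x center
    n = len(blocks)
    for k in range(1, max_k + 1):
        groups = [g for g in
                  (blocks[i * n // k:(i + 1) * n // k] for i in range(k)) if g]
        if all(bbox_area(g) / image_area <= t1 and
               sum(w * h for _, _, w, h in g) / bbox_area(g) >= t2
               for g in groups):
            return groups
    return groups

# Two well-separated columns of blocks on a 1000x1000 image: k=1 fails the
# density condition, k=2 passes, yielding two subgraphs.
blocks = [(100, 100, 100, 50), (100, 200, 100, 50),
          (900, 100, 100, 50), (900, 200, 100, 50)]
groups = split_until_valid(blocks, 1000 * 1000, t1=0.2, t2=0.3)
```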
The result of segmenting Fig. 2 is shown in Fig. 4, in which the two sub-images obtained by segmentation are outlined with dashed lines.
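Cutting a decided sub-image out of the full image and enlarging it, so that small text occupies more pixels before text line detection, can be sketched with NumPy. The nearest-neighbour upscaling via np.repeat is an illustrative assumption, not an interpolation method the patent prescribes:

```python
import numpy as np

def crop_and_enlarge(image: np.ndarray, bbox, scale: int = 2) -> np.ndarray:
    # bbox = (x0, y0, x1, y1) in pixel coordinates; enlarge the crop by an
    # integer factor with nearest-neighbour repetition along both axes.
    x0, y0, x1, y1 = bbox
    sub = image[y0:y1, x0:x1]
    return np.repeat(np.repeat(sub, scale, axis=0), scale, axis=1)

img = np.arange(16).reshape(4, 4)
sub = crop_and_enlarge(img, (1, 1, 3, 3), scale=2)  # 2x2 crop -> 4x4
```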
In an embodiment of the present application, performing character recognition on each sub-image includes: performing text line detection on each sub-image to obtain detected text lines; and matching the detected text lines with the key blocks, and determining the attributes of the matched text lines according to the attributes of the key blocks.
Because the image to be recognized may exhibit rotation (here meaning small-angle rotation: due to the shooting angle, there may be a small included angle between the vertical axis of the document in the image and the vertical axis of the image), affine transformation (for example, a rectangular document photographed as a slanted parallelogram), blur, and the like, directly performing line segmentation and detection on each key block in a sub-image and then recognizing it yields unsatisfactory results. The present application instead proposes a text line division approach: first, text line detection is performed on the sub-image to obtain several text lines; these text lines are then matched with the key blocks, so that the attribute of a matched text line can be determined from the attribute of its key block.
Matching can be implemented using the idea of IoU (Intersection over Union), a common concept in target detection that generally refers to the overlap ratio between a generated candidate box and the original ground-truth box, i.e., the ratio of their intersection to their union. In the embodiment of the present application, if the IoU of a detected text line and a key block is greater than a predetermined threshold, the two are considered to match.
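A minimal IoU computation and the matching rule can be sketched as follows, assuming axis-aligned (x0, y0, x1, y1) boxes and an illustrative threshold of 0.5 (the patent does not fix a value):

```python
def iou(a, b):
    # Intersection over union of two axis-aligned boxes (x0, y0, x1, y1).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def matches(text_line_box, key_block_box, threshold=0.5):
    # A detected text line inherits the key block's attribute when their
    # IoU exceeds the (illustrative) threshold.
    return iou(text_line_box, key_block_box) > threshold
```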
For example, Fig. 5 shows the result of text line detection (white boxes) on the left half sub-image of Fig. 4, in which the attribute of the text line "tianjin civic sierra xx cold foodservice" is the content item of "operator name".
The text line detection algorithm can be implemented using existing technologies, including but not limited to the CTPN algorithm, the SegLink algorithm, and the like.
In an embodiment of the application, performing character recognition on each sub-image further includes: recognizing the text content of the detected text lines. Text content recognition here may also be implemented using existing technology, which the present application does not limit.
The recognized text content can be applied to the corresponding scenario. For example, a user only needs to provide an image of a business license and need not manually fill in information such as the legal representative; the text content recognition result can be used directly.
Fig. 6 shows a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application. As shown in fig. 6, the image recognition apparatus 600 includes:
an image obtaining unit 610, configured to obtain an image to be identified. The image to be recognized can be an image uploaded by a user, and has a broad understanding, for example, a photo, a screenshot and a video frame extracted from a video all belong to the category of the image.
For the content and uploading scenario of the image to be recognized, for example, before purchasing a financial product, the user may be required to upload an identification card photo to verify the identity information of the user; at the time of registration of the take-away merchant, a photograph of a license is required, and so on.
The image to be recognized needs to have a category information, which is not necessarily accurate, for example, a user may upload a driver license photo in a scene where the driver license photo is requested to be uploaded, but such a photo is obviously inconsistent with the category information.
The key block detection unit 620 is configured to select a key block detection model matching the category of the image to be recognized, and perform key block detection on the image to be recognized according to the selected key block detection model.
For example, for an identification card recognition scene, a key block detection model matched with the identification card is selected; for the invoice identification scenario, a key blob detection model matching the invoice is selected. The key block detection model can be obtained through deep learning training. Preferably, before the image to be recognized is sent to the key block detection model, preprocessing can be performed, such as image segmentation, and parts irrelevant to the certificate and the document are cut off; beautifying and correcting the image to make the characters clearer and the shape of the license closer to an ideal state; the direction of the image to be recognized is adjusted first to improve accuracy, for example, if the driver license picture taken by the user is upside down, the driver license picture can be detected after being rotated 180 degrees, and the like.
The main purpose of key block detection is to determine the position and the attribute of each key block. The key blocks may be obtained by a detection method (each detected key block is marked by a bounding box) or by a segmentation method (each detected key block is marked by a mask).
The clustering unit 630 is configured to, if a plurality of key blocks are detected, cluster the detected key blocks and segment a plurality of subgraphs from the image to be recognized according to the clustering result, so that each subgraph contains a plurality of key blocks.
The detected key blocks may carry attributes. Fig. 2 illustrates a plurality of key blocks detected in a food-service license image. As can be seen from Fig. 2, the detected key blocks correspond respectively to the operator name (key_0), the operator name content (key_0_content), the social credit code (key_1), the social credit code content (key_1_content) … … When stored or computed, they can be kept as corresponding keys and contents, such as key_2, key_2_content, key_3_content, and so on. It can be seen that a key block actually corresponds to a text block, such as an information area that a fixed-format image inevitably contains.
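The key/content label pairing described above (key_0 with key_0_content, and so on) can be grouped programmatically. The following is a hypothetical sketch; the label naming convention is taken from the Fig. 2 example, and the function itself is ours, not the application's.

```python
def pair_key_blocks(labels):
    """Group detected block labels like 'key_0' / 'key_0_content' into
    {base: {'key': ..., 'content': ...}} pairs.

    `labels` are the attribute strings predicted by the key block
    detection model; the '_content' suffix convention is assumed from
    the Fig. 2 example.
    """
    pairs = {}
    for label in labels:
        if label.endswith("_content"):
            base = label[: -len("_content")]
            pairs.setdefault(base, {})["content"] = label
        else:
            pairs.setdefault(label, {})["key"] = label
    return pairs
```

Pairing the labels this way makes it straightforward to emit the formatted key/value output that the application aims for.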
For an image such as that shown in Fig. 2, the prior art provides text line detection schemes, but with a drawback: because an existing text line detection algorithm can detect text lines directly from an image like Fig. 2, the image itself is not processed first. Steps such as the key block detection, and the subgraph segmentation based on it, shown in the embodiments of the present application are considered meaningless by the prior art.
However, when the text area is small relative to the whole image, as in Fig. 3, prior-art text line detection algorithms based on neural networks such as convolutional neural networks cannot attend to the features of the small text (the small text is compressed excessively during convolution, so no effective features can be extracted), and missed detections occur. The present application therefore proposes two strongly associated steps, key block detection and intelligent image segmentation based on the detected key blocks: segmented subgraphs are obtained first, and text line recognition is then performed on the subgraphs, which greatly improves recognition accuracy and recall; moreover, the subgraphs can be appropriately enlarged so that the character features become more salient.
That is, the technical solution of the present application overcomes a technical prejudice in the prior art and thereby achieves an unexpected effect: good recognition results are obtained even for an image such as that shown in Fig. 3.
The recognition unit 640 is configured to perform character recognition on each subgraph. The character recognition here can be implemented with the prior art, which the present application does not limit.
It can be seen that, in the apparatus shown in Fig. 6, after the image to be recognized is obtained, the matched key block detection model is selected according to the category of the image, the detected key blocks are clustered, and a plurality of subgraphs are segmented from the image to be recognized according to the clustering result, each subgraph containing a plurality of key blocks; the intelligent segmentation of the image is thus completed, and finally character recognition is performed on each subgraph. The key point is that the two strongly associated steps of key block detection and intelligent image segmentation based on the detected key blocks solve the difficult problem of producing formatted output from fixed-format certificate and document images, greatly reducing development labor and time costs.
In an embodiment of the application, in the image recognition apparatus 600, the recognition unit 640 is further configured to determine that the image to be recognized does not conform to the category if the key block cannot be detected.
For example, in a driving license recognition scenario, if the user uploads some other document by mistake, it is difficult to detect key blocks with the key block detection model for driving licenses, and it can then be determined that the image is wrong, specifically, that the image to be recognized does not match the category. This embodiment is quite practical in service scenarios; for example, on this basis the user can be prompted that the image is wrong and needs to be uploaded again.
Compared with the prior art, this approach also has the advantage that key block detection is coarser-grained than text line recognition and therefore more efficient: if an image contains no key blocks, it is very likely to contain no valid information of the corresponding category, so an image inconsistent with its category can be found more quickly and at lower cost.
In an embodiment of the present application, in the image recognition apparatus 600, the key block detection model is obtained by training as follows: a sample image of a specified category, annotated with a plurality of key blocks, is acquired as training data; iterative training is performed with the training data to obtain a key block detection model matched with the specified category; the key block detection model is implemented based on a target detection algorithm.
The network architecture of the key block detection model can be implemented with reference to the prior art; in other words, directly using an existing target detection framework is also feasible, and only training on annotated training data is needed to obtain a key block detection model matched with a category. A general basic target detection framework can be set up and trained on different training data to obtain different key block detection models.
Because the key blocks are relatively coarse-grained, the performance of a target detection framework is sufficient for them, whereas directly locating text lines with a target detection framework is a requirement that is hard to satisfy; the balanced design is therefore to use the target detection framework only to detect key blocks, not text lines. In other words, a detected key block may contain a plurality of text lines; only its position and attribute are needed, and the text lines need not be separated.
In an embodiment of the present application, in the image recognition apparatus 600, the clustering unit 630 is configured to perform clustering based on vector representation of the key block, where a clustering result satisfies the following condition: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than a first threshold, and the ratio of the area of each key block in each sub-image to the area of the sub-image is not less than a second threshold.
Specifically, in an embodiment of the present application, in the image recognition apparatus 600, the vector representation includes: the coordinates of the center point of the key block, the width of the key block, and the height of the key block.
For example, the vector of a key block is represented as (x, y, w, h), where x and y are the horizontal and vertical coordinates of the center point of the key block, w is the width of the key block, and h is the height of the key block.
In order to avoid over-segmentation of the subgraphs or too coarse a segmentation granularity, the embodiment of the present application uses an evaluation function for control, which may specifically be the following:
S_i / S ≤ threshold_1 and Sbox_i / S_i ≥ threshold_2

where the area of each subgraph is denoted S_i (0 ≤ i < k), the sum of the areas of the text boxes contained in subgraph i is denoted Sbox_i (0 ≤ i < k), the area of the image to be recognized is S, and threshold_1 and threshold_2 are the first and second thresholds respectively, which may be equal.
Specifically, clustering and subgraph segmentation may be performed dynamically: for example, k is initialized to 1; whether the subgraphs obtained by segmentation satisfy the evaluation function is then checked; if not, k is increased by one and clustering and subgraph segmentation are repeated until the evaluation function is satisfied.
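The dynamic loop above can be sketched as follows. This is an illustrative implementation, not the application's: the naive k-means with deterministic initialization, the threshold values, and all function names are assumptions made for the example; only the (x, y, w, h) vector representation and the evaluation function S_i/S ≤ threshold_1 and Sbox_i/S_i ≥ threshold_2 come from the text.

```python
def kmeans(vectors, k, iters=20):
    """Naive k-means over (x, y, w, h) vectors, deterministically
    initialized with the first k vectors; returns cluster assignments."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def subgraph_bbox(blocks):
    """Bounding box (x0, y0, x1, y1) covering key blocks given as (x, y, w, h)."""
    x0 = min(x - w / 2 for x, y, w, h in blocks)
    y0 = min(y - h / 2 for x, y, w, h in blocks)
    x1 = max(x + w / 2 for x, y, w, h in blocks)
    y1 = max(y + h / 2 for x, y, w, h in blocks)
    return x0, y0, x1, y1

def segment_subgraphs(blocks, image_area, t1=0.5, t2=0.2):
    """Increase k from 1 until every subgraph passes the evaluation
    function: S_i / S <= t1 and Sbox_i / S_i >= t2."""
    if not blocks:
        return []
    groups = [blocks]
    for k in range(1, len(blocks) + 1):
        assign = kmeans(blocks, k)
        groups = [[b for b, a in zip(blocks, assign) if a == c] for c in range(k)]
        groups = [g for g in groups if g]  # drop empty clusters
        ok = True
        for g in groups:
            x0, y0, x1, y1 = subgraph_bbox(g)
            s_i = (x1 - x0) * (y1 - y0)           # subgraph area S_i
            sbox_i = sum(w * h for x, y, w, h in g)  # summed key block area
            if s_i / image_area > t1 or sbox_i / s_i < t2:
                ok = False
        if ok:
            return groups
    return groups
```

With two well-separated groups of key blocks, one cluster fails the S_i/S test and the loop settles at k = 2; in practice a library clustering routine would replace the toy k-means here.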
The result of the segmentation of fig. 2 is shown in fig. 4, in which two sub-graphs obtained by the segmentation are outlined by dashed lines.
In an embodiment of the present application, in the image recognition apparatus 600, the recognition unit 640 is configured to perform text line detection on each sub-image to obtain a detected text line; and matching the detected character lines with the key blocks, and determining the attributes of the matched character lines according to the attributes of the key blocks.
Because the image to be recognized may exhibit rotation (here, rotation by a small angle: owing to the shooting angle, there may be a small included angle between the perpendicular bisector of the document in the image and that of the image itself), affine distortion (for example, a rectangular document photographed as a skewed parallelogram), blur, and the like, directly performing line-segmentation detection on each key block in a subgraph and then recognizing it gives unsatisfactory results. Instead, the present application proposes a detect-then-match manner of handling text lines: first, text line detection is performed on the subgraph to obtain a plurality of text lines; then each text line is matched against the key blocks, so that the attribute of a matched text line can be determined from the attribute of its key block.
Specific matching can be implemented with the idea of IoU (Intersection over Union), a common concept in target detection that generally refers to the overlap ratio between a generated candidate box and the original annotated ground-truth box, i.e. the ratio of their intersection to their union. In the embodiment of the present application, if the IoU of a detected text line and a key block is greater than a predetermined threshold, the two are considered to match.
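The IoU-based matching can be sketched as below. The function names, the dict-based box bookkeeping, and the 0.5 threshold are our assumptions for illustration; only the IoU criterion itself comes from the text.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_lines_to_blocks(text_lines, key_blocks, threshold=0.5):
    """Assign each detected text line the attribute of the key block it
    overlaps most, when that overlap exceeds the IoU threshold.

    `text_lines`: {line_id: box}; `key_blocks`: {attribute: box}.
    """
    matched = {}
    for line_id, line_box in text_lines.items():
        best = max(key_blocks, key=lambda a: iou(line_box, key_blocks[a]),
                   default=None)
        if best is not None and iou(line_box, key_blocks[best]) > threshold:
            matched[line_id] = best
    return matched
```

Note that a key block may span several text lines, so in practice the threshold (or an intersection-over-line-area variant) would be tuned to that geometry.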
For example, Fig. 5 shows the result of detecting the text lines (shown by white boxes) in the left-half subgraph of Fig. 4, where the attribute of the text line "tianjin civic sierra xx cold foodservice" is the content item of "operator name".
The text line detection algorithm can be implemented with existing technologies, including but not limited to the CTPN algorithm, the SegLink algorithm, and the like.
In an embodiment of the present application, in the image recognition apparatus 600, the recognition unit 640 is configured to perform character content recognition on the detected text lines. The recognition here may also be implemented with existing technology, which the present application does not limit.
The recognized text content can then be applied in the corresponding scenario: for example, the user only needs to provide an image of a business license and no longer has to fill in information such as the legal representative manually; the text content recognition result can be used directly.
In summary, according to the technical solution of the present application, after the image to be recognized is obtained, the matched key block detection model is selected according to the category of the image to be recognized, the detected key blocks are clustered, and a plurality of subgraphs are segmented from the image to be recognized according to the clustering result, each subgraph containing a plurality of key blocks; the intelligent segmentation of the image is thus completed, and finally character recognition is performed on each subgraph. The key point is that the two strongly associated steps of key block detection and intelligent image segmentation based on the detected key blocks solve the difficult problem of producing formatted output from fixed-format certificate and document images, greatly reducing development labor and time costs.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in an image recognition apparatus according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 7 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 700 comprises a processor 710 and a memory 720 arranged to store computer executable instructions (computer readable program code). The memory 720 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 720 has a storage space 730 storing computer readable program code 731 for performing any of the method steps described above. For example, the storage space 730 for storing the computer readable program code may comprise respective computer readable program codes 731 for respectively implementing various steps in the above method. The computer readable program code 731 can be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as described in fig. 8. FIG. 8 shows a schematic diagram of a computer-readable storage medium according to an embodiment of the present application. The computer readable storage medium 800 stores computer readable program code 731 for performing the method steps according to the present application, readable by the processor 710 of the electronic device 700, which computer readable program code 731, when executed by the electronic device 700, causes the electronic device 700 to perform the steps of the method described above, in particular the computer readable program code 731 stored by the computer readable storage medium performs the method shown in any of the embodiments described above. The computer readable program code 731 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second, third, et cetera does not indicate any ordering; these words may be interpreted as names.

Claims (10)

1. An image recognition method, comprising:
acquiring an image to be identified;
selecting a key block detection model matched with the type of the image to be recognized, and performing key block detection on the image to be recognized according to the selected key block detection model;
if a plurality of key blocks are detected, clustering the plurality of detected key blocks, and segmenting a plurality of subgraphs from the image to be identified according to a clustering result to enable each subgraph to respectively comprise a plurality of key blocks;
and respectively carrying out character recognition on each subgraph.
2. The method of claim 1, wherein the method further comprises:
and if the key block cannot be detected, judging that the image to be identified does not accord with the category.
3. The method of claim 1, wherein the key block detection model is trained by:
acquiring a sample image of a specified category as training data, wherein the sample image is annotated with a plurality of key blocks;
performing iterative training by using the training data to obtain a key block detection model matched with the specified category; wherein the key block detection model is implemented based on a target detection algorithm.
4. The method of claim 1, wherein clustering the detected plurality of key blocks comprises:
clustering is carried out based on the vector representation of the key blocks, and the clustering result meets the following conditions:
the ratio of the area of each sub-image to the area of the image to be recognized is not greater than a first threshold, and the ratio of the area of each key block in each sub-image to the area of the sub-image is not less than a second threshold.
5. The method of claim 4, wherein the vector representation comprises: the coordinates of the center point of the key block, the width of the key block, and the height of the key block.
6. The method of claim 1, wherein the separately performing character recognition on each sub-graph comprises:
respectively carrying out character line detection on each subgraph to obtain detected character lines;
and matching the detected character lines with the key blocks, and determining the attributes of the matched character lines according to the attributes of the key blocks.
7. The method of claim 6, wherein the separately performing character recognition on each sub-graph further comprises:
and identifying the character content of the detected character line.
8. An image recognition apparatus comprising:
the image acquisition unit is used for acquiring an image to be identified;
the key block detection unit is used for selecting a key block detection model matched with the type of the image to be recognized and detecting the key block of the image to be recognized according to the selected key block detection model;
the clustering unit is used for clustering the detected key blocks if the key blocks are detected, and segmenting a plurality of subgraphs from the image to be identified according to a clustering result to ensure that each subgraph comprises a plurality of key blocks respectively;
and the recognition unit is used for respectively carrying out character recognition on each subgraph.
9. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method of any one of claims 1-7.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-7.
CN201911237612.1A 2019-12-05 2019-12-05 Image recognition method and device, electronic equipment and storage medium Pending CN111160395A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911237612.1A CN111160395A (en) 2019-12-05 2019-12-05 Image recognition method and device, electronic equipment and storage medium
PCT/CN2020/134332 WO2021110174A1 (en) 2019-12-05 2020-12-07 Image recognition method and device, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911237612.1A CN111160395A (en) 2019-12-05 2019-12-05 Image recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111160395A true CN111160395A (en) 2020-05-15

Family

ID=70556519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911237612.1A Pending CN111160395A (en) 2019-12-05 2019-12-05 Image recognition method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111160395A (en)
WO (1) WO2021110174A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308046A (en) * 2020-12-02 2021-02-02 龙马智芯(珠海横琴)科技有限公司 Method, device, server and readable storage medium for positioning text region of image
CN112597773A (en) * 2020-12-08 2021-04-02 上海深杳智能科技有限公司 Document structuring method, system, terminal and medium
CN112686237A (en) * 2020-12-21 2021-04-20 福建新大陆软件工程有限公司 Certificate OCR recognition method
WO2021110174A1 (en) * 2019-12-05 2021-06-10 北京三快在线科技有限公司 Image recognition method and device, electronic device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743361A (en) * 2021-09-16 2021-12-03 上海深杳智能科技有限公司 Document cutting method based on image target detection

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160307057A1 (en) * 2015-04-20 2016-10-20 3M Innovative Properties Company Fully Automatic Tattoo Image Processing And Retrieval
CN107977665A (en) * 2017-12-15 2018-05-01 北京科摩仕捷科技有限公司 The recognition methods of key message and computing device in a kind of invoice
CN108133212A (en) * 2018-01-05 2018-06-08 东华大学 A kind of quota invoice amount identifying system based on deep learning
CN108171239A (en) * 2018-02-02 2018-06-15 杭州清本科技有限公司 The extracting method of certificate pictograph, apparatus and system, computer storage media
CN108520254A (en) * 2018-03-01 2018-09-11 腾讯科技(深圳)有限公司 A kind of Method for text detection, device and relevant device based on formatted image
CN108776970A (en) * 2018-06-12 2018-11-09 北京字节跳动网络技术有限公司 Image processing method and device
CN109840520A (en) * 2017-11-24 2019-06-04 中国移动通信集团广东有限公司 A kind of invoice key message recognition methods and system
RU2691214C1 (en) * 2017-12-13 2019-06-11 Общество с ограниченной ответственностью "Аби Продакшн" Text recognition using artificial intelligence
CN110472554A (en) * 2019-08-12 2019-11-19 南京邮电大学 Table tennis action identification method and system based on posture segmentation and crucial point feature

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034159B (en) * 2018-05-28 2021-05-28 北京捷通华声科技股份有限公司 Image information extraction method and device
CN109492643B (en) * 2018-10-11 2023-12-19 平安科技(深圳)有限公司 Certificate identification method and device based on OCR, computer equipment and storage medium
CN111160395A (en) * 2019-12-05 2020-05-15 北京三快在线科技有限公司 Image recognition method and device, electronic equipment and storage medium


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021110174A1 (en) * 2019-12-05 2021-06-10 北京三快在线科技有限公司 Image recognition method and device, electronic device, and storage medium
CN112308046A (en) * 2020-12-02 2021-02-02 龙马智芯(珠海横琴)科技有限公司 Method, device, server and readable storage medium for positioning text region of image
CN112597773A (en) * 2020-12-08 2021-04-02 上海深杳智能科技有限公司 Document structuring method, system, terminal and medium
CN112597773B (en) * 2020-12-08 2022-12-13 上海深杳智能科技有限公司 Document structuring method, system, terminal and medium
CN112686237A (en) * 2020-12-21 2021-04-20 福建新大陆软件工程有限公司 Certificate OCR recognition method

Also Published As

Publication number Publication date
WO2021110174A1 (en) 2021-06-10

Similar Documents

Publication Publication Date Title
US10885644B2 (en) Detecting specified image identifiers on objects
CN111160395A (en) Image recognition method and device, electronic equipment and storage medium
CN110569878B (en) Photograph background similarity clustering method based on convolutional neural network and computer
CN110097068B (en) Similar vehicle identification method and device
US9754192B2 (en) Object detection utilizing geometric information fused with image data
JP2018198053A (en) Information processor, information processing method, and program
CN112612911A (en) Image processing method, system, device and medium, and program product
CN101689300A (en) Image segmentation and enhancement
CN109255300B (en) Bill information extraction method, bill information extraction device, computer equipment and storage medium
CN111191611A (en) Deep learning-based traffic sign label identification method
CN113963147B (en) Key information extraction method and system based on semantic segmentation
CN113158895B (en) Bill identification method and device, electronic equipment and storage medium
CA3162655A1 (en) Image processing based methods and apparatus for planogram compliance
CN111753592A (en) Traffic sign recognition method, traffic sign recognition device, computer equipment and storage medium
CN112989921A (en) Target image information identification method and device
CN109508716B (en) Image character positioning method and device
CN114155363A (en) Converter station vehicle identification method and device, computer equipment and storage medium
CN115035533B (en) Data authentication processing method and device, computer equipment and storage medium
CN111178200A (en) Identification method of instrument panel indicator lamp and computing equipment
CN109087439B (en) Bill checking method, terminal device, storage medium and electronic device
CN111401415A (en) Training method, device, equipment and storage medium of computer vision task model
CN116363655A (en) Financial bill identification method and system
CN111680691B (en) Text detection method, text detection device, electronic equipment and computer readable storage medium
CN114842198A (en) Intelligent loss assessment method, device and equipment for vehicle and storage medium
CN114092684A (en) Text calibration method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200515