CN110674811A - Image recognition method and device - Google Patents

Image recognition method and device

Info

Publication number
CN110674811A
CN110674811A CN201910831740.2A
Authority
CN
China
Prior art keywords
recognized
image
text
character
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910831740.2A
Other languages
Chinese (zh)
Other versions
CN110674811B (en)
Inventor
刘学文 (Liu Xuewen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Inspur Smart Computing Technology Co Ltd
Original Assignee
Guangdong Inspur Big Data Research Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Inspur Big Data Research Co Ltd filed Critical Guangdong Inspur Big Data Research Co Ltd
Priority to CN201910831740.2A
Publication of CN110674811A
Application granted
Publication of CN110674811B
Active legal status
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/158Segmentation of character regions using character size, text spacings or pitch estimation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The invention provides an image recognition method and device, wherein the method comprises the following steps: acquiring an image to be recognized, on which a line of text to be recognized is displayed; processing the image to be recognized with a target detection algorithm model to obtain the position information of each character forming the text to be recognized; segmenting the image to be recognized according to the position information of each character to obtain a plurality of sub-images, each displaying one character; processing each sub-image with a character recognition convolutional neural network model to recognize the character in each sub-image; and arranging the recognized characters according to their position information to obtain the recognition result of the image to be recognized. The method improves the accuracy of segmenting the characters in the image, and thereby the accuracy of character recognition.

Description

Image recognition method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for image recognition.
Background
Character recognition in images is in wide demand across many fields; related applications include ID card recognition, license plate number recognition, express waybill recognition, and bank card number recognition. Recognizing characters in an image usually requires segmenting each character out of the image first. The traditional method splits the image horizontally into lines, vertically projects each line to find the left and right boundary of every character, cuts out the single characters, and then recognizes the segmented character images with a designed model.
However, because Chinese characters may be tightly spaced and many of them have a left-right radical structure, vertical segmentation after vertical projection easily over-segments the characters. As a result, the accuracy of the subsequent character recognition with the designed model is not high.
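The traditional projection-based segmentation described above can be sketched as follows. This is an illustrative reconstruction, not code from the patent; the function name and the binary-image convention are assumptions. A left-right structured character would produce two separate spans here, which is exactly the over-segmentation problem the invention addresses:

```python
import numpy as np

def vertical_projection_segments(line_img):
    """Find (start, end) column spans of ink in a binarized line image.

    line_img: 2D numpy array where nonzero pixels are ink. Each maximal run
    of nonzero columns is treated as one character, so a character with a
    left-right radical structure may wrongly yield two spans.
    """
    profile = (line_img != 0).sum(axis=0)  # ink pixels per column
    spans, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # entering an ink run
        elif count == 0 and start is not None:
            spans.append((start, x))       # leaving an ink run
            start = None
    if start is not None:                  # run extends to the right edge
        spans.append((start, line_img.shape[1]))
    return spans
```
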
Disclosure of Invention
In view of this, embodiments of the present invention provide an image recognition method and apparatus, which are used to improve the accuracy of segmenting characters in a graph, so as to improve the accuracy of character recognition.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
a method of image recognition, comprising:
acquiring an image to be recognized; wherein a line of text to be recognized is displayed on the image to be recognized;
processing the image to be recognized by using a target detection algorithm model to obtain the position information of each character forming the text to be recognized;
segmenting the image to be recognized according to the position information of each character of the text to be recognized to obtain a plurality of sub-images; wherein each sub-image displays one character;
processing each sub-image by using a character recognition-convolution neural network model to recognize characters in each sub-image;
and arranging each character obtained by identification according to the position information of the character to obtain the identification result of the image to be identified.
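Taken together, the five steps above can be sketched as one pipeline. This is a hypothetical illustration: `detect_boxes` and `recognize_char` are stand-ins for the target detection model and the recognition network, and the box format follows the (xmin, ymin, xmax, ymax) convention used later in the description:

```python
def recognize_image(image, detect_boxes, recognize_char):
    """Sketch of the claimed pipeline; the two callables are assumptions
    standing in for the trained detector and recognition CNN."""
    # Steps 1-2: detect one (xmin, ymin, xmax, ymax) box per character.
    boxes = detect_boxes(image)
    # Step 3: segment the image into one sub-image per box.
    subs = [(box, image[box[1]:box[3], box[0]:box[2]]) for box in boxes]
    # Step 4: recognize the character in each sub-image.
    chars = [(box, recognize_char(sub)) for box, sub in subs]
    # Step 5: arrange characters by position (left to right for one line).
    chars.sort(key=lambda bc: bc[0][0])
    return "".join(c for _, c in chars)
```
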
Optionally, before processing the image to be recognized by using the target detection algorithm model to obtain the position information of each character forming the text to be recognized, the method further includes:
judging whether the text to be recognized displayed by the image to be recognized is a plurality of lines;
and if the text to be recognized displayed by the image to be recognized is judged to be a plurality of lines, finding the upper limit and the lower limit of each line, and horizontally cutting to obtain a plurality of sub-texts to be recognized.
Optionally, processing the image to be recognized by using the target detection algorithm model to obtain the position information of each character forming the text to be recognized includes:
judging whether the size of the text to be recognized accords with a preset size or not;
if the size of the text to be recognized is judged to be not in accordance with the preset size, changing the size of the text to be recognized into the preset size;
recording the position (xmin, ymin, xmax, ymax) of each Chinese character in the text to be recognized after the text to be recognized is changed to the preset size; wherein, (xmin, ymin) and (xmax, ymax) are the coordinates of the upper left corner and the lower right corner of the Chinese character, respectively.
Optionally, after recording the position (xmin, ymin, xmax, ymax) of each Chinese character in the text to be recognized once it has been changed to the preset size, the method further includes:
identifying the size of each Chinese character in the text to be identified by using a preset anchor, and confirming the size of each Chinese character in the text to be identified; wherein the preset anchors are of sizes (10,10), (20,20), (30,30), (40,40), (50,50) and (60, 60).
Optionally, before processing each sub-image by using the character recognition convolutional neural network model to recognize the characters in each sub-image, the method further includes:
and adjusting the size of each sub-image according to the size of a preset single character.
An apparatus for image recognition, comprising:
the device comprises an acquisition unit, a recognition unit and a processing unit, wherein the acquisition unit is used for acquiring an image to be recognized; the image to be recognized is displayed with a line of text to be recognized;
the first processing unit is used for processing the image to be recognized by utilizing a target detection algorithm model to obtain the position information of each character forming the text to be recognized;
the segmentation unit is used for segmenting the image to be recognized according to the position information of each character of the text to be recognized to obtain a plurality of sub-images; wherein each sub-image displays one character;
the second processing unit is used for processing each sub-image by utilizing a character recognition-convolution neural network model to recognize characters in each sub-image;
and the arranging unit is used for arranging each character obtained by identification according to the position information of the character to obtain the identification result of the image to be identified.
Optionally, the image recognition apparatus further includes:
the first judging unit is used for judging whether the text to be recognized displayed by the image to be recognized is a plurality of lines;
and the cutting unit is used for finding the upper limit and the lower limit of each line and horizontally cutting to obtain a plurality of sub texts to be identified if the first judging unit judges that the texts to be identified displayed by the images to be identified are in a plurality of lines.
Optionally, the first processing unit includes:
the second judging unit is used for judging whether the size of the text to be recognized accords with the preset size or not;
a changing unit, configured to change the size of the text to be recognized to the preset size if the second determining unit determines that the size of the text to be recognized does not conform to the preset size;
a recording unit, configured to record a position (xmin, ymin, xmax, ymax) of each Chinese character in the text to be recognized after the text to be recognized is changed to the preset size; wherein, (xmin, ymin) and (xmax, ymax) are the coordinates of the upper left corner and the lower right corner of the Chinese character, respectively.
Optionally, the image recognition apparatus further includes:
the identification unit is used for identifying the size of each Chinese character in the text to be identified by using a preset anchor and confirming the size of each Chinese character in the text to be identified; wherein the preset anchors are of sizes (10,10), (20,20), (30,30), (40,40), (50,50) and (60, 60).
Optionally, the image recognition apparatus further includes:
and the adjusting unit is used for adjusting the size of each sub-image according to the size of a preset single character.
According to the above scheme, in the image recognition method and device provided by the invention, an image to be recognized is acquired, on which a line of text to be recognized is displayed; the image is processed with a target detection algorithm model to obtain the position information of each character forming the text to be recognized; the image is segmented according to that position information into a plurality of sub-images, each displaying one character; each sub-image is processed with a character recognition convolutional neural network model to recognize the character in it; and the recognized characters are arranged according to their position information to obtain the recognition result of the image to be recognized. This improves the accuracy of segmenting the characters in the image and thereby the accuracy of character recognition.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a detailed flowchart of a method for image recognition according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an image recognition method according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of a horizontal projection method according to another embodiment of the present invention;
fig. 4 is a schematic diagram of an image recognition apparatus according to another embodiment of the present invention;
fig. 5 is a schematic diagram of an image recognition apparatus according to another embodiment of the present invention;
fig. 6 is a schematic diagram of an image recognition apparatus according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present application provides an image recognition method, as shown in fig. 1, the method includes the following steps:
and S101, acquiring an image to be identified.
A line of text to be recognized is displayed on the image to be recognized.
It should be noted that the text to be recognized may include digits, English letters, Chinese characters, and the like. Typical applications are ID card recognition, license plate number recognition, express waybill recognition, bank card number recognition, and so on.
S102, processing the image to be recognized by using the target detection algorithm model to obtain the position information of each character forming the text to be recognized.
The target detection algorithm model may be the currently common YOLOv3 model, which is a neural network model.
It should be noted that a neural network model is usually not optimal when first constructed, so the prediction results it produces are not optimal either. The model must be trained on a large number of training samples: samples with known ground-truth results are fed into the network to obtain prediction results, and the parameters of the model are then continuously optimized according to the difference between the predictions and the ground truth, so that the network's output agrees with the actual result as closely as possible. The optimized neural network model can then be used directly to process the image to be recognized and obtain the position information of each character forming the text to be recognized.
Optionally, in another embodiment of the present invention, before step S102, the method further includes:
and judging whether the text to be recognized displayed by the image to be recognized is a plurality of lines.
It should be noted that the YOLOv3 model here detects only a single line of text; therefore, when the text to be recognized spans multiple lines, the multi-line text must first be divided into single lines.
It should be noted that dividing multi-line text into single lines generally uses the horizontal projection method. As shown in fig. 3, horizontal projection sums the pixels of each row of the text image and plots the row statistics, from which the starting and ending row of each line can be determined.
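A minimal sketch of this horizontal projection step, assuming a binarized numpy image where nonzero pixels are ink (the function name is illustrative, not from the patent):

```python
import numpy as np

def split_lines(text_img):
    """Horizontal projection: count ink pixels in each row; a text line is a
    maximal run of rows containing ink. Returns the (top, bottom) bounds of
    each line, along which the image can then be cut horizontally."""
    profile = (text_img != 0).sum(axis=1)  # ink pixels per row
    bounds, top = [], None
    for y, count in enumerate(profile):
        if count > 0 and top is None:
            top = y                        # line starts
        elif count == 0 and top is not None:
            bounds.append((top, y))        # line ends
            top = None
    if top is not None:                    # line extends to the bottom edge
        bounds.append((top, text_img.shape[0]))
    return bounds
```
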
Specifically, if the text to be recognized displayed by the image to be recognized is judged to be multiple lines, the upper limit and the lower limit of each line are found, and horizontal cutting is performed to obtain multiple sub-texts to be recognized.
Optionally, in another embodiment of the present invention, an implementation manner of step S102, as shown in fig. 2, includes:
s201, judging whether the size of the text to be recognized accords with a preset size.
Because the YOLOv3 model requires the width and height of an input image to be no less than 320, during model construction 3500 commonly used Chinese characters are used to generate the Chinese-character image samples; a single-line character image sample is usually set to a width of 320 and a height of 320, with at most 20 characters per line. During training of the YOLOv3 model, the training set generally contains 30000 images and the test set 10000 images, although the sizes of the training and test sets are not limited here.
Specifically, if the step S201 determines that the size of the text to be recognized matches the preset size, the step S203 is directly executed; if the step S201 determines that the size of the text to be recognized does not conform to the preset size, the step S202 is executed and then the step S203 is executed.
S202, changing the size of the text to be recognized into a preset size.
Specifically, if the size of the text to be recognized is smaller than the preset size, a blank region is added below the text and the image sample is then adjusted to the preset size; if it is larger than the preset size, the portion that contains no characters is cut away.
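This size normalization can be sketched as follows, assuming grayscale numpy arrays. Note two assumptions beyond the text: the sketch pads blank space to the right as well as below, and the crop branch assumes the discarded region contains no characters:

```python
import numpy as np

def to_preset_size(img, preset=(320, 320)):
    """Pad an undersized image with blank (zero) pixels below and to the
    right, or crop an oversized one, so the result matches the preset size."""
    H, W = preset
    out = np.zeros((H, W), dtype=img.dtype)   # blank canvas at preset size
    h = min(img.shape[0], H)
    w = min(img.shape[1], W)
    out[:h, :w] = img[:h, :w]                 # copy (and crop if needed)
    return out
```
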
And S203, recording the position (xmin, ymin, xmax, ymax) of each Chinese character in the text to be recognized after the text to be recognized is changed to the preset size.
Wherein, (xmin, ymin) and (xmax, ymax) are the coordinates of the upper left corner and the lower right corner of the Chinese character, respectively.
Optionally, in another embodiment of the present invention, after step S203, the method further includes:
and identifying the size of each Chinese character in the text to be identified by using a preset anchor, and confirming the size of each Chinese character in the text to be identified.
Wherein the preset anchor may have a size of (10,10), (20,20), (30,30), (40,40), (50,50) and (60, 60).
It should be noted that these anchor sizes correspond to common font sizes. In practical applications the aspect ratio of Chinese characters may vary, so other sizes such as (15,15) or (28,28) may be used, and anchors with unequal width and height, such as (15,20) or (18,14), may also be set.
It should be further noted that when the preset anchors are used to determine the size of each Chinese character in the text to be recognized, the number of object categories an anchor may detect must be set to 1; this prevents one anchor from capturing two Chinese characters, which would make the subsequent segmentation inaccurate. For example, suppose the characters in the text to be recognized are 15×15 in size and include 木 ("wood"), 羊 ("sheep"), and 旦 ("dawn"). If the (60,60) anchor is not limited to one detected object, it can easily capture several characters at once: capturing 木 and 羊 together would very likely cause the later segmentation step to treat them as the single character 样 ("sample"); likewise, capturing 木 and 旦 together would likely yield the single character 查 ("search"). Either case would seriously harm the accuracy of the subsequent character recognition.
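The detector settings discussed above can be collected into one configuration block. The structure and key names are hypothetical; the values come from the description:

```python
# Illustrative configuration for the character detector; the dict layout is
# an assumption, but the values are taken from the patent description.
DETECTOR_CONFIG = {
    "min_input_size": (320, 320),  # YOLOv3 width/height floor per the text
    "anchors": [(10, 10), (20, 20), (30, 30), (40, 40), (50, 50), (60, 60)],
    "num_classes": 1,              # one class ("character"): keeps a large
                                   # anchor, e.g. (60, 60), from merging two
                                   # small characters into one detection
    "max_chars_per_line": 20,      # single-line sample limit per the text
}
```
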
S103, segmenting the image to be recognized according to the position information of each character of the text to be recognized to obtain a plurality of sub-images.
Wherein each sub-image displays one character.
It should be noted that, when the image to be recognized is segmented according to the position information of each character, it is necessary to record and ensure that a plurality of sub-images obtained after segmentation can be recombined back to the image to be recognized.
And S104, processing each sub-image by using a character recognition-convolution neural network model to recognize characters in each sub-image.
The character recognition-convolution neural network model is obtained by utilizing the training and construction of a convolution neural network.
In a specific implementation process of this embodiment, a training process of the character recognition-convolutional neural network model may be to establish an initial neural network according to preset initial parameters, and determine the initial neural network as a current neural network; recognizing characters in the sample image by using a current neural network; the sample image is each image in a preset sample image set; comparing the characters identified by the current neural network with the characters of the pre-labeled sample image to obtain a comparison result; judging whether the identification precision of the current neural network meets the precision requirement or not according to the comparison result; if the identification precision of the current neural network does not meet the precision requirement, updating the parameters in the current neural network to obtain an updated neural network; taking the updated neural network as a current neural network, and returning to execute the recognition of characters in the sample image by using the current neural network; and if the identification precision of the current neural network meets the precision requirement, determining the current neural network as a character identification-convolution neural network model.
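The iterative train-evaluate-update scheme described above can be sketched generically. Here `update` and `evaluate` stand in for a parameter-update step (e.g. backpropagation) and recognition accuracy on the labeled sample set, and `max_rounds` is an added safeguard not in the original description:

```python
def train_until_accurate(init_params, update, evaluate, target_acc, max_rounds=100):
    """Evaluate the current network; if its recognition accuracy does not
    meet the requirement, update the parameters and try again, returning
    the parameters once the accuracy requirement is met."""
    params = init_params
    for _ in range(max_rounds):
        if evaluate(params) >= target_acc:   # precision requirement met
            return params
        params = update(params)              # otherwise refine and retry
    return params                            # best effort after max_rounds
```
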
Optionally, in another embodiment of the present invention, before step S104, the method further includes:
and adjusting the size of each sub-image according to the preset size of the single character.
It should be noted that the character recognition convolutional neural network model generally adopts ResNet-50, and the model here only recognizes 32×32 images; therefore the sub-images must be resized before being input into the character recognition convolutional neural network model, so that they can be recognized by it.
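A self-contained sketch of this resize step using nearest-neighbour sampling in plain numpy (a real pipeline would more likely use `cv2.resize` or PIL; the 32×32 target comes from the text above):

```python
import numpy as np

def resize_nn(img, size=(32, 32)):
    """Nearest-neighbour resize of a 2D array to the model's expected input
    size, implemented with integer index mapping to stay dependency-free."""
    H, W = size
    h, w = img.shape[:2]
    rows = np.arange(H) * h // H   # source row for each output row
    cols = np.arange(W) * w // W   # source column for each output column
    return img[rows][:, cols]
```
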
And S105, arranging each character obtained by recognition according to the position information of the character to obtain the recognition result of the image to be recognized.
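Step S105 reduces to a sort on the recorded positions; for a single line of text, ordering by the left edge (xmin) restores reading order. The function name and pair layout are illustrative:

```python
def arrange_characters(recognized):
    """recognized: list of (character, (xmin, ymin, xmax, ymax)) pairs.
    Each line is handled separately upstream, so sorting by xmin restores
    the reading order within the line."""
    ordered = sorted(recognized, key=lambda item: item[1][0])
    return "".join(ch for ch, _ in ordered)
```
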
According to the above scheme, in the image recognition method provided by the application, an image to be recognized is acquired, on which a line of text to be recognized is displayed; the image is processed with a target detection algorithm model to obtain the position information of each character forming the text to be recognized; the image is segmented according to that position information into a plurality of sub-images, each displaying one character; each sub-image is processed with a character recognition convolutional neural network model to recognize the character in it; and the recognized characters are arranged according to their position information to obtain the recognition result of the image to be recognized. This improves the accuracy of segmenting the characters in the image and thereby the accuracy of character recognition.
An embodiment of the present application provides an image recognition apparatus, as shown in fig. 4, including:
an obtaining unit 401 is configured to obtain an image to be recognized.
A line of text to be recognized is displayed on the image to be recognized.
The first processing unit 402 is configured to process the image to be recognized by using the target detection algorithm model, so as to obtain position information of each character forming the text to be recognized.
The target detection algorithm model may be the currently common YOLOv3 model, which is a neural network model.
Optionally, in another embodiment of the present invention, an implementation manner of the first processing unit 402, as shown in fig. 5, includes:
the second determining unit 501 is configured to determine whether the size of the text to be recognized matches a preset size.
Because the YOLOv3 model requires the width and height of an input image to be no less than 320, during model construction 3500 commonly used Chinese characters are used to generate the Chinese-character image samples; a single-line character image sample is usually set to a width of 320 and a height of 320, with at most 20 characters per line. During training of the YOLOv3 model, the training set generally contains 30000 images and the test set 10000 images, although the sizes of the training and test sets are not limited here.
A changing unit 502, configured to change the size of the text to be recognized to a preset size if the second determining unit 501 determines that the size of the text to be recognized does not conform to the preset size.
The recording unit 503 is configured to record the position (xmin, ymin, xmax, ymax) of each Chinese character in the text to be recognized after the text to be recognized is changed to the preset size.
Wherein, (xmin, ymin) and (xmax, ymax) are the coordinates of the upper left corner and the lower right corner of the Chinese character, respectively.
For the specific working process of the unit disclosed in the above embodiment of the present invention, reference may be made to the content of the corresponding method embodiment, as shown in fig. 2, which is not described herein again.
The segmenting unit 403 is configured to segment the image to be recognized according to the position information of each character of the text to be recognized, so as to obtain a plurality of sub-images.
Wherein each sub-image displays one character.
It should be noted that, when the image to be recognized is segmented according to the position information of each character, it is necessary to record and ensure that a plurality of sub-images obtained after segmentation can be recombined back to the image to be recognized.
And the second processing unit 404 is configured to process each sub-image by using the character recognition-convolutional neural network model, so as to recognize characters in each sub-image.
The arranging unit 405 is configured to arrange each character obtained through recognition according to the position information of the character, so as to obtain a recognition result of the image to be recognized.
For the specific working process of the unit disclosed in the above embodiment of the present invention, reference may be made to the content of the corresponding method embodiment, as shown in fig. 1, which is not described herein again.
Optionally, in another embodiment of the present invention, as shown in fig. 6, the image recognition apparatus further includes:
a first judging unit 601, configured to judge whether the text to be recognized displayed by the image to be recognized is multiple lines.
It should be noted that the YOLOv3 model here detects only a single line of text; therefore, when the text to be recognized spans multiple lines, the multi-line text must first be divided into single lines.
It should be noted that dividing multi-line text into single lines generally uses the horizontal projection method. As shown in fig. 3, horizontal projection sums the pixels of each row of the text image and plots the row statistics, from which the starting and ending row of each line can be determined.
The cutting unit 602 is configured to find an upper limit and a lower limit of each line if the first determining unit 601 determines that the text to be recognized displayed in the image to be recognized is multiple lines, and perform horizontal cutting to obtain multiple sub-texts to be recognized.
For the specific working process of the unit disclosed in the above embodiment of the present invention, reference may be made to the content of the corresponding method embodiment, which is not described herein again.
Optionally, in another embodiment of the present invention, as shown in fig. 6, the image recognition apparatus further includes:
and the identification unit is used for identifying the size of each Chinese character in the text to be identified by using a preset anchor and confirming the size of each Chinese character in the text to be identified.
Wherein the preset anchors are (10,10), (20,20), (30,30), (40,40), (50,50) and (60,60) in size.
It should be noted that these anchor sizes correspond to common font sizes. In practical applications the aspect ratio of Chinese characters may vary, so other sizes such as (15,15) or (28,28) may be used, and anchors with unequal width and height, such as (15,20) or (18,14), may also be set.
Optionally, in another embodiment of the present invention, as shown in fig. 6, the image recognition apparatus further includes:
an adjusting unit, configured to adjust the size of each sub-image according to a preset single-character size.
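The adjustment performed by this unit amounts to resizing every character sub-image to one fixed input size before recognition. The sketch below uses nearest-neighbour resampling on a NumPy array; both the 32x32 target and the interpolation method are assumptions, since the patent only requires a preset single-character size.

```python
import numpy as np

TARGET_H, TARGET_W = 32, 32  # hypothetical preset single-character size

def normalize_char_image(sub_image):
    """Nearest-neighbour resize of a single-character sub-image to the
    preset size expected by the recognition network."""
    h, w = sub_image.shape[:2]
    rows = np.arange(TARGET_H) * h // TARGET_H  # source row for each output row
    cols = np.arange(TARGET_W) * w // TARGET_W  # source column for each output column
    return sub_image[rows][:, cols]
```

Fixing the input size this way lets the downstream convolutional network use a single static input shape regardless of the original character's size in the image.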
According to the above scheme, in the image recognition apparatus provided by the present application, the acquisition unit 401 acquires an image to be recognized, in which a line of text to be recognized is displayed; the first processing unit 402 processes the image to be recognized with a target detection algorithm model to obtain the position information of each character constituting the text to be recognized; the segmentation unit 403 segments the image to be recognized according to the position information of each character to obtain a plurality of sub-images, each displaying one character; the second processing unit 404 processes each sub-image with a character recognition convolutional neural network model to recognize the character in each sub-image; finally, the arranging unit 405 arranges the recognized characters according to their position information to obtain the recognition result of the image to be recognized. This improves the accuracy of character segmentation in the image and thereby the accuracy of character recognition.
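The final arranging step can be sketched as follows. It assumes the single-line case described in the text, so ordering by the xmin of each recorded (xmin, ymin, xmax, ymax) box suffices; the pair format is an illustrative assumption.

```python
def arrange_result(chars):
    """Order recognized characters by their detected positions.
    `chars` is a list of (character, (xmin, ymin, xmax, ymax)) pairs;
    for a single line of text, sorting by xmin gives reading order."""
    return "".join(c for c, box in sorted(chars, key=lambda p: p[1][0]))
```

For multi-line input, the same idea extends by first grouping boxes by their line (e.g. by ymin) and then sorting each group by xmin.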
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of image recognition, comprising:
acquiring an image to be identified; the image to be recognized is displayed with a line of text to be recognized;
processing the image to be recognized by using a target detection algorithm model to obtain the position information of each character forming the text to be recognized;
segmenting the image to be recognized according to the position information of each character of the text to be recognized to obtain a plurality of sub-images; wherein each sub-image displays one character;
processing each sub-image by using a character recognition convolutional neural network model to recognize characters in each sub-image;
and arranging each character obtained by identification according to the position information of the character to obtain the identification result of the image to be identified.
2. The method according to claim 1, wherein before processing the image to be recognized by using the target detection algorithm model to obtain the position information of each character constituting the text to be recognized, the method further comprises:
judging whether the text to be recognized displayed by the image to be recognized is a plurality of lines;
and if the text to be recognized displayed by the image to be recognized is judged to be a plurality of lines, finding the upper limit and the lower limit of each line, and horizontally cutting to obtain a plurality of sub-texts to be recognized.
3. The method according to claim 1, wherein the processing the image to be recognized by using the target detection algorithm model to obtain the position information of each character constituting the text to be recognized comprises:
judging whether the size of the text to be recognized accords with a preset size or not;
if the size of the text to be recognized is judged to be not in accordance with the preset size, changing the size of the text to be recognized into the preset size;
recording the position (xmin, ymin, xmax, ymax) of each Chinese character in the text to be recognized after the text to be recognized is changed to the preset size; wherein, (xmin, ymin) and (xmax, ymax) are the coordinates of the upper left corner and the lower right corner of the Chinese character, respectively.
4. The method according to claim 3, wherein after recording the position (xmin, ymin, xmax, ymax) of each Chinese character in the text to be recognized after the text to be recognized is changed to the preset size, the method further comprises:
identifying the size of each Chinese character in the text to be identified by using a preset anchor, and confirming the size of each Chinese character in the text to be identified; wherein the preset anchors are of sizes (10,10), (20,20), (30,30), (40,40), (50,50) and (60, 60).
5. The method of claim 1, wherein before processing each of the sub-images using a character recognition convolutional neural network model to identify the text in each of the sub-images, the method further comprises:
and adjusting the size of each sub-image according to the size of a preset single character.
6. An apparatus for image recognition, comprising:
an acquisition unit, configured to acquire an image to be recognized; wherein a line of text to be recognized is displayed in the image to be recognized;
the first processing unit is used for processing the image to be recognized by utilizing a target detection algorithm model to obtain the position information of each character forming the text to be recognized;
the segmentation unit is used for segmenting the image to be recognized according to the position information of each character of the text to be recognized to obtain a plurality of sub-images; wherein each sub-image displays one character;
the second processing unit is used for processing each sub-image by utilizing a character recognition convolutional neural network model to recognize characters in each sub-image;
and the arranging unit is used for arranging each character obtained by identification according to the position information of the character to obtain the identification result of the image to be identified.
7. The apparatus of claim 6, further comprising:
the first judging unit is used for judging whether the text to be recognized displayed by the image to be recognized is a plurality of lines;
and the cutting unit is used for finding the upper limit and the lower limit of each line and horizontally cutting to obtain a plurality of sub texts to be identified if the first judging unit judges that the texts to be identified displayed by the images to be identified are in a plurality of lines.
8. The apparatus of claim 6, wherein the first processing unit comprises:
the second judging unit is used for judging whether the size of the text to be recognized accords with the preset size or not;
a changing unit, configured to change the size of the text to be recognized to the preset size if the second determining unit determines that the size of the text to be recognized does not conform to the preset size;
a recording unit, configured to record a position (xmin, ymin, xmax, ymax) of each Chinese character in the text to be recognized after the text to be recognized is changed to the preset size; wherein, (xmin, ymin) and (xmax, ymax) are the coordinates of the upper left corner and the lower right corner of the Chinese character, respectively.
9. The apparatus of claim 8, further comprising:
the identification unit is used for identifying the size of each Chinese character in the text to be identified by using a preset anchor and confirming the size of each Chinese character in the text to be identified; wherein the preset anchors are of sizes (10,10), (20,20), (30,30), (40,40), (50,50) and (60, 60).
10. The apparatus of claim 6, further comprising:
and the adjusting unit is used for adjusting the size of each sub-image according to the size of a preset single character.
CN201910831740.2A 2019-09-04 2019-09-04 Image recognition method and device Active CN110674811B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910831740.2A CN110674811B (en) 2019-09-04 2019-09-04 Image recognition method and device


Publications (2)

Publication Number Publication Date
CN110674811A true CN110674811A (en) 2020-01-10
CN110674811B CN110674811B (en) 2022-04-29

Family

ID=69075949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910831740.2A Active CN110674811B (en) 2019-09-04 2019-09-04 Image recognition method and device

Country Status (1)

Country Link
CN (1) CN110674811B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508109A (en) * 2020-12-10 2021-03-16 锐捷网络股份有限公司 Training method and device for image recognition model
CN112668576A (en) * 2020-12-30 2021-04-16 广东电网有限责任公司电力调度控制中心 Electric power iron tower identification method and device based on character symbols

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314608A (en) * 2010-06-30 2012-01-11 汉王科技股份有限公司 Method and device for extracting rows from character image
CN108427950A (en) * 2018-02-01 2018-08-21 北京捷通华声科技股份有限公司 A kind of literal line detection method and device
CN108885699A (en) * 2018-07-11 2018-11-23 深圳前海达闼云端智能科技有限公司 Character identifying method, device, storage medium and electronic equipment
CN109241974A (en) * 2018-08-23 2019-01-18 苏州研途教育科技有限公司 A kind of recognition methods and system of text image
CN109255356A (en) * 2018-07-24 2019-01-22 阿里巴巴集团控股有限公司 A kind of character recognition method, device and computer readable storage medium
CN109685055A (en) * 2018-12-26 2019-04-26 北京金山数字娱乐科技有限公司 Text filed detection method and device in a kind of image
US20190139449A1 (en) * 2017-11-03 2019-05-09 Neusoft Corporation Method, computer readable storage medium and electronic equipment for analyzing driving behavior
CN109902622A (en) * 2019-02-26 2019-06-18 中国科学院重庆绿色智能技术研究院 A kind of text detection recognition methods for boarding pass information verifying
CN109977935A (en) * 2019-02-27 2019-07-05 平安科技(深圳)有限公司 A kind of text recognition method and device
CN110020615A (en) * 2019-03-20 2019-07-16 阿里巴巴集团控股有限公司 The method and system of Word Input and content recognition is carried out to picture


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508109A (en) * 2020-12-10 2021-03-16 锐捷网络股份有限公司 Training method and device for image recognition model
CN112508109B (en) * 2020-12-10 2023-05-19 锐捷网络股份有限公司 Training method and device for image recognition model
CN112668576A (en) * 2020-12-30 2021-04-16 广东电网有限责任公司电力调度控制中心 Electric power iron tower identification method and device based on character symbols
CN112668576B (en) * 2020-12-30 2022-02-15 广东电网有限责任公司电力调度控制中心 Electric power iron tower identification method and device based on character symbols

Also Published As

Publication number Publication date
CN110674811B (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN109685055B (en) Method and device for detecting text area in image
CN112818812B (en) Identification method and device for table information in image, electronic equipment and storage medium
CN102567300B (en) Picture document processing method and device
CN110569830A (en) Multi-language text recognition method and device, computer equipment and storage medium
WO2017020723A1 (en) Character segmentation method and device and electronic device
CN113158808B (en) Method, medium and equipment for Chinese ancient book character recognition, paragraph grouping and layout reconstruction
CN103034848B (en) A kind of recognition methods of form types
JP7033208B2 (en) Certification document recognition methods and devices, electronic devices and computer-readable storage media
CN111259878A (en) Method and equipment for detecting text
CN113486828B (en) Image processing method, device, equipment and storage medium
CN110942004A (en) Handwriting recognition method and device based on neural network model and electronic equipment
CN110321837B (en) Test question score identification method, device, terminal and storage medium
CN109598185B (en) Image recognition translation method, device and equipment and readable storage medium
CN110674811B (en) Image recognition method and device
CN112001406A (en) Text region detection method and device
CN112906695B (en) Form recognition method adapting to multi-class OCR recognition interface and related equipment
CN107330430A (en) Tibetan character recognition apparatus and method
CN114663904A (en) PDF document layout detection method, device, equipment and medium
CN108734161B (en) Method, device and equipment for identifying prefix number area and storage medium
CN112560847A (en) Image text region positioning method and device, storage medium and electronic equipment
CN115546809A (en) Table structure identification method based on cell constraint and application thereof
Papandreou et al. Slant estimation and core-region detection for handwritten Latin words
CN109508716B (en) Image character positioning method and device
US9152876B1 (en) Methods and systems for efficient handwritten character segmentation
CN110895849A (en) Method and device for cutting and positioning crown word number, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant