CN111079749A

CN111079749A - End-to-end commodity price tag character recognition method and system with attitude correction function

Info

Publication number: CN111079749A
Application number: CN201911273581.5A
Authority: CN
Inventors: 秦永强; 张发恩; 高达辉
Original assignee: Ainnovation Chongqing Technology Co ltd
Current assignee: Ainnovation Chongqing Technology Co ltd
Priority date: 2019-12-12
Filing date: 2019-12-12
Publication date: 2020-04-28
Anticipated expiration: 2039-12-12
Also published as: CN111079749B

Abstract

The invention provides an end-to-end commodity price tag character recognition method with posture correction, and a corresponding system, belonging to the technical field of computer vision. The method comprises the following steps: acquiring a commodity price tag image and performing feature extraction to obtain a corresponding feature map; performing region selection processing on the feature map to obtain a character suggestion region; segmenting the character suggestion region to obtain a processed character suggestion region, and performing graphic expansion processing on the processed character suggestion region to obtain a character feature map; performing key point detection processing on the character feature map to obtain a plurality of key points surrounding the character feature map; performing posture correction processing on the character feature map by thin plate spline interpolation according to the plurality of key points to obtain a horizontal feature map to be processed with a fixed size; and performing writing (text conversion) processing on the feature map to be processed to obtain the corresponding characters. The invention has the beneficial effect that the robustness and efficiency of character recognition in complex scenes can be improved.

Description

End-to-end commodity price tag character recognition method and system with attitude correction function

Technical Field

The invention relates to the field of computer vision, in particular to an end-to-end commodity price tag character recognition method with posture correction and a system thereof.

Background

Recognizing commodity price tags in channel display images by computer vision, and thereby obtaining commodity price information, has become an important way for fast-moving consumer goods brand owners to control prices at distribution terminals. In this approach, accurate recognition of the characters on the price tag is the key to recognizing commodity prices quickly and accurately.

Because of the shooting angle, a commodity price tag may appear in the image in an arbitrary posture, and the direction and posture of the characters on it are uncertain, which makes accurate character recognition very difficult. In addition, commodity price recognition based on computer vision generally has a strict timeliness requirement and must run at close to real-time speed. However, a single channel display image typically contains many price tags (often dozens), and a single price tag often contains dozens of text fields, which poses a significant challenge to recognition speed.

Most existing character recognition schemes use a three-stage pipeline of character detection, posture correction and character recognition: first, a character detection algorithm locates the characters; then the character image region is cropped and its posture is corrected by image processing techniques (affine transformation, perspective transformation and the like); finally, a character recognition algorithm recognizes the characters. Such methods realize character recognition stage by stage and have two main defects:

1) Low recognition efficiency

Both the text detection stage and the text recognition stage perform feature extraction on the same image region, resulting in repeated computation. Feature extraction usually accounts for most of the total computation, so recognizing the commodity prices in a single channel display image takes a long time, usually tens of seconds to several minutes, which makes the real-time requirement difficult to meet.

2) The algorithm is not robust enough

Character recognition is typically performed after posture correction. Existing posture correction algorithms basically operate after a strict character region (such as an arbitrary quadrilateral or a rotated rectangular box) has been determined, and the entire region of the input character image (including interference information) participates in character recognition after posture correction. They cannot compensate for the loss of character information (when the box misses part of the character region) or the increase in interference information (when the box covers more than the character region) caused by an inaccurate character region. In other words, they are sensitive to the localization accuracy of the character box and are insufficiently robust.

To improve the robustness of character recognition to posture, the prior art provides a character recognition algorithm with posture correction: a spatial transformation module is added to the algorithm model, and based on a plurality of key points predicted by the model, the effective character region in the input image is selected and its posture corrected, so that characters in different postures can be recognized. This method is insensitive to redundant interference information in the input character image and achieves better results. However, it still requires a cropped text segment image as input, so text features are extracted repeatedly, and it cannot be trained end to end together with text detection.

In end-to-end character recognition, a large body of work also exists in the literature, most of which still adopts multi-stage joint training. An end-to-end character recognition algorithm further proposed in the prior art crops the character region of interest directly on the feature map for character recognition, which avoids repeated feature extraction and allows multi-task training in which the tasks promote each other, but it does not consider character posture correction. Other prior art further corrects the posture by applying an affine transformation to the cropped character feature region of interest; this cannot correct more complex postures such as perspective distortion, and cannot solve the loss of character region information (when the box misses part of the effective character region).

Disclosure of Invention

The invention aims to provide an end-to-end commodity price tag character recognition method with posture correction, which is applied to channel display, scene character recognition and similar scenes and can improve the robustness and efficiency of complex scene character recognition.

In order to achieve the purpose, the invention adopts the following technical scheme:

an algorithm model training method is provided, which comprises the following steps:

the end-to-end commodity price tag character recognition method with posture correction comprises the following steps:

step S1, acquiring a commodity price tag image and performing feature extraction to obtain a corresponding feature map;

step S2, carrying out area selection processing on the feature map to obtain a character suggestion area;

step S3, carrying out segmentation processing on the character suggestion area to obtain a processed character suggestion area, and carrying out graphic expansion processing on the processed character suggestion area to obtain a character feature map;

step S4, carrying out key point detection processing on the character feature graph to obtain a plurality of key points surrounding the character feature graph;

step S5, according to the plurality of key points and by means of thin plate spline interpolation, carrying out posture correction processing on the character feature map to obtain a horizontal feature map to be processed with a fixed size;

and step S6, performing writing (text conversion) processing on the feature map to be processed to obtain the corresponding characters.
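The flow of steps S1 to S6 can be outlined as a small skeleton. This is only a sketch under stated assumptions: the class name `PriceTagPipeline` and all six stage callables are hypothetical placeholders, not names from the patent, and each stage would in practice be a trained network component. The point it illustrates is that the feature map is computed once in S1 and reused by every later stage.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class PriceTagPipeline:
    """Hypothetical wiring of steps S1-S6; every stage is a user-supplied callable."""
    extract_features: Callable[[Any], Any]                      # S1: image -> feature map
    propose_regions: Callable[[Any], List[Any]]                 # S2: feature map -> suggestion regions
    segment_and_expand: Callable[[Any, List[Any]], List[Any]]   # S3: -> character feature maps
    detect_keypoints: Callable[[Any], Any]                      # S4: character feature map -> key points
    rectify: Callable[[Any, Any], Any]                          # S5: thin plate spline correction
    recognize: Callable[[Any], str]                             # S6: rectified features -> text

    def run(self, image: Any) -> List[str]:
        # S1 runs exactly once per image; later stages work on the feature map,
        # never on re-cropped image pixels.
        feature_map = self.extract_features(image)
        proposals = self.propose_regions(feature_map)
        text_maps = self.segment_and_expand(feature_map, proposals)
        results = []
        for text_map in text_maps:
            keypoints = self.detect_keypoints(text_map)
            rectified = self.rectify(text_map, keypoints)
            results.append(self.recognize(rectified))
        return results
```

Because every stage is passed in as a callable, the skeleton can be exercised with simple stand-in functions before any model exists.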

As a preferable scheme of the end-to-end commodity price tag character recognition method with posture correction, in step S1, feature extraction is performed on the commodity price tag image by using a deep learning network to extract character features to obtain the feature map with multiple dimensions.

In step S2, the RPN network is used to perform the region selection process on the feature map to obtain the suggested text region and the position of the circumscribed rectangle thereof.

As a preferable scheme of the end-to-end product price tag character recognition method with posture correction, in step S3, the specific steps of the segmentation process include:

step S31, carrying out de-duplication processing and up-sampling processing on the character suggestion area to obtain at least one high-resolution area, wherein the resolution of the high-resolution area is higher than that of the character suggestion area;

step S32, respectively carrying out pixel-by-pixel segmentation processing on each high-resolution area to obtain a segmentation probability image and attribute probability information of each pixel point in the segmentation probability image, wherein the attribute probability information is used for indicating whether the pixel point is a character or not and indicating the probability value of the character;

step S33, performing region score calculation processing on each of the segmentation probability images to obtain an average value of the probability values of all pixel points with characters as attributes in the segmentation probability images, and respectively determining whether the average value corresponding to each of the segmentation probability images is greater than a preset threshold:

if the judgment result is yes, the segmentation probability image is reserved;

and if the judgment result is negative, deleting the segmentation probability image.
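The de-duplication in step S31 is not detailed at this point in the text. One common way to remove overlapping proposals is non-maximum suppression (NMS), sketched here under assumptions: boxes are axis-aligned `(x1, y1, x2, y2)` tuples, each box has a confidence score, and the IoU threshold of 0.5 is an illustrative value rather than one given in the patent.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept, highest score first.

    A box is dropped when it overlaps an already kept box by more than
    iou_threshold (an assumed value, not taken from the patent).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```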

As a preferable scheme of the end-to-end commodity price tag character recognition method with posture correction, in step S3, the specific steps of the graph expansion process include:

and step S34, according to the length and width of the segmentation probability image, performing external expansion on the segmentation probability image according to a preset proportion to obtain the segmentation probability image subjected to external expansion and a peripheral partial image surrounding the segmentation probability image subjected to external expansion as the character feature map.

As a preferable configuration of the method for recognizing price tag characters of end-to-end commodities with posture correction, in step S4, the key point detection process is performed on the character feature map by using key point detection with attention mechanism, so as to obtain a plurality of key points surrounding the character feature map of interest.

As a preferable scheme of the method for recognizing price tags of end-to-end commodities with posture correction, in step S5, according to the plurality of key points and by using thin-plate spline interpolation, a feature region actually required to be used in the character feature map is constrained, and irrelevant interference feature information is filtered to obtain the feature map to be processed, the feature region actually required to be used is a valid text field concerned by attention mechanism, the irrelevant interference feature information is an invalid text field surrounding the valid text field, and the feature map to be processed is a horizontal feature region with a fixed size.

As a preferable scheme of the end-to-end commodity price tag character recognition method with posture correction, in step S6, the concrete steps of the writing process include:

step S61, carrying out code conversion processing on the characteristic diagram to be processed to obtain a characteristic sequence with fixed length;

step S62, calculating the output characteristics of the characteristic sequence with fixed length by using an attention mechanism and BLSTM;

and step S63, decoding the output characteristics to obtain understandable characters.
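Steps S61 and S63 can be illustrated with a minimal sketch. The assumptions are loud ones: the feature map is a nested list of shape channels x height x width, S61 is approximated by averaging over the height axis so that each column becomes one sequence element, and S63 is approximated by greedy decoding with a hypothetical end-of-sequence index. The attention and BLSTM computation of step S62 is not implemented here; `step_scores` stands in for its output.

```python
from typing import List, Sequence


def feature_map_to_sequence(feature_map: List[List[List[float]]]) -> List[List[float]]:
    """S61 sketch: turn a C x H x W map into W vectors of length C.

    Averaging over height is an assumption; the patent only states that a
    fixed-length feature sequence is produced.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [sum(feature_map[c][h][w] for h in range(height)) / height
         for c in range(channels)]
        for w in range(width)
    ]


def greedy_decode(step_scores: Sequence[Sequence[float]],
                  charset: str, eos_index: int) -> str:
    """S63 sketch: take the best-scoring class per step and stop at end-of-sequence."""
    chars = []
    for scores in step_scores:
        best = max(range(len(scores)), key=lambda i: scores[i])
        if best == eos_index:
            break
        chars.append(charset[best])
    return "".join(chars)
```

Since the rectified feature map has a fixed width, the sequence produced by `feature_map_to_sequence` has a fixed length, which matches the requirement in step S61.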

The invention also provides an end-to-end commodity price tag character recognition system with posture correction, which can implement the above end-to-end commodity price tag character recognition method and comprises:

the characteristic extraction module is used for acquiring the commodity price tag image and extracting the characteristics to obtain a corresponding characteristic diagram;

the character area cutting module is used for carrying out area selection processing on the feature map to obtain a character suggestion area, carrying out segmentation processing on the character suggestion area to obtain a processed character suggestion area, and carrying out graphic expansion processing on the processed character suggestion area to obtain a character feature map;

the key point detection module is used for carrying out key point detection processing on the character feature graph to obtain a plurality of key points surrounding the character feature graph;

the posture correction module is used for carrying out posture correction processing on the character feature map according to the plurality of key points by utilizing thin plate spline interpolation to obtain a feature map to be processed;

and the writing module is used for performing writing processing on the characteristic diagram to be processed to obtain corresponding characters.

As a preferred scheme of the end-to-end commodity price tag character recognition system with posture correction, the system carries out commodity price tag character recognition based on a preset processing model, and updates and optimizes the processing model according to the recognition process and the recognition result.

The invention has the beneficial effects that: after the characteristic diagram is extracted from the commodity price tag image, the characteristic diagram is directly processed to obtain a processed character suggestion area for subsequent character processing, only one-time characteristic extraction is needed, and the character recognition efficiency is effectively improved;

after the character suggestion area is obtained, character segmentation processing is carried out to obtain the processed character suggestion area containing the effective text field, and graph expansion processing is carried out to obtain a character feature graph, so that the problem that the recognition result is influenced due to the loss of partial characters of the character is solved, and the robustness and the efficiency of character recognition in a complex scene are improved;

key point detection is performed on the character feature map to obtain a plurality of key points surrounding the character feature map; based on these key points, thin plate spline interpolation adjusts the character posture corresponding to the character feature map to the horizontal direction, yielding a horizontal feature map to be processed with a fixed size, so that characters in different directions and in curved shapes can be recognized, which improves the robustness and efficiency of character recognition in complex scenes.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

Fig. 1 is a flowchart of an end-to-end commodity price tag character recognition method with posture correction according to an embodiment of the present invention;

Fig. 2 is a flowchart of step S3 according to another embodiment of the present invention;

Fig. 3 is a flowchart of step S6 according to an embodiment of the present invention;

Fig. 4 is a functional block diagram of an end-to-end commodity price tag character recognition system with posture correction according to an embodiment of the present invention.

Detailed Description

The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.

The drawings are for illustration only; they are schematic representations rather than depictions of actual form and are not to be construed as limiting the present patent. To better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product. It will be understood by those skilled in the art that certain well-known structures in the drawings, and descriptions thereof, may be omitted.

The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if the terms "upper", "lower", "left", "right", "inner", "outer", etc. are used for indicating the orientation or positional relationship based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not indicated or implied that the referred device or element must have a specific orientation, be constructed in a specific orientation and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limitations of the present patent, and the specific meanings of the terms may be understood by those skilled in the art according to specific situations.

In the description of the present invention, unless otherwise explicitly specified or limited, the term "connected" or the like, if appearing to indicate a connection relationship between the components, is to be understood broadly, for example, as being fixed or detachable or integral; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or may be connected through one or more other components or may be in an interactive relationship with one another. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

As shown in fig. 1, an end-to-end commodity price tag character recognition method with posture correction provided in an embodiment of the present invention includes:

step S2, carrying out area selection processing on the characteristic diagram to obtain a character suggestion area;

step S3, the character suggestion area is segmented to obtain a processed character suggestion area, and the processed character suggestion area is subjected to graphic expansion to obtain a character feature map;

step S5, according to the plurality of key points and by using thin plate spline interpolation, carrying out posture correction processing on the character feature map to obtain a horizontal feature map to be processed with a fixed size;

In the embodiment, after the feature map is extracted from the commodity price tag image, the feature map is directly processed to obtain a processed character suggestion region for subsequent character processing, and only one-time feature extraction is needed, so that the character recognition efficiency is effectively improved;

the method comprises the steps of detecting key points of a character feature diagram to obtain a plurality of key points surrounding the character feature diagram, adjusting the character posture corresponding to the character feature diagram to the horizontal direction by utilizing thin plate spline interpolation based on the key points to obtain a horizontal feature diagram to be processed with a fixed size, recognizing characters in different directions and in curve shapes, and improving robustness and efficiency of character recognition in a complex scene.

Further, in step S1, feature extraction is performed on the product price label image using a deep learning network to extract character features to obtain the feature map in multiple dimensions.

Further, in the step S2, the RPN network is used to perform the region selection process on the feature map to obtain the character suggestion region and the position of the circumscribed rectangle thereof.

Specifically, a regression branch is used to obtain the position of the circumscribed rectangular box of the character suggestion region.

As shown in Fig. 2, further, in step S3, the specific steps of the segmentation processing include:

step S32, performing pixel-by-pixel segmentation processing on each high-resolution area to obtain a segmentation probability image and attribute probability information of each pixel point in the segmentation probability image, where the attribute probability information is used to indicate whether the pixel point is a text or not and a probability value of the text;

step S33, performing region score calculation processing on each of the segmentation probability images to obtain an average value of the probability values of all the pixel points with characters as attributes in the segmentation probability images, and respectively determining whether the average value corresponding to each of the segmentation probability images is greater than a preset threshold:

if the judgment result is yes, the segmentation probability image is reserved;

Specifically, a separate segmentation branch is used to obtain a segmentation map, which indicates whether each pixel point belongs to a character, and the corresponding probability map (the segmentation map and the probability map are collectively referred to as the segmentation probability image);

then, the average score of each character suggestion region is calculated from the probability values of the pixel points belonging to characters in that region, and the character suggestion regions whose scores are higher than a certain threshold are retained.
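The region scoring of step S33 amounts to a mean over the text pixels followed by a threshold test. A minimal sketch follows, assuming that a probability map is a nested list of per-pixel text probabilities, that a pixel counts as text when its probability is at least 0.5, and that 0.7 is the preset region threshold; both numbers are illustrative and are not specified in the patent.

```python
from typing import List, Sequence

ProbMap = List[List[float]]  # per-pixel probability of being text


def region_score(prob_map: ProbMap, pixel_threshold: float = 0.5) -> float:
    """Mean probability over the pixels classified as text.

    pixel_threshold is an assumed cut-off for "the pixel is text".
    Returns 0.0 when no pixel is classified as text.
    """
    text_probs = [p for row in prob_map for p in row if p >= pixel_threshold]
    return sum(text_probs) / len(text_probs) if text_probs else 0.0


def keep_regions(prob_maps: Sequence[ProbMap],
                 score_threshold: float = 0.7) -> List[int]:
    """Indices of regions whose score exceeds the preset threshold (assumed 0.7)."""
    return [i for i, m in enumerate(prob_maps) if region_score(m) > score_threshold]
```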

Further, in step S3, the specific steps of the pattern expansion process include:

as shown in fig. 2, in step S34, the segmentation probability image is expanded according to the length and width of the segmentation probability image and a predetermined ratio, and the segmentation probability image after the expansion and the peripheral partial image surrounding the segmentation probability image after the expansion are obtained as the character feature map.

Specifically, according to the length and width of the character suggestion region, a certain proportion of expansion is performed, and then the expanded character suggestion region (i.e. the character feature map) is cut and input to the next stage.

Further, in step S4, the key point detection process is performed on the character feature map by using a key point detection with attention mechanism to obtain a plurality of key points surrounding the character feature map of interest.

Specifically, according to the cut character suggestion region features (namely character feature maps), a key point detection network with an attention mechanism is utilized to detect k key points surrounding the concerned character feature maps.

Further, in step S5, according to the plurality of key points and by using thin-plate spline interpolation, a feature region actually required to be used in the text feature map is constrained, and irrelevant interference feature information is filtered to obtain the feature map to be processed, the feature region actually required to be used is a valid text field concerned by attention mechanism, the irrelevant interference feature information is an invalid text field surrounding the valid text field, and the feature map to be processed is a horizontal feature region with a fixed size.

Specifically, according to the k key points, the feature map region of interest (namely the character feature map) is converted into a horizontal feature region with a fixed size by thin plate spline interpolation.
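The thin plate spline mapping used for the correction can be sketched in plain Python. This is a sketch under assumptions, not the patent's implementation: the k control points on the fixed-size output grid (`dst_pts`) are paired with the k detected key points in the input feature map (`src_pts`), the radial basis is the standard U(r) = r^2 log r^2, and the function names are hypothetical. The control points are assumed to be distinct and not all collinear. Sampling the feature values (for example, bilinear sampling at each warped coordinate) is omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def _U(r2: float) -> float:
    """TPS radial basis U(r) = r^2 * log(r^2), with U(0) = 0."""
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)


def _solve(A: List[List[float]], B: List[List[float]]) -> List[List[float]]:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [ra[:] + rb[:] for ra, rb in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_tps(dst_pts: Sequence[Point], src_pts: Sequence[Point]) -> Callable[[Point], Point]:
    """Fit a TPS that maps output-grid coordinates to input feature-map coordinates."""
    k = len(dst_pts)
    A = [[0.0] * (k + 3) for _ in range(k + 3)]
    B = [[0.0, 0.0] for _ in range(k + 3)]
    for i, (xi, yi) in enumerate(dst_pts):
        for j, (xj, yj) in enumerate(dst_pts):
            A[i][j] = _U((xi - xj) ** 2 + (yi - yj) ** 2)
        A[i][k], A[i][k + 1], A[i][k + 2] = 1.0, xi, yi   # affine block P
        A[k][i], A[k + 1][i], A[k + 2][i] = 1.0, xi, yi   # side conditions P^T
        B[i] = [float(src_pts[i][0]), float(src_pts[i][1])]
    params = _solve(A, B)  # k radial weights followed by 3 affine coefficients

    def warp(p: Point) -> Point:
        x, y = p
        basis = [_U((x - cx) ** 2 + (y - cy) ** 2) for cx, cy in dst_pts] + [1.0, x, y]
        return (sum(b * w[0] for b, w in zip(basis, params)),
                sum(b * w[1] for b, w in zip(basis, params)))

    return warp
```

To build the rectified feature map, one would call `warp` for every cell of the fixed-size output grid and read the input feature map at the returned location. Fitting the mapping from output to input in this direction means every output cell receives a value.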

as shown in fig. 3, in step S6, the step of converting into text specifically includes:

Specifically, an encoder + LSTM + attention structure is then used to recognize the corresponding text.

As shown in Fig. 4, an end-to-end commodity price tag character recognition system with posture correction comprises:

the characteristic extraction module 1 is used for acquiring a commodity price tag image and extracting characteristics to obtain a corresponding characteristic diagram, mainly extracting character characteristics by utilizing a convolution neural network based on the input commodity price tag image, and outputting a multi-dimensional characteristic diagram;

the character area cutting module 2 is used for carrying out area selection processing on the characteristic graph to obtain a character suggestion area, carrying out segmentation processing on the character suggestion area to obtain a processed character suggestion area, and carrying out graph expansion processing on the processed character suggestion area to obtain a character characteristic graph;

a key point detection module 3, configured to perform key point detection processing on the text feature map to obtain a plurality of key points surrounding the text feature map;

the posture correction module 4 is used for carrying out posture correction processing on the character characteristic diagram according to the key points and by utilizing thin plate spline interpolation to obtain a characteristic diagram to be processed;

and the writing module 5 is used for performing writing processing on the characteristic diagram to be processed to obtain corresponding characters.

Further, the text region cutting module 2 includes:

the character region suggesting unit 21 obtains the position of a circumscribed rectangular frame of the character suggesting region by using an RPN according to the extracted feature map;

an nms unit 22 configured to perform deduplication processing on the obtained text suggestion region;

an upsampling unit 23, which is mainly used to transform the low-resolution features to the high-resolution features so as to perform segmentation processing on the text regions;

the segmentation unit 24 is used for performing pixel-by-pixel segmentation on the feature map obtained by the upsampling unit 23 and determining whether each pixel belongs to a character region, together with the corresponding probability;

a score calculation unit 25 that calculates, for each character suggestion region, an average probability of all pixel points belonging to characters contained therein as a score of the character suggestion region;

a text region cutting unit 26, which expands each text suggestion region whose score is higher than a certain threshold outward by a certain proportion of its length and width, and cuts out the feature map containing the text suggestion region and its peripheral region as the text feature map input to the next stage, wherein the expansion scale factor is inversely proportional to the size of the text suggestion region.
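The adaptive expansion performed by the text region cutting unit 26 can be sketched as follows. The patent states only that the expansion scale factor is inversely proportional to the region size. The size measure (geometric mean of width and height), the constants `base_ratio` and `ref_size`, and the clipping to the feature map bounds are all assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in feature-map coordinates


def expand_box(box: Box, map_w: float, map_h: float,
               base_ratio: float = 0.2, ref_size: float = 32.0) -> Box:
    """Expand a region outward, with a ratio inversely proportional to its size.

    base_ratio is the expansion ratio applied to a region of size ref_size;
    both are hypothetical constants. The result is clipped to the feature map.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    size = max((w * h) ** 0.5, 1e-6)        # assumed size measure
    ratio = base_ratio * ref_size / size    # smaller region -> larger ratio
    dx, dy = w * ratio / 2, h * ratio / 2   # split the growth between both sides
    return (max(0.0, x1 - dx), max(0.0, y1 - dy),
            min(map_w, x2 + dx), min(map_h, y2 + dy))
```

With these assumed constants, a 32 x 32 region grows by 20 percent in each dimension while an 8 x 8 region grows by 80 percent, so small text fields, where a slightly misplaced box loses proportionally more of the characters, receive relatively more surrounding context.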

Further, the key point detection module 3 detects the peripheral key points of the text region of interest in the input text feature map in order to constrain the feature region that actually needs to be used, mainly to filter out irrelevant interference feature information, because the input text feature map may contain partial feature information of other text fields around the text segment of interest. The key point detection module 3 includes:

a first attention unit 31 for calculating an attention parameter for controlling an area of interest when predicting the keypoint;

a key point detection unit 32 for converting the input feature map to an output feature map of a fixed size by using thin-plate spline interpolation according to the obtained key points;

further, the writing module 5 includes:

an encoding unit 51 for encoding and converting the fixed-size feature map into a fixed-size feature sequence;

a second attention unit 52 and a BLSTM unit 53, which are used for calculating to obtain output characteristics;

the decoding unit 54 transcribes the output features into intelligible text.

Furthermore, the system identifies the commodity price tag characters based on a preset processing model, and updates and optimizes the processing model according to the identification process and the identification result. In the model training process, the character rectangular box detection, the character segmentation detection and the character recognition all participate in loss calculation, and the performance is improved through multi-task training.
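The multi-task training described above can be pictured as a weighted sum of the per-task losses. The sketch below is only an illustration: the task names and the weights are assumptions, since the patent does not state how the three losses are combined.

```python
from typing import Dict


def multitask_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses; a task without an explicit weight uses 1.0.

    The task names and weights are hypothetical; the patent only says that
    box detection, segmentation and recognition all take part in the loss.
    """
    return sum(weights.get(name, 1.0) * value for name, value in losses.items())
```

For example, `multitask_loss({"box": 0.5, "segmentation": 0.25, "recognition": 1.0}, {"recognition": 2.0})` combines the three terms with the recognition loss counted twice.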

Character detection and character recognition share a single feature extractor, which effectively improves recognition efficiency;

the problem that the recognition result is influenced due to the fact that the characteristics of a character part are lost can be solved by utilizing a character area cutting module with a self-adaptive expansion function;

the influence of redundant character areas in the cut character characteristic areas of interest can be relieved by utilizing a character key point detection module with an attention mechanism;

based on the detected character key points, the posture of the characters is corrected to the horizontal direction by thin plate spline interpolation, which improves the recognition effect.

It should be understood that the above-described embodiments are merely preferred embodiments of the invention and the technical principles applied thereto. It will be understood by those skilled in the art that various modifications, equivalents, changes, and the like can be made to the present invention. However, such variations are within the scope of the invention as long as they do not depart from the spirit of the invention. In addition, certain terms used in the specification and claims of the present application are not limiting, but are used merely for convenience of description.

Claims

1. An end-to-end commodity price tag character recognition method with posture correction is characterized by comprising the following steps:

step S5, according to the plurality of key points and by means of thin plate spline interpolation, carrying out posture correction processing on the character feature graph to obtain a feature graph to be processed with fixed size and level;

2. The method for recognizing commodity price tag characters with posture correction according to claim 1, wherein in step S1, feature extraction is performed on the commodity price tag image by using a deep learning network to extract character features to obtain the feature map with multiple dimensions.

3. The method for recognizing commodity price tag characters with posture correction according to claim 1, wherein in step S2, the RPN network is used to perform the region selection process on the feature map to obtain the character suggestion region and the position of the rectangle circumscribing the character suggestion region.

4. The method for recognizing commodity price tags with pose correction according to claim 1, wherein said step S3, the specific steps of said segmentation process include:

if the judgment result is yes, the segmentation probability image is reserved;

5. The method for end-to-end commodity price tag character recognition with posture correction as claimed in claim 4, wherein in said step S3, the specific steps of said graphic externally expanding process include:

6. The method for recognizing commodity price tags with pose correction according to claim 1, wherein in step S4, said key point detection processing is performed on said character feature map by using key point detection with attention mechanism to obtain a plurality of key points surrounding said character feature map of interest.

7. The method for recognizing commodity price tag characters with posture correction according to claim 1, wherein in step S5, according to the plurality of key points and by using thin-plate spline interpolation, the feature region actually required to be used in the character feature map is constrained, and irrelevant interference feature information is filtered to obtain the feature map to be processed, the feature region actually required to be used is a valid text field concerned by attention mechanism, the irrelevant interference feature information is an invalid text field surrounding the valid text field, and the feature map to be processed is a horizontal feature region with a fixed size.

8. The method for recognizing commodity price tag characters with posture correction according to claim 1, wherein in step S6, the concrete steps of said writing process include:

9. An end-to-end commodity price tag character recognition system with posture correction, which can realize the end-to-end commodity price tag character recognition method as any one of claims 1 to 8, is characterized by comprising the following steps:

10. The system for end-to-end commodity price tag character recognition with posture correction according to claim 9, wherein said system performs commodity price tag character recognition based on a preset processing model, and updates and optimizes said processing model according to the recognition process and the recognition result.