CN111079749B - End-to-end commodity price tag character recognition method and system with gesture correction
- Publication number
- CN111079749B (application CN201911273581.5A)
- Authority
- CN
- China
- Prior art keywords
- feature map
- character
- text
- processing
- processed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4007—Interpolation-based scaling, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4053—Super resolution, i.e. output image resolution higher than sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention provides an end-to-end commodity price tag character recognition method and system with gesture correction, belonging to the technical field of computer vision. The method comprises the following steps: acquiring a commodity price tag image and extracting features to obtain a corresponding feature map; performing region selection processing on the feature map to obtain a text suggestion region; performing segmentation processing on the text suggestion region to obtain a processed text suggestion region, and performing graphic expansion processing on the processed text suggestion region to obtain a text feature map; performing key point detection processing on the text feature map to obtain a plurality of key points surrounding the text feature map; performing gesture correction processing on the text feature map according to the plurality of key points by thin plate spline interpolation to obtain a horizontal feature map to be processed with a fixed size; and performing text recognition processing on the feature map to be processed to obtain the corresponding text. The beneficial effect of the invention is that the robustness and efficiency of character recognition in complex scenes can be improved.
Description
Technical Field
The invention relates to the field of computer vision, in particular to an end-to-end commodity price tag character recognition method and system with gesture correction.
Background
Recognizing commodity price tags in channel display images by computer vision, and thereby obtaining commodity price information, has become an important way for fast-moving consumer goods brand owners to monitor and control terminal prices at distribution outlets. In such a scheme, fast and accurate recognition of commodity prices is required, and accurate recognition of the characters on the price tag is the key.
Because of varying shooting angles, commodity price tags in images appear in arbitrary poses, so the direction and pose of the characters on the tags are uncertain, which makes accurate character recognition very difficult. In addition, commodity price recognition based on computer vision generally has strict timeliness requirements and must run at near real-time speed. However, a single channel display image typically contains many tags (often dozens), and a single tag often contains dozens of text fields, which poses a significant challenge for recognition speed.
Most existing character recognition schemes follow a pipeline of character detection, gesture correction, and character recognition: a character detection algorithm first locates the characters, the character image region is then cropped, gesture correction (affine transformation, perspective transformation, and the like) is applied to the cropped image by image processing, and finally a character recognition algorithm recognizes the text. Such methods perform recognition step by step in several stages and have two main defects:
1) Low recognition efficiency
The character detection stage and the character recognition stage each extract features from the same image region, which causes repeated computation. Feature extraction often accounts for most of the total computation, so recognizing the commodity prices in a single channel display image takes particularly long, usually tens of seconds to several minutes, which makes it difficult to meet real-time requirements.
2) Insufficient algorithm robustness
Text recognition is typically performed after gesture correction. Existing gesture correction algorithms are generally applied after a strict character region (such as an arbitrary quadrilateral or a rotated rectangular box) has been determined, and after correction the entire input character image, including interference information, takes part in recognition. Loss of character information (the box covers too little of the character region) and added interference (the box covers too much) caused by an inaccurate character region therefore cannot be corrected. In other words, such algorithms are sensitive to the positioning accuracy of the character box and are not robust enough.
To improve the robustness of character recognition to pose, the prior art provides a character recognition algorithm with gesture correction, which adds a spatial transformation module to the model. Based on a plurality of key points predicted by the model, the effective character region in the input image is selected and corrected, so that characters in different poses can be recognized. The algorithm is insensitive to redundant interference information in the input character image and achieves better results. However, it still requires a cropped text segment image as input, text features are still extracted repeatedly, and it cannot be trained end to end together with text detection.
A large body of literature addresses end-to-end character recognition, most of which still adopts multi-stage joint training. One prior-art character recognition algorithm crops the character region of interest directly on the feature map for recognition, which avoids repeated feature extraction and allows the tasks to benefit each other through multi-task training, but it does not consider character gesture correction. Another prior-art approach corrects the pose by applying an affine transformation to the segmented character feature region of interest; this cannot correct more complicated poses such as perspective distortion, and it cannot solve the loss of character region information (the box covers too little of the effective character region).
Disclosure of Invention
The invention aims to provide an end-to-end commodity price tag character recognition method with gesture correction, which is applied to channel display, scene character recognition and similar scenes and can improve the robustness and efficiency of complex scene character recognition.
To achieve the purpose, the invention adopts the following technical scheme:
provided is an algorithm model training method, comprising:
the end-to-end commodity price tag character recognition method with gesture correction comprises the following steps:
S1, acquiring a commodity price tag image and extracting features to obtain a corresponding feature map;
S2, performing region selection processing on the feature map to obtain a text suggestion region;
S3, performing segmentation processing on the text suggestion region to obtain a processed text suggestion region, and performing graphic expansion processing on the processed text suggestion region to obtain a text feature map;
S4, performing key point detection processing on the text feature map to obtain a plurality of key points surrounding the text feature map;
S5, performing gesture correction processing on the text feature map according to the plurality of key points by thin plate spline interpolation to obtain a horizontal feature map to be processed with a fixed size;
and S6, performing text recognition processing on the feature map to be processed to obtain the corresponding text.
In the step S1, a deep learning network performs feature extraction on the commodity price tag image to extract text features and obtain the multi-dimensional feature map.
In the step S2, an RPN (region proposal network) performs the region selection processing on the feature map to obtain the text suggestion region and the position of its circumscribed rectangular frame.
As a preferred solution of the end-to-end commodity price tag text recognition method with gesture correction, in the step S3, the specific steps of the segmentation process include:
step S31, carrying out de-duplication processing and up-sampling processing on the text suggestion region to obtain at least one high-resolution region, wherein the resolution of the high-resolution region is higher than that of the text suggestion region;
step S32, respectively carrying out pixel-by-pixel segmentation processing on each high-resolution area to obtain a segmentation probability image and attribute probability information of each pixel point in the segmentation probability image, wherein the attribute probability information is used for indicating whether the pixel point is a character and a probability value of the character;
step S33, performing region score calculation processing on each of the segmentation probability images to obtain an average value of the probability values of all the pixel points with characters as attributes in the segmentation probability images, and judging whether the average value corresponding to each of the segmentation probability images is greater than a preset threshold value or not:
if the judgment result is yes, reserving the segmentation probability image;
and if the judgment result is negative, deleting the segmentation probability image.
As a preferred scheme of the character recognition method for the end-to-end commodity price tag with gesture correction, in the step S3, the specific steps of the graphic expansion processing include:
and step S34, performing outward expansion on the segmentation probability image according to the length and width dimensions of the segmentation probability image and a preset proportion to obtain the segmentation probability image after outward expansion and a peripheral part image surrounding the segmentation probability image after outward expansion as the character feature map.
As a preferred scheme of the end-to-end commodity price tag character recognition method with gesture correction, in the step S4, key point detection with an attention mechanism is applied to the text feature map to obtain the plurality of key points surrounding the text feature map of interest.
In the step S5, according to the plurality of key points and by thin plate spline interpolation, the feature region that actually needs to be used in the text feature map is constrained, and irrelevant interference feature information is filtered out to obtain the feature map to be processed. The feature region that actually needs to be used is the valid text field attended to by the attention mechanism, the irrelevant interference feature information is the invalid text fields surrounding the valid text field, and the feature map to be processed is a horizontal feature region with a fixed size.
As a preferred scheme of the end-to-end commodity price tag character recognition method with gesture correction, in the step S6, the specific steps of the text recognition processing include:
step S61, performing encoding conversion processing on the feature map to be processed to obtain a feature sequence with a fixed length;
step S62, calculating output features of the fixed-length feature sequence by using an attention mechanism and a BLSTM;
and step S63, decoding the output features to obtain readable text.
The invention also provides an end-to-end commodity price tag character recognition system with gesture correction, which can implement the above end-to-end commodity price tag character recognition method and comprises:
the feature extraction module, used for acquiring a commodity price tag image and extracting features to obtain a corresponding feature map;
the text region cutting module, used for performing region selection processing on the feature map to obtain a text suggestion region, performing segmentation processing on the text suggestion region to obtain a processed text suggestion region, and performing graphic expansion processing on the processed text suggestion region to obtain a text feature map;
the key point detection module, used for performing key point detection processing on the text feature map to obtain a plurality of key points surrounding the text feature map;
the gesture correction module, used for performing gesture correction processing on the text feature map according to the plurality of key points by thin plate spline interpolation to obtain a feature map to be processed;
and the text recognition module, used for performing text recognition processing on the feature map to be processed to obtain the corresponding text.
As a preferable scheme of the end-to-end commodity price tag character recognition system with gesture correction, the system performs commodity price tag character recognition based on a preset processing model, and updates and optimizes the processing model according to a recognition process and a recognition result.
The invention has the beneficial effects that: after extracting the feature map from the commodity price tag image, directly processing the feature map to obtain a processed text suggestion region for subsequent text processing, and only carrying out feature extraction once, thereby effectively improving the text recognition efficiency;
after the character suggestion area is obtained, character segmentation processing is carried out to obtain a processed character suggestion area containing effective character fields, and graphic expansion processing is carried out to obtain a character feature map, so that the problem that recognition results are affected due to the fact that part of character features are lost is solved, and the robustness and the efficiency of character recognition of complex scenes are improved;
and key point detection is performed on the text feature map to obtain a plurality of key points surrounding the text feature map, and the text pose corresponding to the text feature map is adjusted to the horizontal direction based on the key points by thin plate spline interpolation to obtain a horizontal feature map to be processed with a fixed size, so that text in different directions and curved text can be recognized, which improves the robustness and efficiency of character recognition in complex scenes.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are required to be used in the embodiments of the present invention will be briefly described below. It is evident that the drawings described below are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a flow chart of a method for end-to-end commodity price tag text identification with gesture correction according to an embodiment of the present invention.
FIG. 2 is a flow chart of step S3 according to another embodiment of the present invention;
FIG. 3 is a flowchart of step S6 according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of functional modules of an end-to-end commodity price tag text recognition system with gesture correction according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further described below by the specific embodiments with reference to the accompanying drawings.
The drawings are for illustrative purposes only; they are schematic rather than physical representations and are not intended to limit the present patent. To better illustrate the embodiments of the invention, certain elements of the drawings may be omitted, enlarged, or reduced, and do not represent the size of the actual product. It will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numbers in the drawings of embodiments of the invention correspond to the same or similar components; in the description of the present invention, it should be understood that, if the terms "upper", "lower", "left", "right", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, only for convenience in describing the present invention and simplifying the description, rather than indicating or implying that the apparatus or elements being referred to must have a specific orientation, be constructed and operated in a specific orientation, so that the terms describing the positional relationships in the drawings are merely for exemplary illustration and should not be construed as limiting the present patent, and that the specific meaning of the terms described above may be understood by those of ordinary skill in the art according to specific circumstances.
In the description of the present invention, unless explicitly stated and limited otherwise, the term "coupled" or the like should be interpreted broadly, as it may be fixedly coupled, detachably coupled, or integrally formed, as indicating the relationship of components; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between the two parts or interaction relationship between the two parts. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
As shown in FIG. 1, the end-to-end commodity price tag character recognition method with gesture correction provided by the embodiment of the invention comprises the following steps:
S1, acquiring a commodity price tag image and extracting features to obtain a corresponding feature map;
S2, performing region selection processing on the feature map to obtain a text suggestion region;
S3, performing segmentation processing on the text suggestion region to obtain a processed text suggestion region, and performing graphic expansion processing on the processed text suggestion region to obtain a text feature map;
S4, performing key point detection processing on the text feature map to obtain a plurality of key points surrounding the text feature map;
S5, performing gesture correction processing on the text feature map according to the plurality of key points by thin plate spline interpolation to obtain a horizontal feature map to be processed with a fixed size;
and S6, performing text recognition processing on the feature map to be processed to obtain the corresponding text.
In the embodiment, after the feature map is extracted from the commodity price tag image, the feature map is directly processed to obtain the processed text suggestion region for subsequent text processing, and only one feature extraction is needed, so that the text recognition efficiency is effectively improved;
after the character suggestion area is obtained, character segmentation processing is carried out to obtain a processed character suggestion area containing effective character fields, and graphic expansion processing is carried out to obtain a character feature map, so that the problem that recognition results are affected due to the fact that part of character features are lost is solved, and the robustness and the efficiency of character recognition of complex scenes are improved;
and key point detection is performed on the text feature map to obtain a plurality of key points surrounding the text feature map, and the text pose corresponding to the text feature map is adjusted to the horizontal direction based on the key points by thin plate spline interpolation to obtain a horizontal feature map to be processed with a fixed size, so that text in different directions and curved text can be recognized, which improves the robustness and efficiency of character recognition in complex scenes.
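The data flow of steps S1 to S6 can be sketched as follows. This is only an illustrative skeleton under stated assumptions: the function name `recognize_price_tags` and all six stage callables are hypothetical placeholders, not names from the patent, and no real network is implemented. The point it shows is that the feature map is computed once in S1 and every later stage works on that map or on crops of it.

```python
from typing import Any, Callable, List


def recognize_price_tags(
    image: Any,
    extract_features: Callable[[Any], Any],                      # S1 (hypothetical)
    propose_regions: Callable[[Any], List[Any]],                 # S2 (hypothetical)
    segment_and_expand: Callable[[Any, List[Any]], List[Any]],   # S3 (hypothetical)
    detect_keypoints: Callable[[Any], Any],                      # S4 (hypothetical)
    rectify: Callable[[Any, Any], Any],                          # S5 (hypothetical)
    decode_text: Callable[[Any], str],                           # S6 (hypothetical)
) -> List[str]:
    """Run the six stages in order, extracting image features only once."""
    feature_map = extract_features(image)            # S1: single feature extraction
    proposals = propose_regions(feature_map)         # S2: text suggestion regions
    text_maps = segment_and_expand(feature_map, proposals)  # S3: text feature maps

    texts = []
    for text_map in text_maps:
        keypoints = detect_keypoints(text_map)       # S4: key points around the text
        rectified = rectify(text_map, keypoints)     # S5: fixed-size horizontal map
        texts.append(decode_text(rectified))         # S6: recognized text
    return texts
```

Passing the stages in as callables keeps the sketch independent of any particular network; in a trained model they would be branches sharing one backbone.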
Further, in the step S1, feature extraction is performed on the commodity price tag image by using a deep learning network to extract text features so as to obtain the multi-dimensional feature map.
Further, in the step S2, the region selection process is performed on the feature map by using an RPN network to obtain the text suggestion region and the position of the circumscribed rectangular frame thereof.
Specifically, a regression branch is used to obtain the position of the circumscribed rectangular frame of the text suggestion region.
As shown in FIG. 2, in the step S3, the specific steps of the segmentation processing include:
step S31, carrying out de-duplication processing and up-sampling processing on the text suggestion region to obtain at least one high-resolution region, wherein the resolution of the high-resolution region is higher than that of the text suggestion region;
step S32, respectively carrying out pixel-by-pixel segmentation processing on each high-resolution area to obtain a segmentation probability image and attribute probability information of each pixel point in the segmentation probability image, wherein the attribute probability information is used for indicating whether the pixel point is a character or not and a probability value of the character;
step S33, performing region score calculation processing on each of the segmentation probability images to obtain an average value of the probability values of all the pixels with characters as attributes in the segmentation probability images, and determining whether the average value corresponding to each of the segmentation probability images is greater than a preset threshold value or not:
if the judgment result is yes, the segmentation probability image is reserved;
if the judgment result is negative, deleting the segmentation probability image.
Specifically, a separate segmentation branch is used to obtain a segmentation map indicating whether each pixel is text, together with the corresponding probability map (the segmentation map and the probability map are collectively called the segmentation probability image);
the average score of each text suggestion region is then calculated from the probability values of the pixels belonging to text in that region, and the text suggestion regions whose scores are higher than a certain threshold are retained.
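The region score of step S33 can be illustrated with a small NumPy sketch. The function names and both threshold values (0.5 for deciding that a pixel is text, 0.7 for keeping a region) are assumptions chosen for illustration; the patent only speaks of "a preset threshold".

```python
import numpy as np


def region_score(prob_map, text_threshold=0.5):
    """Average probability over the pixels classified as text.

    prob_map: 2-D array of per-pixel text probabilities for one region.
    text_threshold: assumed cut-off for calling a pixel "text".
    Returns 0.0 when no pixel is classified as text.
    """
    prob_map = np.asarray(prob_map, dtype=float)
    text_mask = prob_map > text_threshold
    if not text_mask.any():
        return 0.0
    return float(prob_map[text_mask].mean())


def filter_regions(prob_maps, score_threshold=0.7):
    """Return the indices of regions whose score exceeds the (assumed) threshold."""
    return [i for i, p in enumerate(prob_maps)
            if region_score(p) > score_threshold]
```

A region with a few confident text pixels is kept, while a region whose text pixels are all weakly scored is dropped, which matches the keep-or-delete decision described above.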
Further, in the step S3, the specific steps of the graphic expansion processing include:
step S34, as shown in FIG. 2, expanding the segmentation probability image outward by a preset ratio of its length and width, so as to obtain the expanded segmentation probability image and the peripheral image surrounding it as the text feature map.
Specifically, the text suggestion region is expanded outward by a certain proportion of its length and width, and the expanded text suggestion region (i.e. the text feature map) is then cropped and input to the next stage.
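The outward expansion can be sketched as a simple box operation. The `(x1, y1, x2, y2)` box layout, the function name `expand_box`, and the clipping to the feature map bounds are assumptions made for this example; the patent does not state how regions near the border are handled.

```python
def expand_box(box, ratio, map_width, map_height):
    """Expand a box outward on every side by `ratio` of its width and height.

    box: (x1, y1, x2, y2) in feature-map coordinates (assumed layout).
    ratio: preset expansion proportion, e.g. 0.1 for 10 percent per side.
    The result is clipped to the feature map so it can be cropped directly.
    """
    x1, y1, x2, y2 = box
    dw = (x2 - x1) * ratio   # horizontal margin added on each side
    dh = (y2 - y1) * ratio   # vertical margin added on each side
    return (
        max(0.0, x1 - dw),
        max(0.0, y1 - dh),
        min(float(map_width), x2 + dw),
        min(float(map_height), y2 + dh),
    )
```

For example, a 20 x 10 box at `(10, 10, 30, 20)` expanded with `ratio=0.1` grows by 2 units horizontally and 1 unit vertically on each side.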
Further, in the step S4, key point detection with an attention mechanism is applied to the text feature map to obtain the plurality of key points surrounding the text feature map of interest.
Specifically, based on the features of the cropped text suggestion region (i.e. the text feature map), a key point detection network with an attention mechanism is used to detect k key points surrounding the text feature map of interest.
Further, in the step S5, according to the plurality of key points and by thin plate spline interpolation, the feature region that actually needs to be used in the text feature map is constrained, and irrelevant interference feature information is filtered out to obtain the feature map to be processed. The feature region that actually needs to be used is the valid text field attended to by the attention mechanism, the irrelevant interference feature information is the invalid text fields surrounding the valid text field, and the feature map to be processed is a horizontal feature region with a fixed size.
Specifically, according to the k key points, the feature map region of interest (namely the text feature map) is transformed into a horizontal feature region with a fixed size by thin plate spline interpolation.
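A minimal NumPy sketch of thin plate spline rectification is given below. Several details are assumptions made for the example and are not stated in the patent: the key points are assumed to be ordered with the first k/2 along the top edge of the text and the last k/2 along the bottom edge (both left to right), the kernel is the common U(r) = r² log r², the output size is 8 x 32, and nearest-neighbour sampling replaces the bilinear sampling a trainable model would normally use.

```python
import numpy as np


def _tps_kernel(d2):
    """TPS radial basis U(r) = r^2 * log(r^2), given squared distances; U(0) = 0."""
    safe = np.where(d2 == 0, 1.0, d2)
    return np.where(d2 == 0, 0.0, d2 * np.log(safe))


def fit_tps(ctrl, target):
    """Solve the TPS system that maps control points ctrl (k,2) onto target (k,2)."""
    k = ctrl.shape[0]
    d2 = ((ctrl[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((k, 1)), ctrl])          # affine part [1, x, y]
    L = np.zeros((k + 3, k + 3))
    L[:k, :k] = _tps_kernel(d2)
    L[:k, k:] = P
    L[k:, :k] = P.T
    rhs = np.zeros((k + 3, 2))
    rhs[:k] = target
    return np.linalg.solve(L, rhs)                  # rows: k kernel weights + 3 affine


def apply_tps(params, ctrl, pts):
    """Map points pts (n,2) through the fitted spline."""
    d2 = ((pts[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return _tps_kernel(d2) @ params[:-3] + P @ params[-3:]


def rectify(feature_map, keypoints, out_h=8, out_w=32):
    """Warp the region enclosed by keypoints to a fixed-size horizontal map.

    feature_map: array of shape (C, H, W).
    keypoints: (k, 2) array of (x, y); top edge first, then bottom edge (assumed).
    """
    keypoints = np.asarray(keypoints, dtype=float)
    half = keypoints.shape[0] // 2
    xs = np.linspace(0, out_w - 1, half)
    # Control points on the top and bottom borders of the output grid.
    ctrl = np.vstack([
        np.stack([xs, np.zeros(half)], axis=1),
        np.stack([xs, np.full(half, out_h - 1.0)], axis=1),
    ])
    params = fit_tps(ctrl, keypoints)               # output grid -> input coordinates
    gx, gy = np.meshgrid(np.arange(out_w), np.arange(out_h))
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)
    src = apply_tps(params, ctrl, grid)
    C, H, W = feature_map.shape
    sx = np.clip(np.rint(src[:, 0]), 0, W - 1).astype(int)   # nearest neighbour
    sy = np.clip(np.rint(src[:, 1]), 0, H - 1).astype(int)
    return feature_map[:, sy, sx].reshape(C, out_h, out_w)
```

Because the spline is fitted from the output grid to the input map, every output location has a defined source, and features outside the key-point boundary never reach the output, which corresponds to the filtering of irrelevant interference features described above.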
As shown in FIG. 3, in the step S6, the specific steps of the text recognition processing include:
step S61, performing encoding conversion processing on the feature map to be processed to obtain a feature sequence with a fixed length;
step S62, calculating output features of the fixed-length feature sequence by using an attention mechanism and a BLSTM;
and step S63, decoding the output features to obtain readable text.
Specifically, an encoder + LSTM + attention structure is then used to recognize the corresponding text.
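Two pieces of this stage can be illustrated in NumPy: turning the fixed-size feature map into a fixed-length sequence (S61) and computing one attention-weighted context vector over that sequence (part of S62). This is a deliberately reduced sketch. Mean-pooling over height, plain dot-product attention, and the function names are all assumptions, and the BLSTM and the character decoder of S62 and S63 are omitted entirely.

```python
import numpy as np


def to_sequence(feature_map):
    """Convert a (C, H, W) feature map to a (W, C) sequence.

    Each image column becomes one time step; height is mean-pooled (assumed),
    so a fixed output width gives a fixed sequence length.
    """
    return feature_map.mean(axis=1).T


def attention_context(sequence, query):
    """One dot-product attention step.

    sequence: (T, C) feature sequence; query: (C,) decoder state (hypothetical).
    Returns the context vector (C,) and the attention weights (T,).
    """
    scores = sequence @ query
    scores = scores - scores.max()          # subtract max for numerical stability
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum()
    return weights @ sequence, weights
```

In a full recognizer the query would come from the recurrent decoder state at each output character, and the context vector would feed the prediction of that character.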
As shown in fig. 4, an end-to-end commodity price tag text recognition system with gesture correction, comprising:
the feature extraction module 1 is used for acquiring commodity price tag images and extracting features to obtain corresponding feature images, mainly based on the input commodity price tag images, extracting character features by using a convolutional neural network, and outputting a multidimensional feature image;
the text region cutting module 2 is used for carrying out region selection processing on the feature map to obtain a text suggestion region, carrying out segmentation processing on the text suggestion region to obtain a processed text suggestion region, and carrying out graphic expansion processing on the processed text suggestion region to obtain a text feature map;
the key point detection module 3 is used for performing key point detection processing on the character feature map to obtain a plurality of key points surrounding the character feature map;
the gesture correction module 4 is used for carrying out gesture correction processing on the character feature map to obtain a feature map to be processed according to the plurality of key points and by utilizing thin plate spline interpolation;
and the literal module 5 is used for performing literal processing on the feature map to be processed to obtain corresponding words.
Further, the text region cutting module 2 includes:
a text region suggesting unit 21 for obtaining, by using an RPN network on the extracted feature map, the text suggestion regions and the positions of their circumscribed rectangular frames;
an NMS (non-maximum suppression) unit 22 for performing de-duplication processing on the obtained text suggestion regions;
an up-sampling unit 23 for transforming the low-resolution features into high-resolution features so that the text region can be segmented afterwards;
a segmentation unit 24 for performing pixel-by-pixel segmentation on the feature map obtained by the up-sampling unit, and determining whether each pixel belongs to a text region and the probability thereof;
a score calculating unit 25 that calculates, for each text suggestion region, an average probability of all pixels belonging to the text contained therein as a score of the text suggestion region;
a character region cutting unit 26 for expanding outward, by a certain proportion of its length and width, each text suggestion region obtained in the preceding process whose score is higher than a certain threshold, and cutting out the feature map containing the text suggestion region and its peripheral region as the text feature map input to the next stage; wherein the expansion scale factor is inversely proportional to the size of the text suggestion region.
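As a non-limiting illustration, the de-duplication, score calculation and adaptive outward expansion performed by units 22, 25 and 26 can be sketched as follows. The sketch assumes boxes given as integer (x0, y0, x1, y1) with exclusive upper bounds, greedy IoU-based non-maximum suppression, a cutoff of 0.5 for deciding that a pixel belongs to text, and an expansion ratio of k divided by the square root of the box area. The function names, thresholds and the constant k are assumptions made for the example and are not values given by the embodiment.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

def region_score(prob_map, box, text_cutoff=0.5):
    """Average probability of the pixels inside the box that are classified as text."""
    x0, y0, x1, y1 = box
    vals = [p for row in prob_map[y0:y1] for p in row[x0:x1] if p > text_cutoff]
    return sum(vals) / len(vals) if vals else 0.0

def expand_box(box, map_w, map_h, k=2.0):
    """Expand a box outward; the ratio is inversely proportional to the box size."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    ratio = k / (w * h) ** 0.5              # smaller regions get a larger ratio
    pad_x = int(round(ratio * w))
    pad_y = int(round(ratio * h))
    return (max(0, x0 - pad_x), max(0, y0 - pad_y),
            min(map_w, x1 + pad_x), min(map_h, y1 + pad_y))
```

With these assumptions a 4 x 4 region is padded by half of its side length on each side, whereas a 16 x 16 region is padded by only one eighth, so small text fields keep proportionally more of their surroundings.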
Further, the key point detection module 3 detects the peripheral key points of the text region of interest in the input text feature map so as to constrain the feature region that actually needs to be used. Its main purpose is to filter out irrelevant interference feature information, because the input text feature map may contain partial feature information from other text fields surrounding the text field of interest. The key point detection module 3 includes:
a first attention unit 31 that calculates an attention parameter for controlling a region of interest at the time of keypoint prediction;
a key point detection unit 32 for converting the input feature map to an output feature map of a fixed size by thin-plate spline interpolation based on the obtained key points;
Further, the literal module 5 includes:
a coding unit 51 for coding and converting the feature map with fixed size into a feature sequence with fixed length;
a second attention unit 52 and a BLSTM unit 53, with which output features are calculated;
the decoding unit 54 transcribes the output features into intelligible text.
Further, the system performs commodity price tag character recognition based on a preset processing model, and updates and optimizes the processing model according to the recognition process and the recognition result. In the model training process, character rectangular frame detection, character segmentation and character recognition all participate in the loss calculation, and performance is improved through multi-task training.
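As a non-limiting illustration of such a joint loss calculation, the following sketch combines a detection loss, a segmentation loss and a recognition loss into one training objective. The embodiment states only that the three tasks participate in the loss calculation; the particular losses chosen here (smooth L1 for the rectangular frame, binary cross-entropy for segmentation, negative log-likelihood for recognition), the weights and all function names are assumptions made for the example.

```python
import math

def smooth_l1(pred, target):
    """Smooth L1 loss averaged over box coordinates (assumed detection loss)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(pred)

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Per-pixel binary cross-entropy (assumed segmentation loss)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)     # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def sequence_nll(step_probs, target_ids, eps=1e-7):
    """Mean negative log-likelihood of the target characters (assumed recognition loss)."""
    total = 0.0
    for probs, t in zip(step_probs, target_ids):
        total -= math.log(max(probs[t], eps))
    return total / len(target_ids)

def multitask_loss(l_det, l_seg, l_rec, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses; the weights are hypothetical."""
    return weights[0] * l_det + weights[1] * l_seg + weights[2] * l_rec
```

Because all three terms are back-propagated through the shared feature extractor, an error in any one task updates the features used by the other two, which is the usual motivation for multi-task training.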
Sharing one feature extractor between character detection and character recognition effectively improves recognition efficiency;
the character region cutting module with its adaptive expansion function avoids the loss of partial character features that would otherwise affect the recognition result;
the character key point detection module with an attention mechanism mitigates the influence of redundant character areas in the cut character feature region of interest;
based on the detected character key points, the text pose is corrected to the horizontal direction by using thin-plate spline interpolation, which improves the recognition effect.
It should be understood that the above description is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be apparent to those skilled in the art that various modifications, equivalents, variations, and the like can be made to the present invention. However, such modifications are intended to fall within the scope of the present invention without departing from the spirit of the present invention. In addition, some terms used in the specification and claims of the present application are not limiting, but are merely for convenience of description.
Claims (8)
1. An end-to-end commodity price tag character recognition method with gesture correction is characterized by comprising the following steps:
s1, acquiring a commodity price tag image and extracting features to obtain a corresponding feature map;
s2, carrying out region selection processing on the feature map to obtain a text suggestion region;
s3, dividing the text suggestion region to obtain a processed text suggestion region, and performing graphic expansion processing on the processed text suggestion region to obtain a text feature map;
s4, performing key point detection processing on the character feature map to obtain a plurality of key points surrounding the character feature map;
s5, carrying out posture correction processing on the character feature map according to the plurality of key points and by utilizing thin plate spline interpolation to obtain a feature map to be processed with a fixed size and a horizontal feature map to be processed;
s6, performing literal processing on the feature map to be processed to obtain corresponding words;
in the step S4, the key point detection processing is performed on the text feature map by using the key point detection with the attention mechanism to obtain a plurality of key points surrounding the text feature map of interest;
in step S5, according to the plurality of key points and by using thin-plate spline interpolation, a feature area actually required to be used in the text feature map is constrained, irrelevant disturbance feature information is filtered to obtain the feature map to be processed, the feature area actually required to be used is a valid text field concerned by an attention mechanism, irrelevant disturbance feature information is an invalid text field surrounding the valid text field, and the feature map to be processed is a horizontal feature area with a fixed size.
2. The method for recognizing the characters of the end-to-end commodity price tag with gesture correction according to claim 1, wherein in the step S1, feature extraction is performed on the commodity price tag image by using a deep learning network to extract character features so as to obtain the multi-dimensional feature map.
3. The end-to-end commodity price tag character recognition method with gesture correction according to claim 1, wherein in the step S2, the character suggestion region and the circumscribed rectangular frame position thereof are obtained by performing the region selection processing on the feature map by using an RPN network.
4. The method for recognizing end-to-end commodity price tag text with posture correction according to claim 1, wherein in said step S3, the specific steps of said dividing process include:
step S31, carrying out de-duplication processing and up-sampling processing on the text suggestion region to obtain at least one high-resolution region, wherein the resolution of the high-resolution region is higher than that of the text suggestion region;
step S32, respectively carrying out pixel-by-pixel segmentation processing on each high-resolution area to obtain a segmentation probability image and attribute probability information of each pixel point in the segmentation probability image, wherein the attribute probability information is used for indicating whether the pixel point is a character and the probability value of its being a character;
step S33, performing region score calculation processing on each of the segmentation probability images to obtain an average value of the probability values of all the pixel points with characters as attributes in the segmentation probability images, and judging whether the average value corresponding to each of the segmentation probability images is greater than a preset threshold value or not:
if the judgment result is yes, reserving the segmentation probability image;
and if the judgment result is negative, deleting the segmentation probability image.
5. The method for recognizing end-to-end commodity price tag text with posture correction according to claim 4, wherein in said step S3, the specific step of said graphic expansion process comprises:
and step S34, performing outward expansion on the segmentation probability image according to the length and width dimensions of the segmentation probability image and a preset proportion to obtain the segmentation probability image after outward expansion and a peripheral part image surrounding the segmentation probability image after outward expansion as the character feature map.
6. The method for recognizing end-to-end commodity price tag text with posture correction according to claim 1, wherein in said step S6, the specific steps of said text processing include:
step S61, performing code conversion processing on the feature image to be processed to obtain a feature sequence with a fixed length;
step S62, calculating output features of a feature sequence with a fixed length by using an attention mechanism and the BLSTM;
and step S63, decoding the output characteristics to obtain the intelligible characters.
7. An end-to-end commodity price tag word recognition system with gesture correction, capable of implementing the end-to-end commodity price tag word recognition method according to any one of claims 1 to 6, comprising:
the feature extraction module is used for acquiring commodity price tag images and extracting features to obtain corresponding feature images;
the character region cutting module is used for carrying out region selection processing on the feature map to obtain a character suggestion region, carrying out segmentation processing on the character suggestion region to obtain a processed character suggestion region, and carrying out graphic expansion processing on the processed character suggestion region to obtain a character feature map;
the key point detection module is used for carrying out key point detection processing on the character feature map to obtain a plurality of key points surrounding the character feature map;
the gesture correction module is used for carrying out gesture correction processing on the character feature map to obtain a feature map to be processed according to the plurality of key points and by utilizing thin plate spline interpolation;
and the literal module is used for performing literal processing on the feature map to be processed to obtain corresponding words.
8. The end-to-end commodity price tag word recognition system with gesture correction according to claim 7, wherein said system performs commodity price tag word recognition based on a preset process model, and updates and optimizes said process model according to the recognition process and recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911273581.5A CN111079749B (en) | 2019-12-12 | 2019-12-12 | End-to-end commodity price tag character recognition method and system with gesture correction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911273581.5A CN111079749B (en) | 2019-12-12 | 2019-12-12 | End-to-end commodity price tag character recognition method and system with gesture correction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111079749A CN111079749A (en) | 2020-04-28 |
CN111079749B true CN111079749B (en) | 2023-12-22 |
Family
ID=70314044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911273581.5A Active CN111079749B (en) | 2019-12-12 | 2019-12-12 | End-to-end commodity price tag character recognition method and system with gesture correction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111079749B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112241739A (en) * | 2020-12-17 | 2021-01-19 | 北京沃东天骏信息技术有限公司 | Method, device, equipment and computer readable medium for identifying text errors |
CN115063814B (en) * | 2022-08-22 | 2022-12-23 | 深圳爱莫科技有限公司 | Universal commodity price tag image identification method and processing equipment |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8694433B1 (en) * | 2008-06-26 | 2014-04-08 | Bank Of America Corporation | Image cashletter processing with reject repair deferral |
CN107016387A (en) * | 2016-01-28 | 2017-08-04 | 苏宁云商集团股份有限公司 | A kind of method and device for recognizing label |
CN108229490A (en) * | 2017-02-23 | 2018-06-29 | 北京市商汤科技开发有限公司 | Critical point detection method, neural network training method, device and electronic equipment |
CN108647553A (en) * | 2018-05-10 | 2018-10-12 | 上海扩博智能技术有限公司 | Rapid expansion method, system, equipment and the storage medium of model training image |
CN109284738A (en) * | 2018-10-25 | 2019-01-29 | 上海交通大学 | Irregular face antidote and system |
CN109636815A (en) * | 2018-12-19 | 2019-04-16 | 东北大学 | A kind of metal plate and belt Product labelling information identifying method based on computer vision |
CN109886978A (en) * | 2019-02-20 | 2019-06-14 | 贵州电网有限责任公司 | A kind of end-to-end warning information recognition methods based on deep learning |
CN110070536A (en) * | 2019-04-24 | 2019-07-30 | 南京邮电大学 | A kind of pcb board component detection method based on deep learning |
CN110084240A (en) * | 2019-04-24 | 2019-08-02 | 网易(杭州)网络有限公司 | A kind of Word Input system, method, medium and calculate equipment |
CN110163059A (en) * | 2018-10-30 | 2019-08-23 | 腾讯科技(深圳)有限公司 | More people's gesture recognition methods, device and electronic equipment |
CN110287960A (en) * | 2019-07-02 | 2019-09-27 | 中国科学院信息工程研究所 | The detection recognition method of curve text in natural scene image |
CN110321894A (en) * | 2019-04-23 | 2019-10-11 | 浙江工业大学 | A kind of library book method for rapidly positioning based on deep learning OCR |
CN110348439A (en) * | 2019-07-02 | 2019-10-18 | 创新奇智(南京)科技有限公司 | A kind of method, computer-readable medium and the system of automatic identification price tag |
CN110516670A (en) * | 2019-08-26 | 2019-11-29 | 广西师范大学 | Suggested based on scene grade and region from the object detection method for paying attention to module |
Non-Patent Citations (3)
Title |
---|
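Key technologies and applications of artificial intelligence in telecom real-name authentication; Yao Hui; Ma Siyan; Telecommunications Science (05); full text *
Scene text detection and recognition based on deep learning; Bai Xiang; Yang Mingkun; Shi Baoguang; Liao Minghui; Scientia Sinica Informationis (05); full text *
Chen Qiaohong; Chen Yi; Li Wenshu; Jia Yubo. Multi-scale SE-Xception clothing image classification. Journal of Zhejiang University (Engineering Science). (09), full text. *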
Also Published As
Publication number | Publication date |
---|---|
CN111079749A (en) | 2020-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111723585B (en) | Style-controllable image text real-time translation and conversion method | |
CN110322495B (en) | Scene text segmentation method based on weak supervised deep learning | |
CN108520254B (en) | Text detection method and device based on formatted image and related equipment | |
CN111444919B (en) | Method for detecting text with arbitrary shape in natural scene | |
CN112418216B (en) | Text detection method in complex natural scene image | |
CN111860348A (en) | Deep learning-based weak supervision power drawing OCR recognition method | |
CN112818951B (en) | Ticket identification method | |
CN110287952B (en) | Method and system for recognizing characters of dimension picture | |
CN112989995B (en) | Text detection method and device and electronic equipment | |
CN106127222B (en) | A kind of the similarity of character string calculation method and similitude judgment method of view-based access control model | |
CN111079749B (en) | End-to-end commodity price tag character recognition method and system with gesture correction | |
CN111144411B (en) | Irregular text correction and identification method and system based on saliency map | |
CN114155527A (en) | Scene text recognition method and device | |
CN112733858B (en) | Image character rapid identification method and device based on character region detection | |
CN111275040A (en) | Positioning method and device, electronic equipment and computer readable storage medium | |
CN113205041A (en) | Structured information extraction method, device, equipment and storage medium | |
CN114741553B (en) | Image feature-based picture searching method | |
CN109508716B (en) | Image character positioning method and device | |
CN111104924A (en) | Processing algorithm for effectively identifying low-resolution commodity image | |
CN111832497B (en) | Text detection post-processing method based on geometric features | |
CN112380978A (en) | Multi-face detection method, system and storage medium based on key point positioning | |
CN114694133B (en) | Text recognition method based on combination of image processing and deep learning | |
CN111274863A (en) | Text prediction method based on text peak probability density | |
CN110991440A (en) | Pixel-driven mobile phone operation interface text detection method | |
CN114494678A (en) | Character recognition method and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |