CN111079749B - End-to-end commodity price tag character recognition method and system with pose correction - Google Patents

Info

Publication number
CN111079749B
Authority
CN
China
Prior art keywords
feature map
character
text
processing
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911273581.5A
Other languages
Chinese (zh)
Other versions
CN111079749A (en
Inventor
秦永强
张发恩
高达辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ainnovation Chongqing Technology Co ltd
Original Assignee
Ainnovation Chongqing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ainnovation Chongqing Technology Co ltd
Priority to CN201911273581.5A
Publication of CN111079749A
Application granted
Publication of CN111079749B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4007 Interpolation-based scaling, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention provides an end-to-end commodity price tag character recognition method and system with pose correction, belonging to the technical field of computer vision. The method comprises: acquiring a commodity price tag image and extracting features to obtain a corresponding feature map; performing region selection processing on the feature map to obtain a text suggestion region; performing segmentation processing on the text suggestion region to obtain a processed text suggestion region, and performing region expansion processing on the processed text suggestion region to obtain a text feature map; performing key point detection processing on the text feature map to obtain a plurality of key points surrounding the text region of interest in the text feature map; performing pose correction processing on the text feature map according to the plurality of key points, using thin plate spline interpolation, to obtain a fixed-size, horizontal feature map to be processed; and performing text conversion processing on the feature map to be processed to obtain the corresponding text. The beneficial effect of the invention is that the robustness and efficiency of character recognition in complex scenes can be improved.

Description

End-to-end commodity price tag character recognition method and system with pose correction
Technical Field
The invention relates to the field of computer vision, and in particular to an end-to-end commodity price tag character recognition method and system with pose correction.
Background
Recognizing commodity price tags in channel display images with computer vision, and thereby obtaining commodity price information, has become an important way for fast-moving consumer goods brand owners to monitor and control retail prices at their distribution terminals. Fast and accurate price recognition depends above all on accurately recognizing the characters on the price tag.
Because of varying shooting angles, price tags appear in images in arbitrary poses, so the orientation and pose of the characters on them are uncertain, which makes accurate recognition difficult. In addition, price recognition based on computer vision usually has strict timeliness requirements and needs near real-time speed. However, a single channel display image typically contains many tags (often dozens), and a single tag often carries dozens of text fields, which poses a significant challenge for recognition speed.
Most existing character recognition schemes use a staged pipeline of text detection, pose correction, and character recognition: a text detection algorithm first locates the text, the text image region is then cropped, pose correction (affine transformation, perspective transformation, and the like) is applied to the cropped image by image processing, and finally a character recognition algorithm recognizes the text. Recognizing text step by step through multiple stages has two main drawbacks:
1) Low recognition efficiency
The text detection stage and the character recognition stage each extract features from the same image region, which causes repeated computation. Feature extraction often accounts for most of the total computation, so recognizing the prices in a single channel display image takes particularly long, usually tens of seconds to several minutes, which makes real-time requirements difficult to meet.
2) Insufficient algorithm robustness
Character recognition is typically performed after pose correction. Existing pose correction algorithms are generally applied only after a strict text region (such as an arbitrary quadrilateral or a rotated rectangular box) has been determined, and the entire corrected input image, including any interference information, then takes part in recognition. Errors caused by an inaccurate text region, namely loss of text information when the box covers too little of the text and added interference when it covers too much, cannot be corrected afterwards. Such schemes are therefore sensitive to the localization accuracy of the text box and lack robustness.
To make character recognition more robust to pose, the prior art provides a character recognition algorithm with pose correction that adds a spatial transformation module to the model. Based on several key points predicted by the model, it selects the effective text region in the input image and corrects its pose, so text in different poses can be recognized. Because the algorithm is insensitive to redundant interference information in the input text image, it achieves better results. However, it still requires a cropped text segment image as input, so text features are extracted repeatedly, and it cannot be trained end to end together with text detection.
Much work has been published on end-to-end character recognition, most of which still uses multi-stage joint training. One character recognition algorithm proposed in the prior art crops the text region of interest directly from the feature map for recognition, which avoids repeated feature extraction and lets the tasks benefit from each other through multi-task training, but it does not consider text pose correction. Another prior-art approach corrects pose by applying an affine transformation to the cropped text feature region of interest; it cannot correct more complex poses such as perspective distortion, and it cannot solve the loss of text information when the box covers too little of the effective text region.
Disclosure of Invention
The invention aims to provide an end-to-end commodity price tag character recognition method with pose correction that is applicable to channel display recognition, scene text recognition, and similar scenarios, and that can improve the robustness and efficiency of character recognition in complex scenes.
To achieve this purpose, the invention adopts the following technical solution:
provided is an algorithm model training method, comprising:
the end-to-end commodity price tag character recognition method with pose correction comprises the following steps:
S1, acquiring a commodity price tag image and extracting features to obtain a corresponding feature map;
S2, performing region selection processing on the feature map to obtain a text suggestion region;
S3, performing segmentation processing on the text suggestion region to obtain a processed text suggestion region, and performing region expansion processing on the processed text suggestion region to obtain a text feature map;
S4, performing key point detection processing on the text feature map to obtain a plurality of key points surrounding the text region of interest in the text feature map;
S5, performing pose correction processing on the text feature map according to the plurality of key points, using thin plate spline interpolation, to obtain a fixed-size, horizontal feature map to be processed;
and S6, performing text conversion processing on the feature map to be processed to obtain the corresponding text.
In step S1, features are extracted from the commodity price tag image using a deep learning network, so as to extract text features and obtain a multidimensional feature map.
In step S2, region selection processing is performed on the feature map using an RPN (region proposal network) to obtain the text suggestion region and the position of its circumscribed rectangular box.
As a preferred solution of the end-to-end commodity price tag character recognition method with pose correction, in step S3, the segmentation processing specifically comprises:
step S31, performing de-duplication processing and up-sampling processing on the text suggestion region to obtain at least one high-resolution region, the resolution of which is higher than that of the text suggestion region;
step S32, performing pixel-by-pixel segmentation on each high-resolution region to obtain a segmentation probability image and attribute probability information for each pixel in it, the attribute probability information indicating whether the pixel is text and the probability that it is text;
step S33, performing region score calculation on each segmentation probability image to obtain the average of the probability values of all pixels whose attribute is text, and judging whether the average for each segmentation probability image is greater than a preset threshold:
if so, retaining the segmentation probability image;
and if not, deleting the segmentation probability image.
As a preferred solution of the end-to-end commodity price tag character recognition method with pose correction, in step S3, the region expansion processing specifically comprises:
step S34, expanding the segmentation probability image outward by a preset proportion of its length and width, to obtain the expanded segmentation probability image, including the peripheral image portion surrounding it, as the text feature map.
As a preferred solution of the end-to-end commodity price tag character recognition method with pose correction, in step S4, key point detection with an attention mechanism is performed on the text feature map to obtain the plurality of key points surrounding the attended region of the text feature map.
In step S5, according to the plurality of key points and using thin plate spline interpolation, the feature region that actually needs to be used in the text feature map is delimited, and irrelevant interference feature information is filtered out to obtain the feature map to be processed. The feature region that actually needs to be used is the valid text field attended to by the attention mechanism, the irrelevant interference feature information is the invalid text fields surrounding the valid text field, and the feature map to be processed is a horizontal feature region of fixed size.
As a preferred solution of the end-to-end commodity price tag character recognition method with pose correction, in step S6, the text conversion processing specifically comprises:
step S61, encoding the feature map to be processed to obtain a feature sequence of fixed length;
step S62, computing output features from the fixed-length feature sequence using an attention mechanism and a BLSTM;
and step S63, decoding the output features to obtain readable text.
The invention also provides an end-to-end commodity price tag character recognition system with pose correction, which can implement the above end-to-end commodity price tag character recognition method and comprises:
the feature extraction module, used for acquiring a commodity price tag image and extracting features to obtain a corresponding feature map;
the text region cutting module, used for performing region selection processing on the feature map to obtain a text suggestion region, performing segmentation processing on the text suggestion region to obtain a processed text suggestion region, and performing region expansion processing on the processed text suggestion region to obtain a text feature map;
the key point detection module, used for performing key point detection processing on the text feature map to obtain a plurality of key points surrounding the text region of interest in the text feature map;
the pose correction module, used for performing pose correction processing on the text feature map according to the plurality of key points, using thin plate spline interpolation, to obtain a feature map to be processed;
and the text conversion module, used for converting the feature map to be processed into text to obtain the corresponding text.
As a preferred solution of the end-to-end commodity price tag character recognition system with pose correction, the system recognizes commodity price tag characters based on a preset processing model, and updates and optimizes the processing model according to the recognition process and the recognition results.
The invention has the following beneficial effects: after the feature map is extracted from the commodity price tag image, the feature map is processed directly to obtain the processed text suggestion region for subsequent text conversion, so feature extraction is performed only once, which effectively improves recognition efficiency;
after the text suggestion region is obtained, segmentation processing yields a processed text suggestion region containing the valid text field, and region expansion processing yields the text feature map, which avoids recognition errors caused by losing part of the text features and improves the robustness and efficiency of character recognition in complex scenes;
and key point detection is performed on the text feature map to obtain a plurality of key points surrounding the text region of interest, and, based on these key points, thin plate spline interpolation adjusts the text pose in the text feature map to the horizontal direction to obtain a fixed-size, horizontal feature map to be processed, so that text in different orientations and curved text can be recognized, which improves the robustness and efficiency of character recognition in complex scenes.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are required to be used in the embodiments of the present invention will be briefly described below. It is evident that the drawings described below are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a flowchart of an end-to-end commodity price tag character recognition method with pose correction according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S3 according to another embodiment of the present invention;
FIG. 3 is a flowchart of step S6 according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the functional modules of an end-to-end commodity price tag character recognition system with pose correction according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further described below by the specific embodiments with reference to the accompanying drawings.
The drawings are for illustrative purposes only; they are schematic rather than physical depictions and are not intended to limit the present patent. To better illustrate the embodiments of the invention, certain elements in the drawings may be omitted, enlarged, or reduced, and do not represent the size of an actual product. Those skilled in the art will appreciate that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numbers in the drawings of the embodiments correspond to the same or similar components. In the description of the present invention, terms such as "upper", "lower", "left", "right", "inner", and "outer", where used, indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience and to simplify the description, and do not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation. Such terms are therefore illustrative only and should not be construed as limiting the present patent; those of ordinary skill in the art can understand their specific meaning according to the specific circumstances.
In the description of the present invention, unless explicitly stated and limited otherwise, the term "coupled" and similar terms indicating a relationship between components should be interpreted broadly: the connection may be fixed, detachable, or integral; mechanical or electrical; direct or indirect through an intermediate medium; and may be an internal communication between two components or an interaction between them. Those of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
As shown in FIG. 1, the end-to-end commodity price tag character recognition method with pose correction provided by the embodiment of the invention comprises the following steps:
S1, acquiring a commodity price tag image and extracting features to obtain a corresponding feature map;
S2, performing region selection processing on the feature map to obtain a text suggestion region;
S3, performing segmentation processing on the text suggestion region to obtain a processed text suggestion region, and performing region expansion processing on the processed text suggestion region to obtain a text feature map;
S4, performing key point detection processing on the text feature map to obtain a plurality of key points surrounding the text region of interest in the text feature map;
S5, performing pose correction processing on the text feature map according to the plurality of key points, using thin plate spline interpolation, to obtain a fixed-size, horizontal feature map to be processed;
and S6, performing text conversion processing on the feature map to be processed to obtain the corresponding text.
In this embodiment, after the feature map is extracted from the commodity price tag image, the feature map is processed directly to obtain the processed text suggestion region for subsequent text conversion, so feature extraction is performed only once, which effectively improves recognition efficiency;
after the text suggestion region is obtained, segmentation processing yields a processed text suggestion region containing the valid text field, and region expansion processing yields the text feature map, which avoids recognition errors caused by losing part of the text features and improves the robustness and efficiency of character recognition in complex scenes;
and key point detection is performed on the text feature map to obtain a plurality of key points surrounding the text region of interest, and, based on these key points, thin plate spline interpolation adjusts the text pose in the text feature map to the horizontal direction to obtain a fixed-size, horizontal feature map to be processed, so that text in different orientations and curved text can be recognized, which improves the robustness and efficiency of character recognition in complex scenes.
Further, in step S1, features are extracted from the commodity price tag image using a deep learning network, so as to extract text features and obtain the multidimensional feature map.
Further, in step S2, region selection processing is performed on the feature map using an RPN (region proposal network) to obtain the text suggestion region and the position of its circumscribed rectangular box.
Specifically, a regression branch is used to obtain the position of the circumscribed rectangular box of the text suggestion region.
As shown in FIG. 2, in step S3, the segmentation processing specifically comprises:
step S31, performing de-duplication processing and up-sampling processing on the text suggestion region to obtain at least one high-resolution region, the resolution of which is higher than that of the text suggestion region;
step S32, performing pixel-by-pixel segmentation on each high-resolution region to obtain a segmentation probability image and attribute probability information for each pixel in it, the attribute probability information indicating whether the pixel is text and the probability that it is text;
step S33, performing region score calculation on each segmentation probability image to obtain the average of the probability values of all pixels whose attribute is text, and judging whether the average for each segmentation probability image is greater than a preset threshold:
if so, retaining the segmentation probability image;
if not, deleting the segmentation probability image.
Specifically, a separate segmentation branch is used to obtain a segmentation map indicating whether each pixel is text, together with the corresponding probability map (the segmentation map and the probability map are collectively called the segmentation probability image);
the average score of each text suggestion region is then calculated from the probability values of the pixels in it that belong to text, and the text suggestion regions whose scores are higher than a certain threshold are retained.
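The region score of step S33 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the 0.5 cutoff that decides whether a pixel counts as text, and the 0.7 score threshold are all assumptions, since the text only specifies "a preset threshold".

```python
import numpy as np

def region_score(prob_map, text_cutoff=0.5):
    """Mean probability over the pixels classified as text (0.0 if there are none).

    text_cutoff is an assumed per-pixel decision rule, not taken from the patent.
    """
    prob_map = np.asarray(prob_map, dtype=float)
    text_mask = prob_map > text_cutoff
    if not text_mask.any():
        return 0.0
    return float(prob_map[text_mask].mean())

def filter_regions(prob_maps, score_threshold=0.7):
    """Return the indices of the regions whose score exceeds the threshold."""
    return [i for i, p in enumerate(prob_maps)
            if region_score(p) > score_threshold]
```

Averaging only over text pixels means a small but confidently segmented text field is not penalized for the background that surrounds it inside its box.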
Further, in step S3, the region expansion processing specifically comprises:
as shown in FIG. 2, step S34, expanding the segmentation probability image outward by a preset proportion of its length and width, to obtain the expanded segmentation probability image, including the peripheral image portion surrounding it, as the text feature map.
Specifically, the text suggestion region is expanded outward by a certain proportion of its length and width, and the expanded text suggestion region (i.e., the text feature map) is then cropped and passed to the next stage.
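The outward expansion can be illustrated with a small helper. It is a sketch under stated assumptions: boxes are given as `(x1, y1, x2, y2)` in feature-map coordinates, the same ratio is applied on every side, and the result is clipped to the feature map; none of these conventions, nor the name `expand_box`, comes from the patent.

```python
def expand_box(box, ratio, map_width, map_height):
    """Grow a box by `ratio` of its width and height on each side, then clip.

    box is (x1, y1, x2, y2); the coordinate convention is an assumption.
    """
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * ratio  # horizontal margin added on the left and right
    dy = (y2 - y1) * ratio  # vertical margin added on the top and bottom
    return (
        max(0.0, x1 - dx),
        max(0.0, y1 - dy),
        min(float(map_width), x2 + dx),
        min(float(map_height), y2 + dy),
    )
```

The added margin is what lets the later key point stage recover text that the original box cut off.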
Further, in step S4, key point detection with an attention mechanism is performed on the text feature map to obtain the plurality of key points surrounding the attended region of the text feature map.
Specifically, based on the features of the cropped text suggestion region (i.e., the text feature map), a key point detection network with an attention mechanism detects k key points surrounding the text region of interest.
Further, in step S5, according to the plurality of key points and using thin plate spline interpolation, the feature region that actually needs to be used in the text feature map is delimited, and irrelevant interference feature information is filtered out to obtain the feature map to be processed. The feature region that actually needs to be used is the valid text field attended to by the attention mechanism, the irrelevant interference feature information is the invalid text fields surrounding the valid text field, and the feature map to be processed is a horizontal feature region of fixed size.
Specifically, based on the k key points, the feature map region of interest (i.e., the text feature map) is transformed by thin plate spline interpolation into a horizontal feature region of fixed size.
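One common way to realize such a rectification is sketched below with NumPy. It is an illustration, not the patented implementation, and several details are assumptions: the k key points are taken to be ordered along the top edge of the text from left to right and then along the bottom edge from left to right (k even, at least 4), the spline uses the usual kernel U(r) = r² log r², the warp is computed backwards from the output grid to the source map, and sampling is bilinear. The function names are hypothetical.

```python
import numpy as np

def _tps_kernel(r2):
    # U(r) = r^2 * log(r^2), with U(0) defined as 0
    return np.where(r2 > 0, r2 * np.log(np.maximum(r2, 1e-12)), 0.0)

def fit_tps(dst_pts, src_pts):
    """Fit a thin plate spline f such that f(dst_pts[i]) = src_pts[i]."""
    dst_pts = np.asarray(dst_pts, dtype=float)
    src_pts = np.asarray(src_pts, dtype=float)
    k = len(dst_pts)
    d2 = ((dst_pts[:, None, :] - dst_pts[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((k, 1)), dst_pts])
    A = np.zeros((k + 3, k + 3))
    A[:k, :k] = _tps_kernel(d2)
    A[:k, k:] = P
    A[k:, :k] = P.T
    b = np.zeros((k + 3, 2))
    b[:k] = src_pts
    return np.linalg.solve(A, b)  # k kernel weights followed by 3 affine rows

def apply_tps(params, dst_pts, query):
    """Map query points (n, 2) from the output grid into the source map."""
    dst_pts = np.asarray(dst_pts, dtype=float)
    query = np.asarray(query, dtype=float)
    d2 = ((query[:, None, :] - dst_pts[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((len(query), 1)), query])
    return _tps_kernel(d2) @ params[:-3] + P @ params[-3:]

def rectify(feature_map, key_pts, out_h, out_w):
    """Warp a (C, H, W) feature map to a horizontal (C, out_h, out_w) map."""
    C, H, W = feature_map.shape
    half = len(key_pts) // 2
    xs = np.linspace(0, out_w - 1, half)
    # Canonical control points on the top and bottom edges of the output.
    dst = np.vstack([np.stack([xs, np.zeros(half)], 1),
                     np.stack([xs, np.full(half, out_h - 1.0)], 1)])
    params = fit_tps(dst, key_pts)
    gx, gy = np.meshgrid(np.arange(out_w), np.arange(out_h))
    query = np.stack([gx.ravel(), gy.ravel()], 1).astype(float)
    src = apply_tps(params, dst, query)
    # Bilinear sampling of the source map at the warped coordinates.
    x = np.clip(src[:, 0], 0, W - 1)
    y = np.clip(src[:, 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    out = (feature_map[:, y0, x0] * (1 - wx) * (1 - wy)
           + feature_map[:, y0, x1] * wx * (1 - wy)
           + feature_map[:, y1, x0] * (1 - wx) * wy
           + feature_map[:, y1, x1] * wx * wy)
    return out.reshape(C, out_h, out_w)
```

Because only the area enclosed by the key points is mapped onto the output grid, features outside it are discarded, which corresponds to the filtering of interference information described above. Unlike an affine or perspective warp, the spline can also straighten curved text when more than four key points are used.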
As shown in FIG. 3, in step S6, the text conversion processing specifically comprises:
step S61, encoding the feature map to be processed to obtain a feature sequence of fixed length;
step S62, computing output features from the fixed-length feature sequence using an attention mechanism and a BLSTM;
and step S63, decoding the output features to obtain readable text.
Specifically, an encoder + LSTM + attention structure is then used to recognize the corresponding text.
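The first and last of these steps can be sketched in a few lines; the recurrent attention model of step S62 is deliberately left out. Everything here is an assumption made for illustration: the feature map is collapsed over its height so that each column becomes one time step, decoding is greedy, and the character set and end-of-sequence index are hypothetical.

```python
import numpy as np

def map_to_sequence(feature_map):
    """Turn a (C, H, W) map into a (W, C) sequence, one step per column."""
    return np.asarray(feature_map, dtype=float).mean(axis=1).T

def greedy_decode(logits, charset, eos_index):
    """Pick the best class per step and stop at the first end-of-sequence.

    logits has shape (T, V); charset maps class indices to characters.
    """
    chars = []
    for step in np.asarray(logits).argmax(axis=1):
        if int(step) == eos_index:
            break
        chars.append(charset[int(step)])
    return "".join(chars)
```

Since the rectified map has a fixed width, the resulting sequence has a fixed length regardless of how long or tilted the original text field was.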
As shown in FIG. 4, an end-to-end commodity price tag character recognition system with pose correction comprises:
the feature extraction module 1, used for acquiring a commodity price tag image and extracting features to obtain a corresponding feature map; specifically, it takes the commodity price tag image as input, extracts text features using a convolutional neural network, and outputs a multidimensional feature map;
the text region cutting module 2, used for performing region selection processing on the feature map to obtain a text suggestion region, performing segmentation processing on the text suggestion region to obtain a processed text suggestion region, and performing region expansion processing on the processed text suggestion region to obtain a text feature map;
the key point detection module 3, used for performing key point detection processing on the text feature map to obtain a plurality of key points surrounding the text region of interest in the text feature map;
the pose correction module 4, used for performing pose correction processing on the text feature map according to the plurality of key points, using thin plate spline interpolation, to obtain a feature map to be processed;
and the text conversion module 5, used for converting the feature map to be processed into text to obtain the corresponding text.
Further, the text region cutting module 2 includes:
a text region suggesting unit 21 for obtaining, based on the extracted feature map, the position of the circumscribed rectangular frame of each text suggestion region by using an RPN network;
an NMS (non-maximum suppression) unit 22 for performing de-duplication processing on the obtained text suggestion regions;
an up-sampling unit 23 for transforming the low-resolution features into high-resolution features so that the text region can be segmented later;
a segmentation unit 24 for performing pixel-by-pixel segmentation on the feature map obtained by the up-sampling unit, and determining whether each pixel belongs to a text region and the corresponding probability;
a score calculating unit 25 for calculating, for each text suggestion region, the average probability of all the pixels it contains that belong to text, as the score of the text suggestion region;
a character region cutting unit 26 for expanding outward, by a certain proportion of its length and width, each text suggestion region obtained in the previous process whose score is higher than a certain threshold, and cutting out the feature map containing the text suggestion region and its peripheral region as the text feature map input to the next stage; wherein the expansion scale factor is inversely proportional to the size of the text suggestion region.
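The de-duplication, scoring and adaptive expansion performed by units 22, 25 and 26 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: boxes are `(x1, y1, x2, y2)` tuples, a pixel counts as text when its probability is at least 0.5, and "size" is taken to be the box area so that the expansion ratio is `k / area`. The function names, the thresholds and the constant `k` are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box among heavily overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def region_score(prob_map, box, text_cutoff=0.5):
    """Average probability of the pixels inside `box` classified as text.

    The 0.5 cut-off for "belongs to text" is an assumption.
    """
    x1, y1, x2, y2 = box
    text = [prob_map[y][x] for y in range(y1, y2) for x in range(x1, x2)
            if prob_map[y][x] >= text_cutoff]
    return sum(text) / len(text) if text else 0.0


def expand_box(box, map_w, map_h, k=8.0):
    """Expand a box by a ratio inversely proportional to its area (assumed)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    ratio = k / (w * h)              # smaller regions get a larger ratio
    dx, dy = ratio * w / 2.0, ratio * h / 2.0
    return (max(0.0, x1 - dx), max(0.0, y1 - dy),
            min(float(map_w), x2 + dx), min(float(map_h), y2 + dy))


# Usage: the second box overlaps the first and is suppressed.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))
print(expand_box((10, 10, 14, 12), 100, 100))
```

Because the ratio shrinks as the area grows, a small region gains proportionally more context than a large one, which is the behaviour the inverse proportionality describes.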
Further, the key point detection module 3 detects the peripheral key points of the text region of interest in the input text feature map so as to constrain the feature region that actually needs to be used. This mainly serves to filter out irrelevant interfering feature information, because the input text feature map may contain partial feature information of other text fields surrounding the text field of interest. The key point detection module 3 includes:
a first attention unit 31 that calculates an attention parameter for controlling a region of interest at the time of keypoint prediction;
a key point detection unit 32 for converting the input feature map to an output feature map of a fixed size by thin-plate spline interpolation based on the obtained key points;
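The thin-plate spline step can be illustrated with a small, self-contained sketch. It assumes that the detected key points (`src_pts`) are paired with points on a horizontal, fixed-size target rectangle (`dst_pts`), and it fits a spline that maps each output location back to the input location to be sampled. The function names, the use of four key points and the backward-mapping convention are assumptions made for illustration; the patent does not give these details.

```python
import math


def _u(r2):
    """TPS radial basis U(r) = r^2 * log(r^2), with U(0) = 0."""
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)


def _solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for A X = B."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [x - f * y for x, y in zip(m[row], m[col])]
    return [[m[i][n + j] / m[i][i] for j in range(len(b[0]))]
            for i in range(n)]


def fit_tps(dst_pts, src_pts):
    """Fit a spline that maps each dst point onto its src point."""
    n = len(dst_pts)
    rows, rhs = [], []
    for (xi, yi), src in zip(dst_pts, src_pts):
        k = [_u((xi - xj) ** 2 + (yi - yj) ** 2) for xj, yj in dst_pts]
        rows.append(k + [1.0, xi, yi])
        rhs.append(list(src))
    # Side conditions that make the non-affine weights sum to zero.
    rows.append([1.0] * n + [0.0] * 3)
    rows.append([x for x, _ in dst_pts] + [0.0] * 3)
    rows.append([y for _, y in dst_pts] + [0.0] * 3)
    rhs += [[0.0, 0.0]] * 3
    return _solve(rows, rhs)


def tps_map(params, dst_pts, x, y):
    """Input-map coordinates to sample for output location (x, y)."""
    n = len(dst_pts)
    out = []
    for c in range(2):
        value = params[n][c] + params[n + 1][c] * x + params[n + 2][c] * y
        for (px, py), w in zip(dst_pts, params):
            value += w[c] * _u((x - px) ** 2 + (y - py) ** 2)
        out.append(value)
    return tuple(out)


# Usage: corners of a 4 x 1 horizontal rectangle paired with tilted key points.
dst = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
src = [(1.0, 1.0), (5.0, 2.0), (4.8, 3.0), (0.8, 2.0)]
params = fit_tps(dst, src)
print(tps_map(params, dst, 2.0, 0.5))  # where to sample for the rectangle centre
```

Evaluating `tps_map` at every cell of the target grid and sampling the input feature map at the returned coordinates would yield a fixed-size, horizontal output; the key points must not all be collinear, otherwise the linear system is singular.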
Further, the literal module 5 includes:
an encoding unit 51 for encoding the fixed-size feature map into a feature sequence with a fixed length;
a second attention unit 52 and a BLSTM unit 53, with which output features are calculated;
a decoding unit 54 for transcribing the output features into readable text.
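How a fixed-size feature map becomes a fixed-length sequence, and how attention weights that sequence, can be sketched as below. The sketch assumes an H x W x C feature map stored as nested lists, an encoding that averages over the height axis, and plain dot-product attention with an externally supplied query. These choices and the function names are assumptions, and the BLSTM unit 53 is omitted for brevity.

```python
import math


def encode_columns(feature_map):
    """Collapse an H x W x C map into W vectors of length C (assumed encoding)."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    return [
        [sum(feature_map[y][x][c] for y in range(height)) / height
         for c in range(channels)]
        for x in range(width)
    ]


def attention_context(sequence, query):
    """Dot-product attention: softmax of the scores, then a weighted sum."""
    scores = [sum(s * q for s, q in zip(vec, query)) for vec in sequence]
    top = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * vec[c] for w, vec in zip(weights, sequence))
               for c in range(len(query))]
    return weights, context


# Usage: a 2 x 3 x 2 feature map gives a sequence of length 3.
fmap = [[[1, 0], [0, 1], [2, 2]],
        [[3, 0], [0, 3], [0, 0]]]
seq = encode_columns(fmap)
weights, context = attention_context(seq, [1.0, 0.0])
print(seq)
print(weights)
```

Because the width W is fixed by the gesture correction stage, the sequence length is fixed as well, which is what allows a recurrent decoder to run for a known number of steps.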
Further, the system performs commodity price tag character recognition based on a preset processing model, and updates and optimizes the processing model according to the recognition process and the recognition result. In the model training process, character rectangular frame detection, character segmentation detection and character recognition all participate in the loss calculation, and performance is improved through multi-task training.
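One common way to combine the three tasks named above is a weighted sum of their losses. The formulation below is an assumption given for illustration only, since the patent states that all three tasks participate in the loss calculation but does not give the formula; the weights are hypothetical hyperparameters.

```latex
L_{\text{total}} = \lambda_{\text{det}} L_{\text{det}} + \lambda_{\text{seg}} L_{\text{seg}} + \lambda_{\text{rec}} L_{\text{rec}}
```

Here $L_{\text{det}}$ is the rectangular frame detection loss, $L_{\text{seg}}$ the pixel-wise segmentation loss and $L_{\text{rec}}$ the character recognition loss.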
Sharing a single feature extractor between character detection and character recognition can effectively improve recognition efficiency;
a character region cutting module with an adaptive expansion function can solve the problem that the recognition result is affected by the loss of part of the character features;
a character key point detection module with an attention mechanism can alleviate the influence of redundant character regions in the cut-out character feature region of interest;
based on the detected text key points, the text is corrected to the horizontal direction by thin plate spline interpolation, which improves the recognition effect.
It should be understood that the above description is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be apparent to those skilled in the art that various modifications, equivalents, variations, and the like can be made to the present invention. However, such modifications are intended to fall within the scope of the present invention without departing from the spirit of the present invention. In addition, some terms used in the specification and claims of the present application are not limiting, but are merely for convenience of description.

Claims (8)

1. An end-to-end commodity price tag character recognition method with gesture correction is characterized by comprising the following steps:
s1, acquiring a commodity price tag image and extracting features to obtain a corresponding feature map;
s2, carrying out region selection processing on the feature map to obtain a text suggestion region;
s3, dividing the text suggestion region to obtain a processed text suggestion region, and performing graphic expansion processing on the processed text suggestion region to obtain a text feature map;
s4, performing key point detection processing on the character feature map to obtain a plurality of key points surrounding the character feature map;
s5, carrying out posture correction processing on the character feature map according to the plurality of key points and by utilizing thin plate spline interpolation to obtain a horizontal feature map to be processed with a fixed size;
s6, performing literal processing on the feature map to be processed to obtain corresponding words;
in the step S4, the key point detection processing is performed on the text feature map by using the key point detection with the attention mechanism to obtain a plurality of key points surrounding the text feature map of interest;
in step S5, according to the plurality of key points and by using thin-plate spline interpolation, a feature area actually required to be used in the text feature map is constrained, irrelevant disturbance feature information is filtered to obtain the feature map to be processed, the feature area actually required to be used is a valid text field concerned by an attention mechanism, irrelevant disturbance feature information is an invalid text field surrounding the valid text field, and the feature map to be processed is a horizontal feature area with a fixed size.
2. The method for recognizing the characters of the end-to-end commodity price tag with gesture correction according to claim 1, wherein in the step S1, feature extraction is performed on the commodity price tag image by using a deep learning network to extract character features so as to obtain the multi-dimensional feature map.
3. The end-to-end commodity price tag character recognition method with gesture correction according to claim 1, wherein in the step S2, the character suggestion region and the circumscribed rectangular frame position thereof are obtained by performing the region selection processing on the feature map by using an RPN network.
4. The method for recognizing end-to-end commodity price tag text with posture correction according to claim 1, wherein in said step S3, the specific steps of said dividing process include:
step S31, carrying out de-duplication processing and up-sampling processing on the text suggestion region to obtain at least one high-resolution region, wherein the resolution of the high-resolution region is higher than that of the text suggestion region;
step S32, respectively carrying out pixel-by-pixel segmentation processing on each high-resolution area to obtain a segmentation probability image and attribute probability information of each pixel point in the segmentation probability image, wherein the attribute probability information is used for indicating whether the pixel point is a character and the probability value that it is a character;
step S33, performing region score calculation processing on each of the segmentation probability images to obtain an average value of the probability values of all the pixel points with characters as attributes in the segmentation probability images, and judging whether the average value corresponding to each of the segmentation probability images is greater than a preset threshold value or not:
if the judgment result is yes, reserving the segmentation probability image;
and if the judgment result is negative, deleting the segmentation probability image.
5. The method for recognizing end-to-end commodity price tag text with posture correction according to claim 4, wherein in said step S3, the specific step of said graphic expansion process comprises:
and step S34, performing outward expansion on the segmentation probability image according to the length and width dimensions of the segmentation probability image and a preset proportion to obtain the segmentation probability image after outward expansion and a peripheral part image surrounding the segmentation probability image after outward expansion as the character feature map.
6. The method for recognizing end-to-end commodity price tag text with posture correction according to claim 1, wherein in said step S6, the specific steps of said text processing include:
step S61, encoding the feature map to be processed to obtain a feature sequence with a fixed length;
step S62, calculating output features from the fixed-length feature sequence by using an attention mechanism and a BLSTM;
and step S63, decoding the output features to obtain readable text.
7. An end-to-end commodity price tag word recognition system with gesture correction, capable of implementing the end-to-end commodity price tag word recognition method according to any one of claims 1 to 6, comprising:
the feature extraction module is used for acquiring commodity price tag images and extracting features to obtain corresponding feature images;
the character region cutting module is used for carrying out region selection processing on the feature map to obtain a character suggestion region, carrying out segmentation processing on the character suggestion region to obtain a processed character suggestion region, and carrying out graphic expansion processing on the processed character suggestion region to obtain a character feature map;
the key point detection module is used for carrying out key point detection processing on the character feature map to obtain a plurality of key points surrounding the character feature map;
the gesture correction module is used for carrying out gesture correction processing on the character feature map to obtain a feature map to be processed according to the plurality of key points and by utilizing thin plate spline interpolation;
and the literal module is used for converting the feature map to be processed into text to obtain corresponding words.
8. The end-to-end commodity price tag word recognition system with gesture correction according to claim 7, wherein said system performs commodity price tag word recognition based on a preset process model, and updates and optimizes said process model according to the recognition process and recognition result.
CN201911273581.5A 2019-12-12 2019-12-12 End-to-end commodity price tag character recognition method and system with gesture correction Active CN111079749B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911273581.5A CN111079749B (en) 2019-12-12 2019-12-12 End-to-end commodity price tag character recognition method and system with gesture correction

Publications (2)

Publication Number Publication Date
CN111079749A (en) 2020-04-28
CN111079749B (en) 2023-12-22

Family

ID=70314044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911273581.5A Active CN111079749B (en) 2019-12-12 2019-12-12 End-to-end commodity price tag character recognition method and system with gesture correction

Country Status (1)

Country Link
CN (1) CN111079749B (en)

Also Published As

Publication number Publication date
CN111079749A (en) 2020-04-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant