CN111079749A - End-to-end commodity price tag character recognition method and system with attitude correction function - Google Patents

End-to-end commodity price tag character recognition method and system with attitude correction function

Info

Publication number
CN111079749A
Authority
CN
China
Prior art keywords
character
processing
commodity price
feature map
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911273581.5A
Other languages
Chinese (zh)
Other versions
CN111079749B (en)
Inventor
秦永强
张发恩
高达辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ainnovation Chongqing Technology Co ltd
Original Assignee
Ainnovation Chongqing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ainnovation Chongqing Technology Co ltd
Priority to CN201911273581.5A
Publication of CN111079749A
Application granted
Publication of CN111079749B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4007 Interpolation-based scaling, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides an end-to-end commodity price tag character recognition method with posture correction and a system thereof, belonging to the technical field of computer vision. The method comprises the following steps: acquiring a commodity price tag image and performing feature extraction to obtain a corresponding feature map; performing region selection processing on the feature map to obtain a character suggestion region; performing segmentation processing on the character suggestion region to obtain a processed character suggestion region, and performing graphic expansion processing on the processed character suggestion region to obtain a character feature map; performing key point detection processing on the character feature map to obtain a plurality of key points surrounding the character feature map; performing posture correction processing on the character feature map according to the plurality of key points by using thin plate spline interpolation to obtain a fixed-size, horizontal feature map to be processed; and performing text conversion processing on the feature map to be processed to obtain the corresponding characters. The beneficial effect of the invention is that the robustness and efficiency of character recognition in complex scenes can be improved.

Description

End-to-end commodity price tag character recognition method and system with attitude correction function
Technical Field
The invention relates to the field of computer vision, in particular to an end-to-end commodity price tag character recognition method with posture correction and a system thereof.
Background
Recognizing commodity price tags in channel display images with computer vision, and thereby obtaining commodity price information, has become an important way for fast-moving consumer goods brands to control prices at distribution terminals. In this approach, accurate recognition of the characters on the price tag is the key to recognizing commodity prices quickly and accurately.
Because of the shooting angle, a commodity price tag may appear in the image in an arbitrary posture, and the direction and posture of the characters on it are uncertain, which makes accurate character recognition very difficult. In addition, commodity price recognition based on computer vision usually has a strict timeliness requirement and needs a recognition speed close to real time. However, a single channel display image typically contains many price tags (often dozens), and a single price tag typically contains up to dozens of text fields, which poses a significant challenge to recognition speed.
Most existing character recognition schemes follow a pipeline of character detection, posture correction, and character recognition: a character detection algorithm first locates the characters, the character image region is then cropped and its posture corrected by image processing (affine transformation, perspective transformation, and the like), and a character recognition algorithm finally recognizes the text. This multi-stage approach has two main defects:
1) Low recognition efficiency
Both the character detection stage and the character recognition stage perform feature extraction on the same image region, which results in repeated computation. Feature extraction usually accounts for most of the total computation, so recognizing the commodity prices in a single channel display image takes a long time, usually from dozens of seconds to several minutes, which makes the real-time requirement difficult to meet.
2) Insufficient robustness of the algorithm
Character recognition is typically performed after posture correction. Existing posture correction algorithms generally operate after a strict character region (for example an arbitrary quadrilateral or a rotated rectangular box) has been determined, and after posture correction the whole input character image, including interference information, takes part in character recognition. Loss of character information (the box covers too little of the character region) and added interference information (the box covers too much) caused by an inaccurate character region cannot be corrected. In other words, the approach is sensitive to the positioning accuracy of the character box and is not robust enough.
To make character recognition more robust to posture, the prior art provides a character recognition algorithm with posture correction: a spatial transformation module is added to the model, and an effective character region in the input image is selected for posture correction based on several key points predicted by the model. This allows characters in different postures to be recognized, is insensitive to redundant interference information in the input character image, and achieves better results. However, it still requires a cropped text segment image as input, text features are still extracted repeatedly, and it cannot be trained end to end together with character detection.
A large body of literature also addresses end-to-end character recognition, but most of this work still uses multi-stage joint training. One end-to-end character recognition algorithm proposed in the prior art crops the character region of interest directly from the feature map for recognition, which avoids repeated feature extraction and allows multi-task training in which the tasks benefit each other, but it does not consider character posture correction. Another prior-art approach corrects posture by applying an affine transformation to the cropped character feature region of interest; this cannot correct more complex postures such as perspective distortion, and it cannot solve the loss of character region information (the box covers too little of the effective character region).
Disclosure of Invention
The invention aims to provide an end-to-end commodity price tag character recognition method with posture correction, which is applied to channel display, scene character recognition and similar scenes and can improve the robustness and efficiency of complex scene character recognition.
In order to achieve the purpose, the invention adopts the following technical scheme:
an end-to-end commodity price tag character recognition method with posture correction is provided, which comprises the following steps:
step S1, acquiring a commodity price tag image and performing feature extraction to obtain a corresponding feature map;
step S2, carrying out area selection processing on the feature map to obtain a character suggestion area;
step S3, carrying out segmentation processing on the character suggestion area to obtain a processed character suggestion area, and carrying out graphic expansion processing on the processed character suggestion area to obtain a character feature map;
step S4, carrying out key point detection processing on the character feature graph to obtain a plurality of key points surrounding the character feature graph;
step S5, according to the plurality of key points and by means of thin plate spline interpolation, carrying out posture correction processing on the character feature map to obtain a fixed-size, horizontal feature map to be processed;
and step S6, performing text conversion processing on the feature map to be processed to obtain the corresponding characters.
As a preferable scheme of the end-to-end commodity price tag character recognition method with posture correction, in step S1, feature extraction is performed on the commodity price tag image by using a deep learning network to extract character features to obtain the feature map with multiple dimensions.
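As a preferable scheme of the end-to-end commodity price tag character recognition method with posture correction, in step S2, a region proposal network (RPN) is used to perform the region selection processing on the feature map to obtain the character suggestion region and the position of its circumscribed rectangle.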
As a preferable scheme of the end-to-end product price tag character recognition method with posture correction, in step S3, the specific steps of the segmentation process include:
step S31, carrying out de-duplication processing and up-sampling processing on the character suggestion area to obtain at least one high-resolution area, wherein the resolution of the high-resolution area is higher than that of the character suggestion area;
step S32, respectively carrying out pixel-by-pixel segmentation processing on each high-resolution area to obtain a segmentation probability image and attribute probability information of each pixel point in the segmentation probability image, wherein the attribute probability information is used for indicating whether the pixel point is a character or not and indicating the probability value of the character;
step S33, performing region score calculation processing on each of the segmentation probability images to obtain an average value of the probability values of all pixel points with characters as attributes in the segmentation probability images, and respectively determining whether the average value corresponding to each of the segmentation probability images is greater than a preset threshold:
if the judgment result is yes, the segmentation probability image is reserved;
and if the judgment result is negative, deleting the segmentation probability image.
As a preferable scheme of the end-to-end commodity price tag character recognition method with posture correction, in step S3, the specific steps of the graph expansion process include:
and step S34, according to the length and width of the segmentation probability image, performing external expansion on the segmentation probability image according to a preset proportion to obtain the segmentation probability image subjected to external expansion and a peripheral partial image surrounding the segmentation probability image subjected to external expansion as the character feature map.
As a preferable scheme of the end-to-end commodity price tag character recognition method with posture correction, in step S4, the key point detection processing is performed on the character feature map by using key point detection with an attention mechanism, so as to obtain a plurality of key points surrounding the character feature map of interest.
As a preferable scheme of the end-to-end commodity price tag character recognition method with posture correction, in step S5, according to the plurality of key points and by using thin plate spline interpolation, the feature region that actually needs to be used in the character feature map is constrained, and irrelevant interference feature information is filtered out to obtain the feature map to be processed. The feature region that actually needs to be used is the valid text field attended to by the attention mechanism, the irrelevant interference feature information is the invalid text field surrounding the valid text field, and the feature map to be processed is a horizontal feature region with a fixed size.
As a preferable scheme of the end-to-end commodity price tag character recognition method with posture correction, in step S6, the specific steps of the text conversion processing include:
step S61, carrying out code conversion processing on the characteristic diagram to be processed to obtain a characteristic sequence with fixed length;
step S62, calculating the output characteristics of the characteristic sequence with fixed length by using an attention mechanism and BLSTM;
and step S63, decoding the output characteristics to obtain understandable characters.
The invention also provides an end-to-end commodity price tag character recognition system with posture correction, which can implement the above end-to-end commodity price tag character recognition method and comprises:
the characteristic extraction module is used for acquiring the commodity price tag image and extracting the characteristics to obtain a corresponding characteristic diagram;
the character area cutting module is used for carrying out area selection processing on the feature map to obtain a character suggestion area, carrying out segmentation processing on the character suggestion area to obtain a processed character suggestion area, and carrying out graphic expansion processing on the processed character suggestion area to obtain a character feature map;
the key point detection module is used for carrying out key point detection processing on the character feature graph to obtain a plurality of key points surrounding the character feature graph;
the posture correction module is used for carrying out posture correction processing on the character feature map according to the plurality of key points by utilizing thin plate spline interpolation to obtain a feature map to be processed;
and the text conversion module is used for performing text conversion processing on the feature map to be processed to obtain the corresponding characters.
As a preferred scheme of the end-to-end commodity price tag character recognition system with posture correction, the system carries out commodity price tag character recognition based on a preset processing model, and updates and optimizes the processing model according to the recognition process and the recognition result.
The invention has the beneficial effects that: after the characteristic diagram is extracted from the commodity price tag image, the characteristic diagram is directly processed to obtain a processed character suggestion area for subsequent character processing, only one-time characteristic extraction is needed, and the character recognition efficiency is effectively improved;
after the character suggestion area is obtained, character segmentation processing is carried out to obtain the processed character suggestion area containing the effective text field, and graph expansion processing is carried out to obtain a character feature graph, so that the problem that the recognition result is influenced due to the loss of partial characters of the character is solved, and the robustness and the efficiency of character recognition in a complex scene are improved;
key point detection is performed on the character feature map to obtain a plurality of key points surrounding the character feature map, and, based on the key points, thin plate spline interpolation is used to adjust the character posture corresponding to the character feature map to the horizontal direction, yielding a fixed-size, horizontal feature map to be processed, so that characters in different directions and in curved shapes can be recognized, which improves the robustness and efficiency of character recognition in complex scenes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a flowchart of an end-to-end commodity price tag character recognition method with posture correction according to an embodiment of the present invention.
Fig. 2 is a flowchart of step S3 according to another embodiment of the present invention.
Fig. 3 is a flowchart of step S6 according to an embodiment of the present invention.
Fig. 4 is a functional block diagram of an end-to-end commodity price tag character recognition system with posture correction according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.
Wherein the showings are for the purpose of illustration only and are shown by way of illustration only and not in actual form, and are not to be construed as limiting the present patent; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if the terms "upper", "lower", "left", "right", "inner", "outer", etc. are used for indicating the orientation or positional relationship based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not indicated or implied that the referred device or element must have a specific orientation, be constructed in a specific orientation and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limitations of the present patent, and the specific meanings of the terms may be understood by those skilled in the art according to specific situations.
In the description of the present invention, unless otherwise explicitly specified or limited, the term "connected" or the like, if appearing to indicate a connection relationship between the components, is to be understood broadly, for example, as being fixed or detachable or integral; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or may be connected through one or more other components or may be in an interactive relationship with one another. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
As shown in fig. 1, an end-to-end commodity price tag character recognition method with posture correction provided in an embodiment of the present invention includes:
step S1, acquiring a commodity price tag image and performing feature extraction to obtain a corresponding feature map;
step S2, carrying out area selection processing on the characteristic diagram to obtain a character suggestion area;
step S3, the character suggestion area is segmented to obtain a processed character suggestion area, and the processed character suggestion area is subjected to graphic expansion to obtain a character feature map;
step S4, carrying out key point detection processing on the character feature graph to obtain a plurality of key points surrounding the character feature graph;
step S5, according to the plurality of key points and by using thin plate spline interpolation, carrying out posture correction processing on the character feature map to obtain a fixed-size, horizontal feature map to be processed;
and step S6, performing text conversion processing on the feature map to be processed to obtain the corresponding characters.
In the embodiment, after the feature map is extracted from the commodity price tag image, the feature map is directly processed to obtain a processed character suggestion region for subsequent character processing, and only one-time feature extraction is needed, so that the character recognition efficiency is effectively improved;
after the character suggestion area is obtained, character segmentation processing is carried out to obtain the processed character suggestion area containing the effective text field, and graph expansion processing is carried out to obtain a character feature graph, so that the problem that the recognition result is influenced due to the loss of partial characters of the character is solved, and the robustness and the efficiency of character recognition in a complex scene are improved;
the method comprises the steps of detecting key points of a character feature diagram to obtain a plurality of key points surrounding the character feature diagram, adjusting the character posture corresponding to the character feature diagram to the horizontal direction by utilizing thin plate spline interpolation based on the key points to obtain a horizontal feature diagram to be processed with a fixed size, recognizing characters in different directions and in curve shapes, and improving robustness and efficiency of character recognition in a complex scene.
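The flow of steps S1 to S6 can be sketched as a chain of stages that share a single feature map. This is only an illustrative skeleton: the class name `PriceTagRecognizer` and all stage callables are hypothetical placeholders, since the text does not specify concrete networks or interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class PriceTagRecognizer:
    """Chains steps S1-S6; every stage is a hypothetical callable."""
    extract_features: Callable[[Any], Any]                      # S1: image -> feature map
    propose_regions: Callable[[Any], Sequence[Any]]             # S2: feature map -> suggestion regions
    crop_text_features: Callable[[Any, Sequence[Any]], Sequence[Any]]  # S3: segment, expand, crop
    detect_keypoints: Callable[[Any], Any]                      # S4: character feature map -> key points
    rectify: Callable[[Any, Any], Any]                          # S5: thin plate spline correction
    to_text: Callable[[Any], str]                               # S6: feature map -> characters

    def recognize(self, image: Any) -> List[str]:
        # Features are extracted once and reused by every later stage.
        feature_map = self.extract_features(image)
        regions = self.propose_regions(feature_map)
        texts = []
        for text_features in self.crop_text_features(feature_map, regions):
            keypoints = self.detect_keypoints(text_features)
            horizontal = self.rectify(text_features, keypoints)
            texts.append(self.to_text(horizontal))
        return texts
```

The point of the structure is that `extract_features` runs once per image, while steps S3 to S6 operate on crops of that shared feature map rather than on re-encoded image patches.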
Further, in step S1, feature extraction is performed on the product price label image using a deep learning network to extract character features to obtain the feature map in multiple dimensions.
Further, in the step S2, the RPN network is used to perform the region selection process on the feature map to obtain the character suggestion region and the position of the circumscribed rectangle thereof.
Specifically, a regression branch is used to obtain the position of the circumscribed rectangle frame of the character suggestion region.
as shown in fig. 2, further, in step S3, the dividing process includes:
step S31, carrying out de-duplication processing and up-sampling processing on the character suggestion area to obtain at least one high-resolution area, wherein the resolution of the high-resolution area is higher than that of the character suggestion area;
step S32, performing pixel-by-pixel segmentation processing on each high-resolution area to obtain a segmentation probability image and attribute probability information of each pixel point in the segmentation probability image, where the attribute probability information is used to indicate whether the pixel point is a text or not and a probability value of the text;
step S33, performing region score calculation processing on each of the segmentation probability images to obtain an average value of the probability values of all the pixel points with characters as attributes in the segmentation probability images, and respectively determining whether the average value corresponding to each of the segmentation probability images is greater than a preset threshold:
if the judgment result is yes, the segmentation probability image is reserved;
and if the judgment result is negative, deleting the segmentation probability image.
Specifically, a separate segmentation branch is used to obtain a segmentation map, which indicates for each pixel point whether it is a character, and the corresponding probability map (the segmentation map and the probability map are collectively referred to as the segmentation probability image);
and then, the average score of each character suggestion region is calculated from the probability values of the pixel points belonging to characters in that region, and the character suggestion regions whose scores are higher than a certain threshold are retained.
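The region score of step S33 can be illustrated with a small sketch. It assumes that each segmentation probability image is given as a 2-D list of per-pixel character probabilities and that a pixel counts as a character when its probability is at least 0.5; the cutoff, the threshold value 0.7, and the function names are assumptions, not values from the text.

```python
from typing import List, Sequence


def region_score(prob_map: Sequence[Sequence[float]], text_cutoff: float = 0.5) -> float:
    """Mean probability over the pixels classified as characters."""
    text_probs = [p for row in prob_map for p in row if p >= text_cutoff]
    # A region without any character pixel gets the lowest possible score.
    return sum(text_probs) / len(text_probs) if text_probs else 0.0


def keep_regions(prob_maps: Sequence[Sequence[Sequence[float]]],
                 score_threshold: float = 0.7) -> List[int]:
    """Indices of the regions whose score exceeds the preset threshold."""
    return [i for i, m in enumerate(prob_maps) if region_score(m) > score_threshold]
```

Regions whose index is not returned by `keep_regions` correspond to the segmentation probability images that are deleted in step S33.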
Further, in step S3, the specific steps of the pattern expansion process include:
as shown in fig. 2, in step S34, the segmentation probability image is expanded according to the length and width of the segmentation probability image and a predetermined ratio, and the segmentation probability image after the expansion and the peripheral partial image surrounding the segmentation probability image after the expansion are obtained as the character feature map.
Specifically, according to the length and width of the character suggestion region, a certain proportion of expansion is performed, and then the expanded character suggestion region (i.e. the character feature map) is cut and input to the next stage.
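The outward expansion can be sketched as follows, assuming the region is an axis-aligned box `(x0, y0, x1, y1)` in feature-map coordinates and that the expanded box is clipped to the feature map. The function name and the clipping behaviour are assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def expand_box(box: Box, ratio: float, map_width: float, map_height: float) -> Box:
    """Grow a box on every side by `ratio` of its width and height, clipped to the map."""
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * ratio  # horizontal margin derived from the box width
    dy = (y1 - y0) * ratio  # vertical margin derived from the box height
    return (
        max(0.0, x0 - dx),
        max(0.0, y0 - dy),
        min(float(map_width), x1 + dx),
        min(float(map_height), y1 + dy),
    )
```

For example, `expand_box((10, 20, 50, 40), 0.1, 100, 100)` grows a 40 by 20 box by 4 units horizontally and 2 units vertically on each side. The extra margin is what lets later stages recover characters that the original box cut off.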
Further, in step S4, the key point detection process is performed on the character feature map by using a key point detection with attention mechanism to obtain a plurality of key points surrounding the character feature map of interest.
Specifically, according to the cut character suggestion region features (namely character feature maps), a key point detection network with an attention mechanism is utilized to detect k key points surrounding the concerned character feature maps.
Further, in step S5, according to the plurality of key points and by using thin plate spline interpolation, the feature region that actually needs to be used in the character feature map is constrained, and irrelevant interference feature information is filtered out to obtain the feature map to be processed. The feature region that actually needs to be used is the valid text field attended to by the attention mechanism, the irrelevant interference feature information is the invalid text field surrounding the valid text field, and the feature map to be processed is a horizontal feature region with a fixed size.
Specifically, according to k key points, an interested feature map region (namely a character feature map) is converted into a horizontal feature region with a fixed size by utilizing thin plate spline interpolation;
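A minimal pure-Python sketch of the thin plate spline mapping is given below. It assumes the common formulation in which k canonical control points are fixed on the horizontal, fixed-size output grid and the spline maps each output location to a location in the character feature map, so that the k canonical points land exactly on the k detected key points. The placement of the canonical points and all function names are assumptions; the text does not specify them.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _u(r2: float) -> float:
    """TPS radial basis U(r) = r^2 * log(r), written in terms of r^2."""
    return 0.0 if r2 == 0.0 else 0.5 * r2 * math.log(r2)


def _solve(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [[v / m[i][i] for v in m[i][n:]] for i in range(n)]


def fit_tps(canonical: Sequence[Point], detected: Sequence[Point]):
    """Fit a spline that maps canonical (rectified) points onto detected key points."""
    k = len(canonical)
    size = k + 3  # k radial weights plus 3 affine terms
    a = [[0.0] * size for _ in range(size)]
    b = [[0.0, 0.0] for _ in range(size)]
    for i, (xi, yi) in enumerate(canonical):
        for j, (xj, yj) in enumerate(canonical):
            a[i][j] = _u((xi - xj) ** 2 + (yi - yj) ** 2)
        a[i][k:] = [1.0, xi, yi]
        a[k][i], a[k + 1][i], a[k + 2][i] = 1.0, xi, yi
        b[i] = list(detected[i])
    return list(canonical), _solve(a, b)


def tps_map(model, point: Point) -> Point:
    """Map a location on the rectified grid to a location in the source feature map."""
    ctrl, coef = model
    x, y = point
    basis = [_u((x - cx) ** 2 + (y - cy) ** 2) for cx, cy in ctrl] + [1.0, x, y]
    return (sum(bv * c[0] for bv, c in zip(basis, coef)),
            sum(bv * c[1] for bv, c in zip(basis, coef)))
```

To produce the fixed-size horizontal feature region, every cell of the output grid would be passed through `tps_map` and the features sampled at the returned location, for example by bilinear interpolation (not shown here). The canonical points must not all lie on one line, otherwise the linear system is singular.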
as shown in fig. 3, in step S6, the step of converting into text specifically includes:
step S61, carrying out code conversion processing on the characteristic diagram to be processed to obtain a characteristic sequence with fixed length;
step S62, calculating the output characteristics of the characteristic sequence with fixed length by using an attention mechanism and BLSTM;
and step S63, decoding the output characteristics to obtain understandable characters.
Specifically, an encoder + LSTM + attention structure is then used to recognize the corresponding text.
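Steps S61 and S63 can be illustrated without a trained model. The sketch below assumes the feature map to be processed is indexed as `feature_map[channel][row][column]`, that the sequence is formed by averaging over the height, and that decoding is greedy with an end-of-sequence symbol. All of these choices, and the function names, are assumptions; the attention and BLSTM computation of step S62 requires learned weights and is not reproduced.

```python
from typing import List, Sequence


def to_sequence(feature_map: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """S61 sketch: collapse the height so each column becomes one feature vector."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    # The rectified map has a fixed width, so the sequence has a fixed length.
    return [
        [sum(feature_map[c][h][w] for h in range(height)) / height for c in range(channels)]
        for w in range(width)
    ]


def greedy_decode(step_scores: Sequence[Sequence[float]],
                  charset: Sequence[str], eos: str = "<eos>") -> str:
    """S63 sketch: pick the best-scoring symbol per step and stop at the end symbol."""
    chars = []
    for scores in step_scores:
        symbol = charset[max(range(len(scores)), key=scores.__getitem__)]
        if symbol == eos:
            break
        chars.append(symbol)
    return "".join(chars)
```

In a full system, the per-step scores passed to `greedy_decode` would be the output features computed in step S62 from the sequence returned by `to_sequence`.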
As shown in fig. 4, an end-to-end commodity price tag character recognition system with posture correction comprises:
the characteristic extraction module 1 is used for acquiring a commodity price tag image and extracting characteristics to obtain a corresponding characteristic diagram, mainly extracting character characteristics by utilizing a convolution neural network based on the input commodity price tag image, and outputting a multi-dimensional characteristic diagram;
the character area cutting module 2 is used for carrying out area selection processing on the characteristic graph to obtain a character suggestion area, carrying out segmentation processing on the character suggestion area to obtain a processed character suggestion area, and carrying out graph expansion processing on the processed character suggestion area to obtain a character characteristic graph;
a key point detection module 3, configured to perform key point detection processing on the text feature map to obtain a plurality of key points surrounding the text feature map;
the posture correction module 4 is used for carrying out posture correction processing on the character characteristic diagram according to the key points and by utilizing thin plate spline interpolation to obtain a characteristic diagram to be processed;
and the writing module 5 is used for performing writing processing on the characteristic diagram to be processed to obtain corresponding characters.
Further, the text region cutting module 2 includes:
the character region suggesting unit 21, which obtains the position of the circumscribed rectangular frame of each character suggestion region by using an RPN according to the extracted feature map;
an NMS (non-maximum suppression) unit 22, configured to perform de-duplication processing on the obtained text suggestion regions;
an upsampling unit 23, which is mainly used to transform the low-resolution features into high-resolution features so that the text regions can be segmented;
the segmentation unit 24, which is used for performing pixel-by-pixel segmentation on the feature map obtained by the upsampling unit and determining, for each pixel, whether it belongs to a character region and with what probability;
a score calculation unit 25, which calculates, for each character suggestion region, the average probability of all pixel points belonging to characters contained therein as the score of the character suggestion region;
a text region cutting unit 26, which expands each text suggestion region whose score is higher than a certain threshold outward in a certain proportion of its length and width, and cuts out the feature map containing the text suggestion region and its peripheral region as the text feature map input to the next stage; wherein the expansion scale factor is inversely proportional to the size of the text suggestion region.
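The de-duplication, scoring and adaptive expansion performed by units 22, 25 and 26 can be sketched as follows. This is a minimal illustration under assumptions: boxes are integer `(x1, y1, x2, y2)` tuples with exclusive right and bottom edges, the probability map is a nested list, and the expansion rule `c / sqrt(w * h)` is only one hypothetical way to make the expansion factor inversely proportional to the region size. All function names, thresholds and the constant `c` are illustrative and do not come from the text.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy de-duplication: visit boxes by descending score and keep a box
    only if it does not overlap an already kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

def region_score(prob_map, box, pixel_threshold=0.5):
    """Average probability of the pixels inside the box that are classified as text."""
    x1, y1, x2, y2 = box
    probs = [p for row in prob_map[y1:y2] for p in row[x1:x2] if p >= pixel_threshold]
    return sum(probs) / len(probs) if probs else 0.0

def expansion_ratio(w, h, c=2.0):
    """Hypothetical rule: ratio = c / sqrt(w * h), so smaller regions expand more."""
    return c / math.sqrt(w * h)

def expand_box(box, map_w, map_h, c=2.0):
    """Expand a box outward in proportion to its width and height, clipped to the map."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    r = expansion_ratio(w, h, c)
    mx, my = math.ceil(r * w), math.ceil(r * h)
    return (max(0, x1 - mx), max(0, y1 - my), min(map_w, x2 + mx), min(map_h, y2 + my))
```

Under this rule a small region receives a proportionally larger margin than a large one, which matches the stated goal of not losing part of the character features when a tight box clips the text.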
Further, the key point detection module 3 detects the peripheral key points of the text region of interest in the input text feature map to constrain the feature region that actually needs to be used, mainly to filter out irrelevant interference feature information, because the input text feature map may contain partial feature information of other text fields around the text field of interest. The key point detection module 3 includes:
a first attention unit 31 for calculating an attention parameter for controlling an area of interest when predicting the keypoint;
a key point detection unit 32 for converting the input feature map to an output feature map of a fixed size by using thin-plate spline interpolation according to the obtained key points;
Further, the writing module 5 includes:
an encoding unit 51 for encoding and converting the fixed-size feature map into a fixed-size feature sequence;
a second attention unit 52 and a BLSTM unit 53, which are used for calculating to obtain output characteristics;
the decoding unit 54 transcribes the output features into intelligible text.
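The conversion performed by the encoding unit 51 can be illustrated by treating each column of the fixed-size feature map as one time step. This is a sketch of one common map-to-sequence layout and is an assumption; the text only states that a fixed-size feature sequence is produced. The name `feature_map_to_sequence` is hypothetical.

```python
import numpy as np

def feature_map_to_sequence(feature_map):
    """Turn an (H, W, C) feature map into a length-W sequence of (H * C)-dim vectors.

    Column j of the map, read top to bottom, becomes step j of the sequence,
    so the left-to-right reading order of the text is preserved.
    """
    h, w, c = feature_map.shape
    return feature_map.transpose(1, 0, 2).reshape(w, h * c)
```

Because the posture correction module always outputs the same height and width, the resulting sequence has a fixed length, which is what allows it to be fed directly to the BLSTM unit 53.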
Furthermore, the system identifies the commodity price tag characters based on a preset processing model, and updates and optimizes the processing model according to the identification process and the identification result. In the model training process, the character rectangular box detection, the character segmentation detection and the character recognition all participate in loss calculation, and the performance is improved through multi-task training.
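The joint loss used in multi-task training can be sketched as a weighted sum of the three task losses. The weights, the function names, and the choice of pixel-wise binary cross-entropy for the segmentation term are assumptions for illustration; the text only states that all three tasks participate in the loss calculation.

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean pixel-wise binary cross-entropy, an assumed choice for the segmentation term."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def multitask_loss(det_loss, seg_loss, rec_loss, weights=(1.0, 1.0, 1.0)):
    """Combine box detection, segmentation and recognition losses into one training loss."""
    w_det, w_seg, w_rec = weights
    return w_det * det_loss + w_seg * seg_loss + w_rec * rec_loss
```

Since the three terms are backpropagated through the same feature extractor, errors in recognition can also adjust the detection and segmentation features, which is the usual motivation for training the tasks jointly.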
Sharing one feature extractor between character detection and character recognition effectively improves recognition efficiency;
the character region cutting module with adaptive expansion avoids the recognition errors caused by losing part of the character features;
the character key point detection module with an attention mechanism reduces the influence of redundant character regions in the cropped character feature region of interest;
based on the detected character key points, the posture of the characters is corrected to the horizontal direction by thin-plate spline interpolation, which improves the recognition effect.
It should be understood that the above-described embodiments are merely preferred embodiments of the invention and the technical principles applied thereto. It will be understood by those skilled in the art that various modifications, equivalents, changes, and the like can be made to the present invention. However, such variations are within the scope of the invention as long as they do not depart from the spirit of the invention. In addition, certain terms used in the specification and claims of the present application are not limiting, but are used merely for convenience of description.

Claims (10)

1. An end-to-end commodity price tag character recognition method with posture correction is characterized by comprising the following steps:
s1, acquiring a commodity price tag image and performing feature extraction to obtain a corresponding feature map;
step S2, carrying out area selection processing on the feature map to obtain a character suggestion area;
step S3, carrying out segmentation processing on the character suggestion area to obtain a processed character suggestion area, and carrying out graphic expansion processing on the processed character suggestion area to obtain a character feature map;
step S4, carrying out key point detection processing on the character feature graph to obtain a plurality of key points surrounding the character feature graph;
step S5, according to the plurality of key points and by means of thin plate spline interpolation, carrying out posture correction processing on the character feature map to obtain a horizontal feature map to be processed with a fixed size;
and step S6, performing writing processing on the feature map to be processed to obtain corresponding characters.
2. The method for recognizing commodity price tag characters with posture correction according to claim 1, wherein in step S1, feature extraction is performed on the commodity price tag image by using a deep learning network to extract character features to obtain the feature map with multiple dimensions.
3. The method for recognizing commodity price tag characters with posture correction according to claim 1, wherein in step S2, the RPN network is used to perform the region selection process on the feature map to obtain the character suggestion region and the position of the rectangle circumscribing the character suggestion region.
4. The method for recognizing commodity price tags with pose correction according to claim 1, wherein said step S3, the specific steps of said segmentation process include:
step S31, carrying out de-duplication processing and up-sampling processing on the character suggestion area to obtain at least one high-resolution area, wherein the resolution of the high-resolution area is higher than that of the character suggestion area;
step S32, respectively carrying out pixel-by-pixel segmentation processing on each high-resolution area to obtain a segmentation probability image and attribute probability information of each pixel point in the segmentation probability image, wherein the attribute probability information is used for indicating whether the pixel point is a character or not and indicating the probability value of the character;
step S33, performing region score calculation processing on each of the segmentation probability images to obtain an average value of the probability values of all pixel points with characters as attributes in the segmentation probability images, and respectively determining whether the average value corresponding to each of the segmentation probability images is greater than a preset threshold:
if the judgment result is yes, the segmentation probability image is reserved;
and if the judgment result is negative, deleting the segmentation probability image.
5. The method for end-to-end commodity price tag character recognition with posture correction as claimed in claim 4, wherein in said step S3, the specific steps of said graphic externally expanding process include:
and step S34, according to the length and width of the segmentation probability image, performing external expansion on the segmentation probability image according to a preset proportion to obtain the segmentation probability image subjected to external expansion and a peripheral partial image surrounding the segmentation probability image subjected to external expansion as the character feature map.
6. The method for recognizing commodity price tags with pose correction according to claim 1, wherein in step S4, said key point detection processing is performed on said character feature map by using key point detection with attention mechanism to obtain a plurality of key points surrounding said character feature map of interest.
7. The method for recognizing commodity price tag characters with posture correction according to claim 1, wherein in step S5, according to the plurality of key points and by using thin-plate spline interpolation, the feature region actually required to be used in the character feature map is constrained, and irrelevant interference feature information is filtered to obtain the feature map to be processed, the feature region actually required to be used is a valid text field concerned by attention mechanism, the irrelevant interference feature information is an invalid text field surrounding the valid text field, and the feature map to be processed is a horizontal feature region with a fixed size.
8. The method for recognizing commodity price tag characters with posture correction according to claim 1, wherein in step S6, the concrete steps of said writing process include:
step S61, carrying out code conversion processing on the characteristic diagram to be processed to obtain a characteristic sequence with fixed length;
step S62, calculating the output characteristics of the characteristic sequence with fixed length by using an attention mechanism and BLSTM;
and step S63, decoding the output characteristics to obtain understandable characters.
9. An end-to-end commodity price tag character recognition system with posture correction, which can realize the end-to-end commodity price tag character recognition method according to any one of claims 1 to 8, characterized by comprising:
the characteristic extraction module is used for acquiring the commodity price tag image and extracting the characteristics to obtain a corresponding characteristic diagram;
the character area cutting module is used for carrying out area selection processing on the feature map to obtain a character suggestion area, carrying out segmentation processing on the character suggestion area to obtain a processed character suggestion area, and carrying out graphic expansion processing on the processed character suggestion area to obtain a character feature map;
the key point detection module is used for carrying out key point detection processing on the character feature graph to obtain a plurality of key points surrounding the character feature graph;
the posture correction module is used for carrying out posture correction processing on the character feature map according to the plurality of key points by utilizing thin plate spline interpolation to obtain a feature map to be processed;
and the writing module is used for performing writing processing on the characteristic diagram to be processed to obtain corresponding characters.
10. The system for end-to-end commodity price tag character recognition with posture correction according to claim 9, wherein said system performs commodity price tag character recognition based on a preset processing model, and updates and optimizes said processing model according to the recognition process and the recognition result.
CN201911273581.5A 2019-12-12 2019-12-12 End-to-end commodity price tag character recognition method and system with gesture correction Active CN111079749B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911273581.5A CN111079749B (en) 2019-12-12 2019-12-12 End-to-end commodity price tag character recognition method and system with gesture correction

Publications (2)

Publication Number Publication Date
CN111079749A (en) 2020-04-28
CN111079749B (en) 2023-12-22

Family

ID=70314044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911273581.5A Active CN111079749B (en) 2019-12-12 2019-12-12 End-to-end commodity price tag character recognition method and system with gesture correction

Country Status (1)

Country Link
CN (1) CN111079749B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241739A (en) * 2020-12-17 2021-01-19 北京沃东天骏信息技术有限公司 Method, device, equipment and computer readable medium for identifying text errors
CN115063814A (en) * 2022-08-22 2022-09-16 深圳爱莫科技有限公司 Universal commodity price tag image identification method and processing equipment

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8694433B1 (en) * 2008-06-26 2014-04-08 Bank Of America Corporation Image cashletter processing with reject repair deferral
CN107016387A (en) * 2016-01-28 2017-08-04 苏宁云商集团股份有限公司 A kind of method and device for recognizing label
CN108229490A (en) * 2017-02-23 2018-06-29 北京市商汤科技开发有限公司 Critical point detection method, neural network training method, device and electronic equipment
CN108647553A (en) * 2018-05-10 2018-10-12 上海扩博智能技术有限公司 Rapid expansion method, system, equipment and the storage medium of model training image
CN109284738A (en) * 2018-10-25 2019-01-29 上海交通大学 Irregular face antidote and system
CN109636815A (en) * 2018-12-19 2019-04-16 东北大学 A kind of metal plate and belt Product labelling information identifying method based on computer vision
CN109886978A (en) * 2019-02-20 2019-06-14 贵州电网有限责任公司 A kind of end-to-end warning information recognition methods based on deep learning
CN110070536A (en) * 2019-04-24 2019-07-30 南京邮电大学 A kind of pcb board component detection method based on deep learning
CN110084240A (en) * 2019-04-24 2019-08-02 网易(杭州)网络有限公司 A kind of Word Input system, method, medium and calculate equipment
CN110163059A (en) * 2018-10-30 2019-08-23 腾讯科技(深圳)有限公司 More people's gesture recognition methods, device and electronic equipment
CN110287960A (en) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 The detection recognition method of curve text in natural scene image
CN110321894A (en) * 2019-04-23 2019-10-11 浙江工业大学 A kind of library book method for rapidly positioning based on deep learning OCR
CN110348439A (en) * 2019-07-02 2019-10-18 创新奇智(南京)科技有限公司 A kind of method, computer-readable medium and the system of automatic identification price tag
CN110516670A (en) * 2019-08-26 2019-11-29 广西师范大学 Suggested based on scene grade and region from the object detection method for paying attention to module

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
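Yao Hui; Ma Siyan: "Key technologies and applications of artificial intelligence in telecom real-name authentication", Telecommunications Science, no. 05 *
Bai Xiang; Yang Mingkun; Shi Baoguang; Liao Minghui: "Scene text detection and recognition based on deep learning", Scientia Sinica Informationis, no. 05 *
Chen Qiaohong; Chen Yi; Li Wenshu; Jia Yubo: "Multi-scale SE-Xception clothing image classification", no. 09 *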

Also Published As

Publication number Publication date
CN111079749B (en) 2023-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant