CN114627457A - Ticket information identification method and device - Google Patents

Ticket information identification method and device

Info

Publication number
CN114627457A
CN114627457A
Authority
CN
China
Prior art keywords
image
text
edge
text region
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011460902.5A
Other languages
Chinese (zh)
Inventor
金洪亮
梅俊辉
王志刚
林文辉
王芳
李宏伟
李瑞祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp
Priority to CN202011460902.5A
Publication of CN114627457A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Character Input (AREA)

Abstract

The application relates to the technical field of computer vision, and in particular to a ticket face information identification method and device. Text region images are detected in the acquired image to be identified. For each text region image, the text region image is input into a trained text recognition model, character prediction is performed on the feature sequence of the text region image to obtain the probability distribution of the text character to which each frame feature in the feature sequence belongs, and the text character string contained in the text region image is determined according to the probability distributions. The ticket information is then generated according to the determined text character strings and a preset ticket information template, so the text character strings are obtained by recognizing the text region images with the text recognition model.

Description

Ticket information identification method and device
Technical Field
The application relates to the technical field of computer vision, in particular to a ticket information identification method and device.
Background
Checking traditional paper tickets manually is often very inefficient. With the rapid iteration of mobile devices and the rapid development of the mobile internet, automatic ticket checking at railway stations can be realized by identifying the face information of train tickets.
In the prior art, the face information of a train ticket may be recognized with a conventional Optical Character Recognition (OCR) algorithm. However, because the number of Chinese characters is large and font types are numerous, the classifier in an OCR algorithm has weak recognition capability for a single Chinese character, so the identification accuracy of the prior-art method is low.
Disclosure of Invention
The embodiment of the application provides a method and a device for identifying ticket information, so as to improve the accuracy of ticket information identification.
The embodiment of the application provides the following specific technical scheme:
a ticket information identification method comprises the following steps:
detecting each text region image from the acquired image to be identified;
for each text region image, inputting the text region image into a trained text recognition model, performing character prediction on the feature sequence of the text region image to obtain a probability distribution of the text character to which each frame feature in the feature sequence belongs, and determining the text character string contained in the text region image according to the probability distributions;
and generating the ticket information according to the determined text character strings and a preset ticket information template.
Optionally, before detecting each text region image from the acquired image to be identified, the method further includes:
receiving an original image sent by an image acquisition device;
and cropping out the image to be identified by performing edge detection on the original image.
Optionally, cropping out the image to be identified by performing edge detection on the original image specifically includes:
determining, for each pixel in the original image, an edge probability value according to the horizontal-axis gradient value and the vertical-axis gradient value of that pixel;
screening out the pixels that satisfy a preset edge probability value condition, and determining each original edge map from the screened pixels, wherein each edge map comprises at least one edge line;
randomly combining the edge lines contained in the original edge maps to generate candidate edge maps;
and reading the aspect ratio of each candidate edge map, determining the edge maps that satisfy a preset aspect ratio condition, and cropping the image to be identified from the original image according to the determined edge map.
Optionally, cropping the image to be identified from the original image according to the determined edge map specifically includes:
cropping the original image according to the determined edge map to obtain an image to be corrected;
and performing perspective transformation on the image to be corrected to obtain the image to be identified.
Optionally, detecting each text region image from the acquired image to be identified specifically includes:
performing, based on a trained text line detection model and with the image to be identified as an input parameter, text line detection on the image to be identified to obtain text region images each containing text.
Optionally, after obtaining the text region images each containing text, the method further includes:
acquiring attribute information of each text region image, wherein the attribute information at least comprises the center point coordinate, the width value, and the height value of the text region image;
and for each text region image, correcting the width value and the height value of the text region image to a preset width value and height value according to the center point coordinate of the text region image, to obtain the corrected text region image.
A ticket face information recognition apparatus comprises:
a first detection module, configured to detect each text region image from the acquired image to be identified;
a recognition module, configured to, for each text region image, input the text region image into a trained text recognition model, perform character prediction on the feature sequence of the text region image to obtain a probability distribution of the text character to which each frame feature in the feature sequence belongs, and determine the text character string contained in the text region image according to the probability distributions;
and a generating module, configured to generate the ticket information according to the determined text character strings and a preset ticket information template.
Optionally, the apparatus further includes:
a receiving module, configured to receive an original image sent by an image acquisition device;
and a second detection module, configured to crop out the image to be identified by performing edge detection on the original image.
Optionally, the second detection module is specifically configured to:
determine, for each pixel in the original image, an edge probability value according to the horizontal-axis gradient value and the vertical-axis gradient value of that pixel;
screen out the pixels that satisfy a preset edge probability value condition, and determine each original edge map from the screened pixels, wherein each edge map comprises at least one edge line;
randomly combine the edge lines contained in the original edge maps to generate candidate edge maps;
and read the aspect ratio of each candidate edge map, determine the edge maps that satisfy a preset aspect ratio condition, and crop the image to be identified from the original image according to the determined edge map.
Optionally, when cropping the image to be identified from the original image according to the determined edge map, the second detection module is specifically configured to:
crop the original image according to the determined edge map to obtain an image to be corrected;
and perform perspective transformation on the image to be corrected to obtain the image to be identified.
Optionally, the first detection module is specifically configured to:
perform, based on a trained text line detection model and with the image to be identified as an input parameter, text line detection on the image to be identified to obtain text region images each containing text.
Optionally, the apparatus further includes:
an acquisition module, configured to acquire attribute information of each text region image, wherein the attribute information at least comprises the center point coordinate, the width value, and the height value of the text region image;
and a correction module, configured to, for each text region image, correct the width value and the height value of the text region image to a preset width value and height value according to the center point coordinate of the text region image, to obtain the corrected text region image.
An electronic device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the above ticket information identification method.
A computer-readable storage medium has a computer program stored thereon, and the computer program, when executed by a processor, implements the steps of the above ticket information identification method.
In the embodiment of the application, each text region image is detected in the acquired image to be identified; for each text region image, the text region image is input into a trained text recognition model, character prediction is performed on the feature sequence of the text region image to obtain the probability distribution of the text character to which each frame feature belongs, and the text character string contained in the text region image is determined according to the probability distributions; the ticket face information is then generated according to the determined text character strings and a preset ticket face information template. Because the text recognition model predicts the text character to which the feature after each frame feature belongs, the text characters that may appear after each text character can be predicted, completing the text recognition of the text contained in each text region image. Compared with the prior art, which uses a classifier to recognize single Chinese characters in a text region image, this improves the accuracy of ticket information identification; moreover, combining the text character strings with a preset ticket information template allows each element of the ticket information to be identified accurately and effectively.
Drawings
FIG. 1 is a flowchart of a ticket information identification method in an embodiment of the present application;
FIG. 2 is a schematic diagram of the convolution kernels of the Sobel operator in an embodiment of the present application;
FIG. 3 is a schematic flowchart of obtaining an image to be identified in an embodiment of the present application;
FIG. 4 is a schematic diagram of the network structure of the YOLO model in an embodiment of the present application;
FIG. 5 is a diagram illustrating a text line detection result in an embodiment of the present application;
FIG. 6 is a schematic diagram of the network architecture of the CRNN model in an embodiment of the present application;
FIG. 7 is a schematic diagram of generating ticket information in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a ticket information recognition system in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a ticket information recognition apparatus in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Checking traditional paper tickets manually is often very inefficient. With the rapid iteration of mobile devices and the rapid development of the mobile internet, automatic ticket checking at railway stations can be realized by identifying the face information of train tickets. In the prior art, the face information of a train ticket can be identified in the following two ways:
The first way uses an Optical Character Recognition (OCR) algorithm. The basic steps of identifying ticket information through OCR are: after the image to be identified is preprocessed, the character regions of the train ticket are segmented so that each character is an individual to be recognized; the segmented Chinese characters/characters are then recognized with a strong character classifier such as template matching or a support vector machine; finally, the individual character recognition results are concatenated into words or sentences. However, when the face information of a train ticket is identified through OCR, the classifier has weak recognition capability for a single Chinese character because of the huge number of Chinese characters, the variety of fonts, image deformation, visually similar characters, uncommon characters, and the like; image quality also strongly affects the segmentation and positioning of the Chinese characters/characters in the image. As a result, the classifier in the OCR algorithm recognizes single Chinese characters poorly.
The second way adopts a deep learning method: a convolutional neural network extracts the features of the Chinese characters/characters of a line of text, and the whole text content is then recognized. However, when ticket face information is recognized in this way, the structural information of the train ticket face is not considered while the line text is processed, so the meaning of the obtained text information is not clear and the information of all elements is mixed together, which hampers the acquisition of the train ticket face information; the inclination angle of the train ticket also affects the accuracy of text detection and recognition of the line text.
Therefore, both prior-art ways of identifying ticket face information have low accuracy.
To solve the above problems, the embodiments of the present application provide a ticket information identification method: each text region image is detected in the acquired image to be identified; each text region image is input into a trained text recognition model, character prediction is performed on the feature sequence of the text region image to obtain a probability distribution of the text character to which each frame feature in the feature sequence belongs, and the text character string contained in the text region image is determined according to the probability distributions; the ticket information is then generated according to the determined text character strings and a preset ticket face information template. In this way, the text recognition model performs text recognition combined with prediction over the text, overcoming the poor efficiency and low accuracy of conventional OCR recognition; and, because the ticket face information is generated from the text character strings and the preset ticket face information template, the structure of the ticket face is used effectively, which improves identification accuracy and remedies the failure of common deep-learning-based methods to use the structural information of the train ticket face.
Based on the foregoing embodiment, FIG. 1 is a flowchart of the ticket information identification method in an embodiment of the present application, which specifically includes:
step 100: and detecting and obtaining each text area image from the acquired image to be identified.
In the embodiment of the application, because the image to be recognized contains the text, the image to be recognized is obtained, and each text region image is obtained by detecting from the obtained image to be recognized, and the text region image contains the text.
In the embodiment of the present application, the step of acquiring the image to be identified is described in detail below, which specifically includes:
s1: and receiving an original image sent by the image acquisition equipment.
In the embodiment of the application, the image acquisition equipment acquires an original image, sends the acquired original image to the server, and then the server receives the original image sent by the image acquisition equipment.
The image acquisition equipment comprises but is not limited to a mobile end camera, a PC end camera and a high-speed shooting instrument.
Of course, the method may also be implemented by album uploading, gallery uploading, and the like, which is not limited in the embodiment of the present application.
It should be noted that the original image is a high-definition digital image. Moreover, the color channel of the original image is a color or grayscale image, and when the original image is a color image, the color image needs to be converted into a grayscale image, and then the grayscale image needs to be processed correspondingly.
For example, the original image may be an image containing a train ticket, for example. In the original image containing the train ticket, the literal and digital information of the train ticket needs to be clearly visible. And the train ticket area occupies the main area of the acquired original image, and the direction of the train ticket area is in the positive direction.
The original image may also be, for example, an image of a concert ticket, an image of an electronic invoice, an image of an identity card, and the like, and the method in the embodiment of the present application may be used in both a ticket checking scene and an information extraction scene, which is not limited herein.
S2: and intercepting to obtain an image to be identified by carrying out edge detection on the original image.
In the embodiment of the application, the original image includes other objects besides the object to be identified, and the other objects may affect the accuracy of identifying the ticket information, so in order to improve the accuracy of identification, after the original image sent by the image acquisition device is received, edge detection needs to be performed on the original image, an edge in the original image is obtained, and the image to be identified, which only includes the object to be identified, is intercepted from the original image, so that the image to be identified is obtained.
For example, assuming that the image capturing device is a camera, at this time, a train ticket needs to be checked, and therefore, a train ticket needs to be captured, so as to capture and obtain an original image containing a train ticket area. Since the original image may include not only the area image of the train ticket but also the area image of another object, for example, an image of a table, an image of a pen on a table, an image of paper, or the like, it is necessary to perform edge detection on the original image and to cut out only the image including the area of the train ticket from the original image in order to improve the recognition accuracy.
The steps of performing edge detection on the original image and cropping out the image to be identified in the embodiment of the present application are described in detail below, and specifically include:
a1: and respectively determining the edge probability value of each pixel point in the original image according to the horizontal axis gradient value and the vertical axis gradient value of any pixel point.
In the embodiment of the application, the original image is composed of a plurality of pixel points, so that the horizontal axis gradient value and the vertical axis gradient value of any pixel point are calculated respectively for each pixel point in the original image, and then the edge probability value of the pixel point is determined according to the horizontal axis gradient value and the vertical axis gradient value of the pixel point.
For example, when the original image is an image including a train ticket, an edge detection algorithm is first performed on the original image, and an edge detection operator uses sobel operators in an x-axis and a y-axis, as shown in fig. 2, which is a schematic diagram of a convolution kernel of the sobel operator in the embodiment of the present application, and then the operator is performed on any one pixel point on the original image through a preset edge probability value formula, so that each edge map included in the original image can be obtained.
The edge probability value G of each pixel may be expressed as:
G = √(Gx² + Gy²)
where Gx is the horizontal-axis gradient value and Gy is the vertical-axis gradient value.
In this way, the edge probability value of each pixel is obtained; it represents the probability that the pixel is an edge pixel.
Further, since the input original image may be a color image, the color image is converted into a grayscale image, and edge detection is then performed on the grayscale image.
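As an illustrative sketch only (not part of the patent text), steps A1 and A2 could be implemented in Python with OpenCV as follows; the threshold value of 100.0 is an assumption, since the patent only states that the edge probability value condition is preset:

```python
import cv2
import numpy as np

def edge_probability_map(image, thresh=100.0):
    """Per-pixel edge probability G = sqrt(Gx^2 + Gy^2) (step A1)."""
    # Convert a color input to grayscale first, as described above.
    if image.ndim == 3:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # 3x3 Sobel convolutions give the horizontal-axis (Gx) and
    # vertical-axis (Gy) gradient value of every pixel.
    gx = cv2.Sobel(image, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(image, cv2.CV_64F, 0, 1, ksize=3)
    g = np.sqrt(gx ** 2 + gy ** 2)
    # Step A2: screen out the pixels whose edge probability value
    # satisfies the preset condition (here: exceeds a threshold).
    mask = (g > thresh).astype(np.uint8)
    return g, mask
```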
A2: screen out the pixels that satisfy the preset edge probability value condition, and determine each original edge map from the screened pixels.
Each edge map comprises at least one edge line.
In the embodiment of the application, whether the edge probability value of each pixel satisfies the preset edge probability value condition is judged; the pixels that satisfy the condition are screened out, and each original edge map is determined from the screened pixels.
For example, an edge probability value threshold may be set; the pixels whose edge probability values exceed the threshold are screened out, and each original edge map is determined from them.
It should be noted that each original edge map is composed of n straight line segments of different lengths and directions.
A3: randomly combine the edge lines contained in the original edge maps to generate candidate edge maps.
In the embodiment of the application, the edge lines contained in each original edge map are traversed; a preset number of edge lines are then randomly selected from all the edge lines contained in the original edge maps and combined, generating each combined candidate edge map.
For example, suppose the preset number of edge lines is 4, that is, any four edge lines form a candidate edge map. Assume 2 original edge maps are identified in the original image in total, with 6 edge lines on the first and 4 on the second, so the two original edge maps contain 10 edge lines in total. Choosing any 4 of the 10 edge lines to form an edge map then yields C(10, 4) = 10! / (4! · 6!) = 210 candidate edge maps in total.
A4: read the aspect ratio of each candidate edge map, determine the edge maps that satisfy the preset aspect ratio condition, and crop the image to be identified from the original image according to the determined edge map.
In the embodiment of the present application, the aspect ratio of each candidate edge map is read first.
The aspect ratio represents the proportional relationship between the length and the width of the edge map; it may be set to 3:2, for example, which is not limited in the embodiment of the present application.
Then, each candidate edge map is checked against the preset aspect ratio condition, and the edge maps satisfying the condition are determined.
The aspect ratio condition is preset in the server.
Finally, the image to be identified is cropped from the original image according to the determined edge map.
For example, assume the original image contains a train ticket. When the train ticket image is selected from the candidate edge maps, the fact that a train ticket photographed from different angles forms an edge that is a parallelogram or a slightly distorted trapezoid is exploited: all candidate edge maps are traversed to find those satisfying the preset aspect ratio condition, giving the most likely train ticket area, and the train ticket area, which is the image to be identified, is cropped from the original image according to the determined edge map.
It should be noted that when several edge maps satisfy the preset aspect ratio condition, a standard image is obtained, the edge map with the smallest error relative to the standard image is selected, and the selected edge map is taken as the final edge map. The original image and the edge map have the same size; the ticket face position obtained from the edge map is a quadrilateral, and the corresponding position in the original image is the position of the ticket face, so the original image is cropped according to the determined edge map and the cropped image is the image to be identified.
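A minimal Python sketch of steps A3 and A4 follows, for illustration only. Representing edge lines as (x1, y1, x2, y2) segments and testing the aspect ratio on the combination's bounding box, with an assumed 20% tolerance, are simplifying assumptions not taken from the patent:

```python
from itertools import combinations

import numpy as np

def candidate_quads(edge_lines, target_ratio=1.5, tol=0.2):
    """Form candidate edge maps from any 4 edge lines (step A3) and keep
    those whose aspect ratio satisfies the preset condition (step A4)."""
    kept = []
    for quad in combinations(edge_lines, 4):
        pts = np.asarray(quad, dtype=np.float64).reshape(-1, 2)
        w = pts[:, 0].max() - pts[:, 0].min()  # bounding-box width
        h = pts[:, 1].max() - pts[:, 1].min()  # bounding-box height
        if h == 0:
            continue
        # Keep candidates close to the expected 3:2 ticket ratio.
        if abs(w / h - target_ratio) <= tol * target_ratio:
            kept.append(quad)
    return kept
```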
Further, because the image acquisition device may be tilted when capturing the original image, the determined image to be identified may not be rectangular. To ensure accurate recognition of the text contained in the image to be identified, after the edge map satisfying the preset aspect ratio condition is determined, the image to be corrected can be cropped according to the determined edge map and converted into a standard image, and the standard image is used as the image to be identified, ensuring that the image input into the text recognition model is a standard rectangular image. The steps of cropping the image to be identified from the original image according to the determined edge map are described in detail below, and specifically include:
a1: crop the original image according to the determined edge map to obtain the image to be corrected.
In the embodiment of the application, because the edge map represents the edges of the ticket face, the original image is cropped according to the edge map, and the image to be corrected is cut out of the original image.
a2: perform perspective transformation on the image to be corrected to obtain the image to be identified.
In the embodiment of the present application, perspective transformation is performed on the cropped image to be corrected, normalizing it to a standard image; the corrected image is then used as the image to be identified. FIG. 3 is a schematic flowchart of obtaining the image to be identified in the embodiment of the present application.
When the image to be corrected is normalized to the standard image, the aspect ratio and the size of the standard image are set in advance.
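A sketch of steps a1 and a2 with OpenCV is shown below for illustration. The corner ordering and the 600 × 400 target size (matching the 3:2 ratio mentioned above) are assumptions; the patent only says the standard image's aspect ratio and size are preset:

```python
import cv2
import numpy as np

def rectify_ticket(original, corners, out_w=600, out_h=400):
    """Crop the quadrilateral ticket region (step a1) and warp it to a
    standard rectangle via perspective transformation (step a2).
    `corners` are the quad vertices ordered top-left, top-right,
    bottom-right, bottom-left."""
    src = np.asarray(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]],
                   dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(original, M, (out_w, out_h))
```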
Therefore, after the image to be identified is obtained, text line detection may be performed on it to obtain the text region images, each containing text. The step of detecting each text region image from the acquired image to be identified in this embodiment is described in detail below, and specifically includes:
based on the trained text line detection model, with the image to be identified as the input parameter, text line detection is performed on the image to be identified to obtain the text region images containing text lines.
In the embodiment of the application, the image to be identified is input into the trained text line detection model, feature extraction is performed on it to obtain its image features, and text line detection is performed according to those features. If the image to be identified contains text lines, the text lines are marked with rectangular boxes and cropped out, giving a text region image containing each text line.
For example, assume the image to be identified is a train ticket; text line detection is performed on it to obtain the text region images containing text lines. FIG. 5 is a schematic diagram of a text line detection result in the embodiment of the present application: 'A station → B station' is one text region image, 'Dec. 25, 2018 14:20, car 10 seat 12B' is another, and '18557310691225H017579, sold at A station' is another.
The text line detection model may be, for example, a YOLO (You Only Look Once) deep neural network model; FIG. 4 is a schematic diagram of the network structure of the YOLO model in the embodiment of the present application. The parameters of the YOLO model are obtained by training on a training set of text and non-text pictures with the back-propagation algorithm. The image to be identified input into the YOLO model may be, for example, 416 × 416 pixels. The YOLO model comprises 53 DBL convolutional layers, 23 residual layers, and one average pooling layer. Each DBL convolutional layer contains a standard 2-D convolutional layer, a batch normalization (BN) layer, and a Leaky ReLU (Leaky Rectified Linear Unit) activation layer. The output of the YOLO model is the text region images, each containing text.
Further, in this embodiment of the present application, after the text region images each containing text are obtained, the text region images may also be corrected to ensure that the images input into the text recognition model have the same size, which specifically includes:
S1: acquire the attribute information of each text region image.
The attribute information at least comprises the center point coordinate, the width value, and the height value of the text region image.
In the embodiment of the application, the text line detection model can also output the attribute information of the text region images, so the attribute information of each text region image can be obtained; it at least comprises the center point coordinate, the width value, and the height value of each text region image.
For example, if the text line detection model is a YOLO model, the output of the YOLO model is the attribute information x, y, w, h of a rectangular box, that is, the center point coordinate, width value, and height value of the box.
Further, the text line detection model can also output a bounding box confidence and two category labels, text and non-text.
The bounding box confidence represents the probability that the text region image is a text box, and the category label indicates whether the text region image is text or non-text.
Therefore, when the text region images are subsequently recognized, only the text region images whose category label is text are recognized, and the text region images whose category label is non-text are discarded.
S2: for each text region image, correct the width value and the height value of the text region image to the preset width value and height value according to the center point coordinate of the text region image, to obtain the corrected text region image.
In the embodiment of the application, for each text region image, the width value and the height value of the text region image are corrected to the preset width value and height value according to its center point coordinate, normalizing the text region image to obtain the corrected text region image.
For example, the text region image is normalized so that its height is 32 pixels, with the width adjusted in the same proportion.
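The following Python sketch illustrates this correction step; the use of OpenCV and the exact cropping arithmetic from the (x, y, w, h) box attributes are assumptions for illustration:

```python
import cv2

def normalize_text_region(image, box, target_h=32):
    """Crop a detected text line from its center/width/height attributes
    and rescale it to the preset height of 32 pixels, adjusting the
    width in the same proportion."""
    cx, cy, w, h = box  # center point coordinate, width value, height value
    x0 = max(int(cx - w / 2), 0)
    y0 = max(int(cy - h / 2), 0)
    crop = image[y0:y0 + int(h), x0:x0 + int(w)]
    scale = target_h / crop.shape[0]
    target_w = max(1, int(round(crop.shape[1] * scale)))
    return cv2.resize(crop, (target_w, target_h))
```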
Step 110: for each text region image, input the text region image into the trained text recognition model, perform character prediction on the feature sequence of the text region image to obtain the probability distribution of the text character to which each frame feature in the feature sequence belongs, and determine the text character string contained in the text region image according to the probability distributions.
In the embodiment of the application, for each text region image, the text region image is input into the trained text recognition model: feature extraction is performed by the convolutional network layer to obtain the feature sequence of the text region image; character prediction is performed on the feature sequence by the recurrent network layer to obtain the probability distribution of the text character to which each frame feature belongs; finally, the sequence recognition layer processes the probability distributions to determine the text character string contained in the text region image.
For example, feature extraction is performed on a text region image; according to the first frame feature, the first character is determined to be 'A'. The second frame feature is then predicted, giving a probability of 90% that it belongs to the text character 'station' and a probability of 10% that it belongs to 'send'. Since 'station' has the highest probability, the text character to which the second frame feature belongs is determined to be 'station', and the characters contained in the text region image are therefore 'A station'.
In this way, when the text character string contained in a text region image is recognized, feature extraction is performed on the text region image to obtain its feature sequence, and the text character to which each frame feature in the feature sequence belongs is predicted.
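The patent only says that the sequence recognition layer integrates the per-frame predictions into the final string; the usual choice for a CRNN is CTC-style greedy decoding, which is assumed in this illustrative sketch (including the convention that index 0 is the blank label):

```python
import numpy as np

BLANK = 0  # index reserved for the CTC blank label (assumption)

def greedy_decode(frame_probs, charset):
    """Turn per-frame probability distributions (frames x classes) into
    a text string: take the most probable class per frame, collapse
    repeats, and drop blanks."""
    best = np.argmax(frame_probs, axis=1)
    chars, prev = [], BLANK
    for idx in best:
        if idx != prev and idx != BLANK:
            chars.append(charset[idx - 1])  # charset excludes the blank
        prev = idx
    return "".join(chars)
```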
The text recognition model may be, for example, a Convolutional Recurrent Neural Network (CRNN) model; FIG. 6 is a schematic diagram of the network architecture of the CRNN model in the embodiment of the present application. The architecture comprises a convolutional network layer, a recurrent network layer, and a sequence recognition layer. The convolutional network layer comprises 7 standard convolutional layers, 4 max pooling layers, and 2 batch normalization (BN) layers.
The convolutional network layer is used to perform feature extraction on the text region image to obtain the feature sequence of the text region image.
The recurrent network layer is used to perform character prediction on the feature sequence of the text region image to obtain the probability distribution of the text character to which each frame feature in the feature sequence belongs.
The sequence recognition layer integrates the per-frame prediction results of the previous layer and outputs the final text character string; it is used to determine the text character string contained in the text region image according to the probability distributions.
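A minimal PyTorch skeleton of this three-layer design is sketched below for orientation; the layer counts and channel sizes are illustrative and do not reproduce the patent's exact 7-convolution configuration:

```python
import torch.nn as nn

class CRNNSketch(nn.Module):
    """Convolutional network layer -> recurrent network layer ->
    per-frame class scores over the character set."""

    def __init__(self, num_classes):
        super().__init__()
        # Convolutional network layer: turns a 32-pixel-high text region
        # image into a feature sequence (one frame per horizontal step).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((4, 1), (4, 1)),  # height 32 -> 1
        )
        # Recurrent network layer: predicts, for each frame, a
        # distribution over text characters.
        self.rnn = nn.LSTM(256, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):                  # x: (batch, 1, 32, width)
        f = self.cnn(x)                    # (batch, 256, 1, width/4)
        f = f.squeeze(2).permute(0, 2, 1)  # (batch, frames, 256)
        out, _ = self.rnn(f)
        return self.fc(out)                # per-frame class scores
```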
Step 120: generate the ticket information according to the determined text character strings and the preset ticket information template.
In the embodiment of the application, because the text character strings output by the text recognition model are unstructured, the ticket information needs to be generated according to the determined text character strings and a preset ticket information template.
For example, when the image to be identified is a train ticket, each element of the train ticket is parsed by combining the text character strings output by the text recognition model with the preset ticket face information template, giving the final ticket face information of the train ticket. FIG. 7 is a schematic diagram of generating the ticket face information in the embodiment of the present application. The text contents of the text region images obtained by text recognition are 'A station G474 B station', 'Dec. 25, 2018 14:20, car 10 seat 12B', '360.5 yuan', and '1002001234****3333'; these text contents are structurally parsed and filled into the preset ticket face information template, so the elements of the train ticket face obtained by the structured parsing are: 'origin station: A station', 'train number: G474', 'destination station: B station', 'departure date: Dec. 25, 2018', 'departure time: 14:20', 'seat number: car 10 seat 12B', 'fare: 360.5 yuan', 'ID number: 1002001234****3333'.
In this way, the structural information of the train ticket face is taken into account, the meaning of each obtained text character string is clear, and the shortcoming of common deep-learning-based methods, which do not use the structural information of the train ticket face, is remedied.
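As an illustration of step 120, the template matching could be sketched as follows; the field names and regular expressions are hypothetical stand-ins for the preset ticket face information template, which the patent does not spell out:

```python
import re

# Hypothetical ticket face information template: each ticket element is
# matched against the recognized text strings by pattern.
TICKET_TEMPLATE = {
    "train_number": re.compile(r"\b[GDCKTZ]\d{1,4}\b"),
    "departure_date": re.compile(r"\d{4}[./-]\d{1,2}[./-]\d{1,2}"),
    "departure_time": re.compile(r"\b\d{1,2}:\d{2}\b"),
    "fare": re.compile(r"\d+(?:\.\d+)?\s*yuan"),
    "id_number": re.compile(r"\d{10}\*{4}\d{4}"),
}

def structure_ticket(text_strings):
    """Fill the preset template from the unstructured text character
    strings output by the text recognition model."""
    info = {}
    for field, pattern in TICKET_TEMPLATE.items():
        for s in text_strings:
            m = pattern.search(s)
            if m:
                info[field] = m.group(0)
                break
    return info
```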
In the embodiment of the application, each text region image is detected in the acquired image to be identified; for each text region image, the text region image is input into the trained text recognition model, character prediction is performed on the feature sequence of the text region image to obtain the probability distribution of the text character to which each frame feature in the feature sequence belongs, and the text character string contained in the text region image is determined according to the probability distributions; the ticket information is then generated according to the determined text character strings and the preset ticket information template. In this way, by exploiting the feature-extraction capability of the convolutional neural network, each element in the original image can be extracted without error; and because the original image is processed after acquisition, a tilted original image can be handled well, improving the identification accuracy.
Based on the above embodiment, taking an original image containing a train ticket as an example, FIG. 8 is a schematic structural diagram of the ticket face information identification system in the embodiment of the present application, which specifically includes:
1. An image acquisition module, configured to acquire and store the original image of the train ticket.
The original image of the train ticket can be acquired with a camera, a mobile phone, a high-speed document camera, and the like, which is not limited in the embodiment of the present application.
2. An image processing module, configured to preprocess the image: the Sobel edge detection operator is applied to the acquired original image of the train ticket to detect the edge area of the train ticket, and the tilted train ticket image is corrected according to the result of the edge detection algorithm, that is, orientation correction is performed according to the main direction of the train ticket edges, effectively segmenting out an accurate image to be identified containing only the train ticket.
The output of the image processing module is the corrected image to be identified containing only the train ticket.
3. A character recognition module, configured to perform text detection and recognition on the cropped image to be identified using convolutional neural networks, obtaining the text character string of each text region image.
The input of the character recognition module is the corrected image to be identified, containing only the train ticket, produced by the image processing module. The character recognition module comprises a text line detection submodule and a character recognition submodule.
The text line detection submodule is configured to detect and locate the text lines in the text area of each element using the YOLO model, obtaining each text region image; in the YOLO model, the prior boxes are anchor boxes with aspect ratios suited to line-text detection.
The character recognition submodule is configured to perform text recognition on each text region image to obtain each text character string.
4. An output module, configured to parse out and store the various contents of the train ticket face, combining the text character strings output by the character recognition module with the preset ticket face information templates.
The input of the output module is the text character strings, not yet structurally parsed, output by the character recognition module.
In the embodiment of the application, each element in the train ticket image can be identified automatically, efficiently and accurately, the method is insensitive to tilt and rotation of the train ticket image, and the whole process of acquiring, locating and correcting the train ticket image and recognizing and storing the train ticket content is realized comprehensively.
Based on the same inventive concept, the embodiment of the present application further provides a ticket information identification apparatus, which may be, for example, the server in the foregoing embodiments; the ticket information identification apparatus may be a hardware structure, a software module, or a combination of a hardware structure and a software module. Based on the above embodiments, FIG. 9 is a schematic structural diagram of the ticket information identification apparatus in the embodiment of the present application, which specifically includes:
a first detection module 900, configured to detect each text region image from the acquired image to be identified;
a recognition module 910, configured to, for each text region image, input the text region image into the trained text recognition model, perform character prediction on the feature sequence of the text region image to obtain the probability distribution of the text character to which each frame feature in the feature sequence belongs, and determine the text character string contained in the text region image according to the probability distributions;
and a generating module 920, configured to generate the ticket information according to the determined text character strings and a preset ticket information template.
Optionally, the apparatus further includes:
a receiving module 930, configured to receive the original image sent by the image acquisition device;
and a second detection module 940, configured to crop out the image to be identified by performing edge detection on the original image.
Optionally, the second detection module 940 is specifically configured to:
determine, for each pixel in the original image, an edge probability value according to the horizontal-axis gradient value and the vertical-axis gradient value of that pixel;
screen out the pixels that satisfy the preset edge probability value condition, and determine each original edge map from the screened pixels, wherein each edge map comprises at least one edge line;
randomly combine the edge lines contained in the original edge maps to generate candidate edge maps;
and read the aspect ratio of each candidate edge map, determine the edge maps that satisfy the preset aspect ratio condition, and crop the image to be identified from the original image according to the determined edge map.
Optionally, when cropping the image to be identified from the original image according to the determined edge map, the second detection module 940 is specifically configured to:
crop the original image according to the determined edge map to obtain the image to be corrected;
and perform perspective transformation on the image to be corrected to obtain the image to be identified.
Optionally, the first detection module 900 is specifically configured to:
perform, based on the trained text line detection model and with the image to be identified as the input parameter, text line detection on the image to be identified to obtain text region images each containing text.
Optionally, the apparatus further includes:
an acquisition module 950, configured to acquire the attribute information of each text region image, wherein the attribute information at least comprises the center point coordinate, the width value, and the height value of the text region image;
and a correction module 960, configured to, for each text region image, correct the width value and the height value of the text region image to the preset width value and height value according to the center point coordinate of the text region image, to obtain the corrected text region image.
Based on the above embodiments, FIG. 10 is a schematic structural diagram of an electronic device in an embodiment of the present application.
An embodiment of the present application provides an electronic device, which may include a processor 1010 (CPU), a memory 1020, an input device 1030, an output device 1040, and the like. The input device 1030 may include a keyboard, a mouse, a touch screen, and the like, and the output device 1040 may include a display device such as a liquid crystal display (LCD) or a cathode ray tube (CRT).
Memory 1020 may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides processor 1010 with program instructions and data stored in memory 1020. In the embodiment of the present application, the memory 1020 may be used to store a program of any ticket information identification method in the embodiment of the present application.
The processor 1010 is configured to execute any of the ticket information recognition methods according to the embodiments of the present application by calling the program instructions stored in the memory 1020.
Based on the above embodiments, in the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the ticket information identification method in any of the above method embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A ticket information identification method is characterized by comprising the following steps:
detecting and obtaining images of each text area from the acquired images to be identified;
respectively inputting any one text region image into the trained text recognition model aiming at each text region image, performing character prediction on the feature sequence of the text region image to obtain the probability distribution of the text character to which each frame feature in the feature sequence belongs, and determining the text character string contained in the text region image according to each probability distribution;
and generating the ticket information according to the determined text character strings and a preset ticket information template.
2. The method of claim 1, wherein before detecting and obtaining each text region image from the acquired image to be recognized, the method further comprises:
receiving an original image sent by image acquisition equipment;
and intercepting and obtaining an image to be identified by carrying out edge detection on the original image.
3. The method of claim 2, wherein the capturing the image to be recognized by performing edge detection on the original image specifically comprises:
respectively determining the edge probability value of each pixel point in the original image according to the horizontal axis gradient value and the vertical axis gradient value of any pixel point;
screening out pixel points meeting the preset edge probability value condition from the pixel points, and determining each original edge graph according to the screened pixel points, wherein each edge graph comprises at least one edge line;
randomly combining all edge lines contained in all the original edge images to generate all the edge images;
and respectively reading the aspect ratio of each edge image, determining the edge images meeting the preset aspect ratio condition, and intercepting the original image according to the determined edge images to obtain the image to be identified.
4. The method according to claim 3, wherein cropping the original image according to the determined edge image to obtain the image to be recognized specifically comprises:
cropping the original image according to the determined edge image to obtain an image to be corrected;
and performing perspective transformation on the image to be corrected to obtain the image to be recognized.
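
The perspective transformation of claim 4 maps the four corner points of the cropped ticket onto an upright rectangle; a sketch assuming the corners are already ordered top-left, top-right, bottom-right, bottom-left, with a placeholder output size:

```python
import cv2
import numpy as np

def rectify(image_to_correct, corners, out_w=800, out_h=400):
    """Warp the quadrilateral `corners` (TL, TR, BR, BL) onto an upright
    out_w x out_h rectangle, yielding the image to be recognized."""
    src = np.float32(corners)
    dst = np.float32([[0, 0], [out_w - 1, 0],
                      [out_w - 1, out_h - 1], [0, out_h - 1]])
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image_to_correct, matrix, (out_w, out_h))
```
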
5. The method according to claim 1, wherein detecting and obtaining each text region image from the obtained image to be recognized specifically comprises:
and performing text line detection on the image to be recognized, with the image to be recognized as an input parameter to a trained text line detection model, to obtain text region images each containing text.
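
Claim 5 leaves the text line detection model unspecified; as one possible stand-in (not the patent's model), OpenCV 4.5+ ships a wrapper for DB-style text detectors. The weight file name, input size, and thresholds below are placeholders:

```python
import cv2
import numpy as np

# Placeholder ONNX weights; any trained text line detector with a similar
# interface would serve here.
detector = cv2.dnn_TextDetectionModel_DB("db_text_detector.onnx")
detector.setBinaryThreshold(0.3)
detector.setPolygonThreshold(0.5)
detector.setInputParams(scale=1.0 / 255.0, size=(736, 736),
                        mean=(122.68, 116.67, 104.01))

image = cv2.imread("ticket.jpg")
quads, confidences = detector.detect(image)   # one quadrilateral per text line

# Cut an axis-aligned text region image around each detected quadrilateral.
text_region_images = []
for quad in quads:
    x, y, w, h = cv2.boundingRect(np.array(quad))
    text_region_images.append(image[y:y + h, x:x + w])
```
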
6. The method of claim 5, wherein after obtaining the text region images each containing text, the method further comprises:
acquiring attribute information of each text region image, wherein the attribute information at least comprises a center point coordinate, a width value and a height value of the text region image;
and for each text region image, correcting the width value and the height value of the text region image to a preset width value and a preset height value, respectively, according to the center point coordinate of the text region image, to obtain a corrected text region image.
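
The correction of claim 6 keeps a region's center point and replaces its width and height with preset values; a minimal sketch (the 320x32 preset size is an arbitrary placeholder):

```python
def normalize_box(cx, cy, preset_w=320.0, preset_h=32.0):
    """Rebuild a text-region box of fixed preset size around its center."""
    x0 = cx - preset_w / 2
    y0 = cy - preset_h / 2
    return x0, y0, preset_w, preset_h

# Usage: a detected box centered at (210, 95) becomes a 320x32 region.
print(normalize_box(210, 95))   # -> (50.0, 79.0, 320.0, 32.0)
```
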
7. A ticket information recognition apparatus, characterized by comprising:
the first detection module is used for detecting each text region image from an acquired image to be recognized;
the recognition module is used for, for each text region image, inputting the text region image into a trained text recognition model, performing character prediction on a feature sequence of the text region image to obtain a probability distribution over text characters for each frame feature in the feature sequence, and determining the text character string contained in the text region image according to the probability distributions;
and the generating module is used for generating the ticket information according to the determined text character strings and a preset ticket information template.
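
The claims do not detail the preset ticket information template used by the generating module; a toy sketch, assuming the template simply maps field names to the index of the recognized string that fills them (a real template would more plausibly key on each region's position):

```python
# Hypothetical template: field name -> index of the recognized string.
TICKET_TEMPLATE = {"invoice_no": 0, "date": 1, "amount": 2}

def fill_template(strings):
    """Assemble ticket information from the determined text strings."""
    return {field: strings[idx] for field, idx in TICKET_TEMPLATE.items()}

print(fill_template(["No.10086", "2020-12-11", "42.00"]))
# -> {'invoice_no': 'No.10086', 'date': '2020-12-11', 'amount': '42.00'}
```
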
8. The apparatus according to claim 7, wherein before detecting each text region image from the acquired image to be recognized, the apparatus further comprises:
the receiving module is used for receiving an original image sent by the image acquisition equipment;
and the second detection module is used for cropping the image to be recognized out of the original image by performing edge detection on the original image.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method according to any one of claims 1 to 6 are implemented when the processor executes the program.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 6.
CN202011460902.5A 2020-12-11 2020-12-11 Ticket information identification method and device Pending CN114627457A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011460902.5A CN114627457A (en) 2020-12-11 2020-12-11 Ticket information identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011460902.5A CN114627457A (en) 2020-12-11 2020-12-11 Ticket information identification method and device

Publications (1)

Publication Number Publication Date
CN114627457A true CN114627457A (en) 2022-06-14

Family

ID=81894693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011460902.5A Pending CN114627457A (en) 2020-12-11 2020-12-11 Ticket information identification method and device

Country Status (1)

Country Link
CN (1) CN114627457A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117132460A (en) * 2023-09-12 2023-11-28 上海世禹精密设备股份有限公司 Method and device for generating visual inspection standard chart, electronic equipment and storage medium
CN117132460B (en) * 2023-09-12 2024-02-27 上海世禹精密设备股份有限公司 Method and device for generating visual inspection standard chart, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110766014B (en) Bill information positioning method, system and computer readable storage medium
CN110232713B (en) Image target positioning correction method and related equipment
US8059868B2 (en) License plate recognition apparatus, license plate recognition method, and computer-readable storage medium
CN103034848B (en) A kind of recognition methods of form types
US20170351913A1 (en) Document Field Detection And Parsing
CN110490190B (en) Structured image character recognition method and system
CN113780087B (en) Postal package text detection method and equipment based on deep learning
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
CN111626249B (en) Method and device for identifying geometric figure in topic image and computer storage medium
CN111259891B (en) Method, device, equipment and medium for identifying identity card in natural scene
CN110619333A (en) Text line segmentation method, text line segmentation device and electronic equipment
CN112949455B (en) Value-added tax invoice recognition system and method
US9575935B2 (en) Document file generating device and document file generation method
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
CN113903024A (en) Handwritten bill numerical value information identification method, system, medium and device
CN112395995A (en) Method and system for automatically filling and checking bill according to mobile financial bill
US10115036B2 (en) Determining the direction of rows of text
CN109508716B (en) Image character positioning method and device
CN108090728B (en) Express information input method and system based on intelligent terminal
RU2597163C2 (en) Comparing documents using reliable source
CN114627457A (en) Ticket information identification method and device
CN112036232A (en) Image table structure identification method, system, terminal and storage medium
CN116994269A (en) Seal similarity comparison method and seal similarity comparison system in image document
CN111008635A (en) OCR-based multi-bill automatic identification method and system
CN112861861B (en) Method and device for recognizing nixie tube text and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination