CN112560599A - Text recognition method and device, computer equipment and storage medium

Publication number
CN112560599A
Authority
CN
China
Prior art keywords
text
arrangement direction
initial
training
recognition
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011390409.0A
Other languages
Chinese (zh)
Inventor
陈�光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Application filed by Shanghai Eye Control Technology Co Ltd
Priority to CN202011390409.0A
Publication of CN112560599A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application relates to a text recognition method, a text recognition device, a computer device and a storage medium. The method comprises: acquiring a text image to be recognized; and inputting the text image to be recognized into a pre-trained machine learning model, which recognizes the text image according to its pre-trained model parameters to obtain text recognition content. The machine learning model is obtained by training on a plurality of training text images, where the text content in each training text image is distributed in the same arrangement direction and the text content in different training text images is distributed in different arrangement directions. The method can improve the recognition efficiency of text content.

Description

Text recognition method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a text recognition method, apparatus, computer device, and storage medium.
Background
With the development of computer technology, computer hardware performance keeps improving, and with this hardware support the field of deep-learning-based computer vision has developed rapidly in recent years. Deep learning has brought major technical breakthroughs and a wide range of application scenarios to text recognition.
At present, mainstream text recognition has moved from the traditional pipeline of character segmentation, feature extraction and character classification to deep learning methods, and researchers can recognize text effectively with CNN and RNN networks.
However, current text recognition networks cannot reliably determine the arrangement direction of the detected text, so the recognition efficiency of text content is low.
Disclosure of Invention
In view of the above, it is necessary to provide a text recognition method, an apparatus, a computer device and a storage medium capable of improving the efficiency of image text content recognition.
A method of text recognition, the method comprising:
acquiring a text image to be recognized;
inputting the text image to be recognized into a pre-trained machine learning model, and recognizing the text image to be recognized according to the model parameters pre-trained in the machine learning model to obtain text recognition content, wherein the machine learning model is obtained by training on a plurality of training text images, the text content in each training text image is distributed in the same arrangement direction, and the text content in different training text images is distributed in different arrangement directions.
In one embodiment, recognizing a text image to be recognized according to a model parameter pre-trained in a machine learning model to obtain text recognition content includes:
recognizing a text image to be recognized according to model parameters pre-trained in a machine learning model to obtain a text arrangement direction recognition result and a plurality of text content recognition results;
and extracting a text content identification result corresponding to the text arrangement direction identification result from the plurality of text content identification results, and taking the extracted text content identification result as the text identification content of the text image to be identified.
In one embodiment, the machine learning model includes a text arrangement direction recognition model and a text content recognition model; the training of the text arrangement direction recognition model and the text content recognition model comprises:
acquiring a plurality of training text images;
respectively inputting the training text images into an initial text arrangement direction recognition model, and recognizing the text arrangement direction of the text content in the training text images according to the initial arrangement direction recognition parameters in the initial text arrangement direction recognition model to obtain the initial recognition result of the text arrangement direction;
determining an initial text content recognition model corresponding to the training text image according to the initial recognition result of the text arrangement direction, and recognizing the text content of the training text image by using the initial content recognition parameter corresponding to the determined initial text content recognition model to obtain an initial recognition result of the text content;
determining a target loss function according to the initial recognition result of the text arrangement direction, the initial recognition result of the text content and the real label;
and adjusting the initial arrangement direction identification parameters and the initial content identification parameters according to the target loss function until the training end conditions are met, acquiring the current arrangement direction identification parameters and the current content identification parameters, obtaining a text arrangement direction identification model according to the current arrangement direction identification parameters, and obtaining a text content identification model according to the current content identification parameters.
In one embodiment, inputting the training text images into the initial text arrangement direction recognition model respectively comprises:
extracting image features of the training text image to obtain training text image features;
and respectively inputting the training text image characteristics into the initial text arrangement direction recognition model.
In one embodiment, the initial text content recognition model comprises a forward arrangement direction text content recognition model and a reverse arrangement direction text content recognition model; determining the initial text content recognition model corresponding to a training text image according to the initial recognition result of the text arrangement direction comprises:
when the initial recognition result of the text arrangement direction corresponds to the forward arrangement direction, determining that the forward arrangement direction text content recognition model is the initial text content recognition model corresponding to the training text image;
and when the initial recognition result of the text arrangement direction corresponds to the reverse arrangement direction, determining that the reverse arrangement direction text content recognition model is the initial text content recognition model corresponding to the training text image.
In one embodiment, determining the target loss function according to the initial recognition result of the text arrangement direction, the initial recognition result of the text content and the real label comprises:
when the initial text content recognition model corresponds to the forward arrangement direction text content recognition model, acquiring the forward arrangement direction text content recognition result obtained by recognizing the training text image with the forward arrangement direction text content recognition model, and determining the target loss function according to the forward arrangement direction text content recognition result, the text initial recognition result and the real label;
and when the initial text content recognition model corresponds to the reverse arrangement direction text content recognition model, acquiring the reverse arrangement direction text content recognition result obtained by recognizing the training text image with the reverse arrangement direction text content recognition model, and determining the target loss function according to the reverse arrangement direction text content recognition result, the text initial recognition result and the real label.
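As a rough illustration of the branch-dependent loss described in this embodiment, the following Python sketch routes a training sample to one of two content branches according to the direction prediction and sums a direction loss and a content loss. The source does not specify the form of either loss; the use of cross-entropy for both terms, their equal weighting, the label encoding (0 for forward, 1 for reverse) and all function names are assumptions made only for illustration.

```python
import math

FORWARD, REVERSE = 0, 1  # assumed encoding of the two arrangement directions


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(max(probs[target_index], 1e-12))


def content_loss(char_probs, target_chars):
    """Mean per-character cross-entropy (an assumed stand-in for the content loss)."""
    losses = [cross_entropy(p, t) for p, t in zip(char_probs, target_chars)]
    return sum(losses) / len(losses)


def target_loss(direction_probs, branch_outputs, true_direction, true_chars):
    """Add the direction loss to the loss of the branch selected by the direction result.

    direction_probs: [p_forward, p_reverse] from the direction recognition model.
    branch_outputs:  {FORWARD: char_probs, REVERSE: char_probs} from the two content models.
    """
    predicted = FORWARD if direction_probs[FORWARD] >= direction_probs[REVERSE] else REVERSE
    loss_direction = cross_entropy(direction_probs, true_direction)
    loss_content = content_loss(branch_outputs[predicted], true_chars)
    return predicted, loss_direction + loss_content
```

In this sketch only the branch chosen by the direction prediction contributes a content term, which mirrors the case distinction above; a real system would typically compute these terms inside an automatic-differentiation framework so that the parameters of both models can be adjusted.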
In one embodiment, acquiring a plurality of training text images comprises:
acquiring text images in the forward arrangement direction, in which the text content is distributed in the forward arrangement direction;
converting the text images in the forward arrangement direction to obtain text images in the reverse arrangement direction;
carrying out interference expansion processing on the text images in the forward arrangement direction and the text images in the reverse arrangement direction to obtain interference text images;
and obtaining the training text images according to the text images in the forward arrangement direction, the text images in the reverse arrangement direction and the interference text images.
A text recognition apparatus, the apparatus comprising:
the acquisition module is used for acquiring a text image to be recognized;
the recognition module is used for inputting the text images to be recognized into a pre-trained machine learning model, recognizing the text images to be recognized according to model parameters pre-trained in the machine learning model to obtain text recognition contents, wherein the machine learning model is obtained by training according to a plurality of training text images, the text contents in each training text image are distributed in the same arrangement direction, and the text contents in different training text images are distributed in different arrangement directions.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method of any of the embodiments described above when the computer program is executed by the processor.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the above embodiments.
With the above text recognition method, apparatus, computer device and storage medium, a text image to be recognized is acquired and input into a pre-trained machine learning model, and the text image is recognized according to the model parameters pre-trained in the machine learning model to obtain text recognition content. The machine learning model is obtained by training on a plurality of training text images, where the text content in each training text image is distributed in the same arrangement direction and the text content in different training text images is distributed in different arrangement directions. Because the machine learning model is trained in advance to recognize both the text arrangement direction and the text content, both can be recognized at the same time, which strengthens the recognition capability and improves the recognition efficiency of text content.
Drawings
FIG. 1 is a diagram of an application environment of a text recognition method in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for text recognition, according to one embodiment;
FIG. 3 is an image of a text to be recognized as provided in one embodiment;
FIG. 4 is a diagram of a text image to be recognized provided in another embodiment;
FIG. 5 is a block diagram of a machine learning model provided in one embodiment;
FIG. 6 is a flow diagram of a method for training a machine learning model provided in one embodiment;
FIG. 7 is a diagram of a model for extracting image features, provided in one embodiment;
FIG. 8 is a diagram of a network model architecture provided in one embodiment;
FIG. 9 is a block diagram showing the structure of a text recognition apparatus according to an embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The text recognition method provided by the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The server 104 acquires a text image to be recognized uploaded by the terminal 102, inputs the text image into a pre-trained machine learning model, recognizes the text image according to the model parameters pre-trained in the machine learning model to obtain text recognition content, and outputs the text recognition content to the terminal 102. The machine learning model is obtained in the server 104 by training on a plurality of training text images, where the text content in each training text image is distributed in the same arrangement direction and the text content in different training text images is distributed in different arrangement directions. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a text recognition method is provided. Taking the method as applied to the server in fig. 1 as an example, it includes the following steps:
Step 202, acquiring a text image to be recognized.
The text image to be recognized includes text content, and the text content may be distributed in any of several arrangement directions. Specifically, within the same reference frame, the text content in the text image to be recognized may all be distributed in the forward arrangement direction or all in the reverse arrangement direction; in other embodiments it may be distributed in an arrangement direction at an oblique angle.
In one embodiment, all text content within the same text image to be recognized shares one arrangement direction. Referring to fig. 3, which shows a text image to be recognized provided in one embodiment, the text content is distributed in the same arrangement direction, specifically the forward arrangement direction. Referring to fig. 4, which shows a text image to be recognized provided in another embodiment, the text content is distributed in the same arrangement direction, specifically the reverse arrangement direction. It should be noted that the forward and reverse arrangement directions are relative concepts whose definitions may differ depending on the reference coordinate system; in every embodiment, however, they denote two different arrangement directions. Furthermore, the forward and reverse arrangement directions can be made to coincide by rotating through a certain angle, for example 180 degrees.
Step 204, inputting the text image to be recognized into a pre-trained machine learning model, and recognizing the text image to be recognized according to the model parameters pre-trained in the machine learning model to obtain text recognition contents, wherein the machine learning model is obtained by training according to a plurality of training text images, the text contents in each training text image are distributed in the same arrangement direction, and the text contents in different training text images are distributed in different arrangement directions.
The machine learning model is obtained by training on a training set formed by a plurality of training text images, so that the trained machine learning model is able to recognize the text content in a text image to be recognized. The text content of each text image in the training set is distributed in the same arrangement direction, while the text content of different text images may be distributed in different arrangement directions. The arrangement direction includes the direction in which the text content extends, for example text content arranged along the same horizontal direction to form one line of text, and also whether the text content is distributed forward or in reverse. Referring to fig. 3 and fig. 4, the text content in the image of fig. 3 is all distributed in the forward arrangement direction and the text content in the image of fig. 4 is all distributed in the reverse arrangement direction, so the two images differ in arrangement direction. A machine learning model trained on images with different text arrangement directions has stronger recognition capability: it can recognize the content of images containing various text arrangement directions, which improves both the recognition capability and the recognition efficiency of the model.
With the above text recognition method, a text image to be recognized is acquired and input into a pre-trained machine learning model, and the text image is recognized according to the model parameters pre-trained in the machine learning model to obtain text recognition content. The machine learning model is obtained by training on a plurality of training text images, where the text content in each training text image is distributed in the same arrangement direction and the text content in different training text images is distributed in different arrangement directions. Because the machine learning model is trained in advance to recognize both the text arrangement direction and the text content, both can be recognized at the same time, which strengthens the recognition capability and improves the recognition efficiency of text content.
In one embodiment, recognizing a text image to be recognized according to a model parameter pre-trained in a machine learning model to obtain text recognition content includes: recognizing a text image to be recognized according to model parameters pre-trained in a machine learning model to obtain a text arrangement direction recognition result and a plurality of text content recognition results; and extracting a text content identification result corresponding to the text arrangement direction identification result from the plurality of text content identification results, and taking the extracted text content identification result as the text identification content of the text image to be identified.
Specifically, the trained machine learning model has the capability of recognizing the text arrangement direction of the text in the input image, and also has the capability of recognizing the text content in the image. That is to say, the trained machine learning model does not limit the arrangement distribution direction of the text content included in the input image to be recognized, and can be the forward arrangement distribution direction or the reverse arrangement distribution direction, and the text content information in the images in different distribution directions can be accurately recognized through the model provided in the application.
In one embodiment, the trained machine learning model has a plurality of network branches. One network branch identifies the text arrangement direction of the text content in the input text image to be recognized to obtain a text arrangement direction recognition result, which may specifically include a forward arrangement direction recognition result and a reverse arrangement direction recognition result. Another network branch recognizes the text content in the input text image to obtain a text content recognition result; because the text arrangement direction of the text content in the image may be forward or reverse, the obtained text content recognition results include a forward text content recognition result and a reverse text content recognition result. In one embodiment, the text content recognition network may be further divided into a forward arrangement direction text content recognition network and a reverse arrangement direction text content recognition network, which perform content recognition on forward-direction and reverse-direction text content respectively.
Fig. 5 is a schematic structural diagram of a machine learning model provided in one embodiment. As shown in fig. 5, the trained machine learning model includes a text forward/reverse classification network, a text recognition RNN_1 network and a text recognition RNN_2 network. The text forward/reverse classification network identifies the text arrangement direction of the input image to be recognized to obtain a text arrangement direction recognition result. The text recognition RNN_1 network and the text recognition RNN_2 network recognize the text content in images to be recognized with different text arrangement directions, respectively. In this embodiment, images to be recognized with different text arrangement directions are recognized by different text content recognition networks, so that the text content recognition network used is more accurate and the accuracy of the resulting text recognition content is higher.
In one embodiment, the server recognizes the text image to be recognized according to the model parameters pre-trained in the machine learning model to obtain a text arrangement direction recognition result, which is either a forward arrangement direction recognition result or a reverse arrangement direction recognition result. The server also recognizes the text image according to the pre-trained model parameters to obtain a plurality of text content recognition results, which include a forward arrangement direction text content recognition result and a reverse arrangement direction text content recognition result. Based on the text arrangement direction recognition result, the server extracts the corresponding text content recognition result from the plurality of text content recognition results and takes it as the text recognition content of the text image to be recognized. For example, when the machine learning model performs text arrangement direction recognition and text content recognition on the text image to be recognized and the text arrangement direction recognition result is output as the forward arrangement direction, the forward arrangement direction text content recognition result is extracted as the text content recognition result of the text image to be recognized.
In the embodiment, the trained machine learning model has the text arrangement direction recognition function and the text content recognition function at the same time, and different functions are matched with each other to realize recognition of the text content to be recognized, so that the efficiency and the recognition accuracy of the text content recognition are improved.
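The selection step described above can be sketched in Python as follows. This is only a minimal illustration of the control flow: the class name, the direction strings "forward" and "reverse", and the use of plain callables in place of the classification network and the two recognition networks are all assumptions, not the networks of fig. 5.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Image = Sequence[Sequence[int]]  # assumed representation: rows of grayscale pixel values


@dataclass
class DirectionAwareRecognizer:
    """Hypothetical wrapper around one direction branch and one content branch per direction."""

    classify_direction: Callable[[Image], str]       # returns "forward" or "reverse"
    recognizers: Dict[str, Callable[[Image], str]]   # content branch keyed by direction

    def recognize(self, image: Image) -> str:
        direction = self.classify_direction(image)
        # Every content branch produces a candidate, giving a plurality of content results.
        candidates = {name: branch(image) for name, branch in self.recognizers.items()}
        # Keep only the candidate that matches the recognized arrangement direction.
        return candidates[direction]
```

Running both content branches and selecting afterwards follows the wording of this embodiment; an implementation could equally run only the selected branch to save computation, which the source neither requires nor rules out.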
In one embodiment, as shown in fig. 6, fig. 6 is a flowchart illustrating a training method of a machine learning model provided in one embodiment. Specifically, the machine learning model includes a text arrangement direction recognition model and a text content recognition model; the training of the text arrangement direction recognition model and the text content recognition model comprises the following steps:
Step 602, acquiring a plurality of training text images.
Specifically, the server acquires a plurality of training text images, and text contents in each training text image are distributed in the same arrangement direction and text contents in different training text images are distributed in different arrangement directions.
In one embodiment, acquiring a plurality of training text images comprises: acquiring text images in the forward arrangement direction, in which the text content is distributed in the forward arrangement direction; converting the text images in the forward arrangement direction to obtain text images in the reverse arrangement direction; carrying out interference expansion processing on the text images in the forward arrangement direction and the text images in the reverse arrangement direction to obtain interference text images; and obtaining the training text images according to the text images in the forward arrangement direction, the text images in the reverse arrangement direction and the interference text images.
Specifically, fig. 3 shows a text image in the forward arrangement direction and fig. 4 shows a text image in the reverse arrangement direction. The reverse text image may be obtained by rotating the forward text image. Each text image is labeled with its arrangement direction, for example 1 for a text image in the forward arrangement direction and 2 for a text image in the reverse arrangement direction. The text content in each training text image is labeled as well to obtain a text content labeling result. In this embodiment, because the text in most image samples is in the forward arrangement direction, the reverse text images are obtained by rotating the forward text images by 180° in order to balance forward and reverse text samples in the training sample set.
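The sample construction described above can be sketched as follows. The image representation (a list of pixel rows), the function names and the decision to reuse the same text annotation for the rotated copy are assumptions made for illustration; the direction labels 1 and 2 follow the example given in the text.

```python
def rotate_180(image):
    """Rotate a 2-D image (list of rows) by 180 degrees."""
    return [list(reversed(row)) for row in reversed(image)]


def build_direction_samples(forward_images_with_text):
    """Return (image, direction_label, text) triples for forward and rotated copies."""
    samples = []
    for image, text in forward_images_with_text:
        samples.append((image, 1, text))              # forward arrangement direction
        samples.append((rotate_180(image), 2, text))  # reverse arrangement direction
    return samples
```

Because every forward image yields exactly one reverse image, the two classes come out equal in size, which is the balancing effect the paragraph describes.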
To further expand the training sample images and increase their number, the server additionally applies expansion processing to the collected forward text images and reverse text images. The expansion processing includes interference processing, in which the acquired sample images are expanded by one or more of small-angle rotation, noise addition, filtering and similar operations. Expanding the sample images in this way increases the amount of sample data, so that the machine learning model can learn more image information; in addition, training on slightly degraded samples makes the trained network more robust, so that it can recognize the content of text images to be recognized that are of poor quality.
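A minimal sketch of two of the interference operations named above, noise addition and filtering, is given below. The Gaussian noise model, the 3x3 mean filter, the parameter values and the function names are assumptions; the source does not say which noise or filter is used, and small-angle rotation is omitted here because it needs interpolation that is better left to an image library.

```python
import random


def add_noise(image, sigma=8.0, rng=None):
    """Add Gaussian noise to each pixel and clamp the result to the 0..255 range."""
    rng = rng or random.Random()
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row] for row in image]


def mean_filter_3x3(image):
    """Blur with a 3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out


def expand_with_interference(samples, rng=None):
    """Append a noisy copy and a blurred copy of every (image, label) sample."""
    expanded = list(samples)
    for image, label in samples:
        expanded.append((add_noise(image, rng=rng), label))
        expanded.append((mean_filter_3x3(image), label))
    return expanded
```

Each degraded copy keeps the label of the image it was derived from, so the expansion triples the sample count without any additional annotation.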
In another embodiment, before the server obtains the text images to be trained, the method further includes: the server obtains an initial text image whose text content may be distributed in different arrangement directions. Specifically, the same initial text image may contain both text content distributed in the forward arrangement direction and text content distributed in the reverse arrangement direction, and may also contain text content distributed in an oblique arrangement direction at an angle between the forward and reverse arrangement directions. In this case, the server can correct the text content arrangement direction of the initial text image so that the text content in the same text image to be trained is distributed in the same text arrangement direction.
The specific correction processing is as follows: the server acquires an initial text image, detects the text arrangement direction of the text content it contains, determines from the detection result whether to perform text arrangement direction correction, and finally obtains the text images to be trained; the detection and correction ensure that the text content in the same text image to be trained is distributed in the same arrangement direction. Specifically, in one embodiment, the server detects the arrangement direction of each piece of text content in the initial text image and segments the text content by arrangement direction to obtain a plurality of text images to be trained, thereby ensuring that the text content in the same text image to be trained is all distributed in the same arrangement direction. In another embodiment, after acquiring the initial text image and detecting the arrangement directions of the text content it contains, the server further corrects the text content that lies in different arrangement directions and then obtains the text image to be trained from the corrected initial text image. Note that the correction may convert text content in the reverse and oblique arrangement directions to the forward arrangement direction, or convert text content in the forward and oblique arrangement directions to the reverse arrangement direction.
In one embodiment, the server acquires one or more initial text images, and performs detection and correction processing on text contents in the initial text images to obtain text images to be trained, so that the text contents in the same text image to be trained are distributed in the same arrangement direction.
Step 604, inputting the training text images into the initial text arrangement direction recognition model respectively, so as to recognize the text arrangement direction of the text content in the training text images according to the initial arrangement direction recognition parameters in the initial text arrangement direction recognition model, and obtain the initial recognition result of the text arrangement direction.
The initial text arrangement direction recognition model is a model whose parameters still need to be adjusted, and the model parameters in the initial text arrangement direction recognition model are used as the initial arrangement direction recognition parameters. The text arrangement direction in the training text image is recognized by the initial text arrangement direction recognition model to obtain an initial recognition result of the text arrangement direction. This result can then be compared with the real text arrangement direction, and whether the current initial arrangement direction recognition parameters are accurate is determined according to the comparison result.
Step 606, determining an initial text content recognition model corresponding to the training text image according to the initial recognition result of the text arrangement direction, and recognizing the text content of the training text image by using the initial content recognition parameter corresponding to the determined initial text content recognition model to obtain an initial recognition result of the text content.
The initial text content recognition model is a model whose parameters still need to be adjusted. The text content in the training text image is recognized by the initial text content recognition model to obtain an initial recognition result of the text content. This result is then compared with the real text content, and whether the current initial content recognition parameters are accurate is determined according to the comparison result.
Step 608, determining a target loss function according to the initial recognition result of the text arrangement direction, the initial recognition result of the text content and the real label.
The real label is the label corresponding to the training text image, and it comprises the real text arrangement direction and the real text content of the training text image. The server obtains the initial recognition result of the text arrangement direction produced by the initial text arrangement direction recognition model for the training text image, and obtains the initial recognition result of the text content produced by the initial text content recognition model for the training text image. The server determines an arrangement direction loss function according to the initial recognition result of the text arrangement direction and the real text arrangement direction, determines a text content loss function according to the initial recognition result of the text content and the real text content, and then determines the target loss function according to the arrangement direction loss value and the text content loss value.
Step 610, adjusting the initial arrangement direction identification parameters and the initial content identification parameters according to a target loss function until the training end conditions are met, acquiring the current arrangement direction identification parameters and the current content identification parameters, obtaining a text arrangement direction identification model according to the current arrangement direction identification parameters, and obtaining a text content identification model according to the current content identification parameters.
The values of the initial arrangement direction recognition parameters and the initial content recognition parameters are adjusted through the target loss function. The recognition process is then repeated on the training text images with the adjusted parameters to obtain a recognition result, and the target loss function value is determined again from that result. Training stops when the training requirement is met, yielding the final text arrangement direction recognition model and the final text content recognition model. The training stop condition may include the number of iterations reaching a preset number, the training precision reaching a preset condition, or the training samples meeting a preset condition, and the like, which is not limited herein.
In the above embodiment, the model is trained according to the loss value corresponding to the text arrangement direction and the loss value corresponding to the text content, so that the trained model is ensured to have the recognition capability of the text arrangement direction and the recognition capability of the text content at the same time, and the recognition efficiency of the text content is improved.
In one embodiment, inputting the training text images into the initial text alignment direction recognition model respectively comprises: extracting image features of the training text image to obtain training text image features; and respectively inputting the training text image characteristics into the initial text arrangement direction recognition model.
The server extracts image features of the input training text image according to a preset neural network model to obtain training text image features. Specifically, the extraction of the image features may be performed based on the CNN-based network.
As shown in fig. 7, fig. 7 is a schematic diagram of a model for extracting image features in an embodiment. In fig. 7, a convolution operation is performed on the input training text image on the left to obtain the output training image features on the right. In order to improve the running speed of the network, the image features are extracted by a base network. The base network adopts MobileNet, and the convolutions in the network are depthwise separable convolutions, which reduces the amount of computation in the network.
In another embodiment, in order to further improve the performance of the network, an SE block (Squeeze-and-Excitation block) module is added to the network on top of the depthwise separable convolution. The SE block module learns the weights of the feature maps during network training, so that the more useful feature maps receive larger weights and the weights of feature maps with little effect are reduced.
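To illustrate why depthwise separable convolution reduces computation, the following sketch compares multiplication counts for a standard convolution and a depthwise separable one. It assumes stride 1 and an output feature map of the same spatial size as the input; the function names and the layer sizes in the usage note are illustrative and are not taken from the embodiment.

```python
def standard_conv_mults(h, w, c_in, c_out, k):
    """Multiplications for a standard k x k convolution over an h x w output."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_mults(h, w, c_in, c_out, k):
    """Multiplications for a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k    # one k x k filter per input channel
    pointwise = h * w * c_in * c_out    # 1 x 1 convolution mixes the channels
    return depthwise + pointwise

def cost_ratio(h, w, c_in, c_out, k):
    """Separable cost divided by standard cost; equals 1/c_out + 1/k**2."""
    return depthwise_separable_mults(h, w, c_in, c_out, k) / standard_conv_mults(h, w, c_in, c_out, k)
```

For example, with a 32 x 32 feature map, 64 input channels, 128 output channels and a 3 x 3 kernel, `cost_ratio(32, 32, 64, 128, 3)` is about 0.12, so the separable form needs roughly one eighth of the multiplications.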
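The feature-map reweighting performed by an SE block can be sketched as below. This is a minimal pure-Python illustration of the usual squeeze, excitation and scale steps, assuming feature maps are given as a list of channels (each a list of rows); the weight matrices `w1` and `w2` are hypothetical stand-ins for parameters that would be learned during training.

```python
import math

def se_block(feature_maps, w1, w2):
    """Reweight channels. w1 has shape [C_reduced][C]; w2 has shape [C][C_reduced]."""
    # Squeeze: global average pooling gives one value per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps
    ]
    # Excitation: fully connected layer + ReLU, then fully connected layer + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2
    ]
    # Scale: multiply every value of a channel by that channel's gate in (0, 1).
    scaled = [[[g * v for v in row] for row in ch] for ch, g in zip(feature_maps, gates)]
    return scaled, gates
```

Channels whose gate is close to 1 pass through almost unchanged, while channels with a gate close to 0 are suppressed.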
In one embodiment, the initial text content recognition model comprises a forward arrangement direction text content recognition model and a reverse arrangement direction text content recognition model; determining an initial text content recognition model corresponding to a training text image according to an initial recognition result of the text arrangement direction, wherein the initial text content recognition model comprises the following steps: when the initial recognition result of the text arrangement direction corresponds to the positive arrangement direction, determining that the text content recognition model in the positive arrangement direction is the initial text content recognition model corresponding to the training text image; and when the initial recognition result of the text arrangement direction corresponds to the inverted arrangement direction, determining that the text content recognition model in the inverted arrangement direction is the initial text content recognition model corresponding to the training text image.
The initial text content recognition model comprises a forward arrangement direction text content recognition model and a reverse arrangement direction text content recognition model. When the initial text arrangement direction recognition model determines that the text direction of a training image is the forward arrangement direction, the forward arrangement direction text content recognition model is selected to recognize the text content of the training image; otherwise, the reverse arrangement direction text content recognition model is selected.
In one embodiment, the text content recognition model may specifically be an RNN network. According to the text content it recognizes, the RNN network is divided into an RNN_1 network and an RNN_2 network, where the RNN_1 network recognizes text content in images in the forward arrangement direction and the RNN_2 network recognizes text content in images in the reverse arrangement direction. Specifically, the text content recognition RNN part consists of two branches, text recognition RNN_1 and text recognition RNN_2: the RNN_1 branch outputs the text recognition result for the forward arrangement direction, and the RNN_2 branch outputs the text recognition result for the reverse arrangement direction. Each text content recognition RNN branch uses a bidirectional Long Short-Term Memory network (BiLSTM) to recognize text content. The BiLSTM can memorize medium- and long-range content in a sequence through its gate structure: the forget gate controls which information in the LSTM cell state is discarded, the input gate controls which new information is written into the current cell state, and the output gate controls which information in the current cell state is output.
Furthermore, a Connectionist Temporal Classification (CTC) structure may be connected after the BiLSTM network. The CTC structure does not require the characters in the training text image to be segmented and aligned; it can be trained end to end directly from the input image and output the prediction result, which improves the accuracy of text content recognition.
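The gate structure mentioned above can be illustrated with a single-unit LSTM step and a bidirectional pass over a sequence. This is a toy sketch with scalar inputs and states; the parameter dictionary keys (`wf`, `uf`, `bf`, and so on) are hypothetical names, and a real recognition branch would use vector-valued states and learned weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step for a single unit with scalar input and state."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g      # keep part of the old state, write part of the new
    h = o * math.tanh(c)        # expose part of the cell state as output
    return h, c

def run_direction(xs, p):
    """Run the LSTM over a sequence from left to right and collect the outputs."""
    h, c, outputs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs

def bilstm(xs, p_forward, p_backward):
    """Pair each position's forward-pass output with its backward-pass output."""
    forward = run_direction(xs, p_forward)
    backward = run_direction(xs[::-1], p_backward)[::-1]
    return list(zip(forward, backward))
```

Each position therefore sees context from both the characters before it and the characters after it.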
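As an illustration of why CTC needs no per-character alignment, the sketch below shows a common greedy way of turning per-frame scores into text: take the best class in each frame, merge consecutive repeats, and drop the blank symbol. The decoding strategy, the function name `ctc_greedy_decode` and the convention that index 0 is the blank are assumptions for this example; the embodiment does not specify how the CTC output is decoded.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Decode per-frame class scores into a string.

    frame_scores: list of frames, each a list of scores (one per class).
    alphabet: list mapping class index to character; alphabet[blank] is unused.
    """
    # Best class index for every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded = []
    previous = None
    for idx in best_path:
        # Emit a character only when the class changes and is not the blank.
        if idx != previous and idx != blank:
            decoded.append(alphabet[idx])
        previous = idx
    return "".join(decoded)
```

A blank between two identical classes keeps them as separate characters, which is how a repeated letter such as "aa" can be represented.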
In one embodiment, determining the target loss function according to the initial recognition result of the text arrangement direction, the initial recognition result of the text content and the real tag comprises: when the initial text content recognition model corresponds to the text content recognition model in the positive arrangement direction, acquiring a text content recognition result in the positive arrangement direction obtained by recognizing the training text image by the text content recognition model in the positive arrangement direction, and determining a target loss function according to the text content recognition result in the positive arrangement direction, the text initial recognition result and the real label; when the initial text content recognition model corresponds to the inverted arrangement direction text content recognition model, acquiring an inverted arrangement direction text content recognition result obtained by recognizing the training text image by the inverted arrangement direction text content recognition model, and determining a target loss function according to the inverted arrangement direction text content recognition result, the text initial recognition result and the real label.
As shown in fig. 8, fig. 8 is a schematic diagram of a network model structure provided in an embodiment. The structure comprises a CNN base network part, which extracts image features from the image to be recognized. The extracted image features are then input into the text recognition RNN branch networks and the forward/reverse text classification network, which recognize the text content and the arrangement direction of the text content respectively. Continuing to refer to fig. 5, in fig. 5 the backbone network is the base network CNN. The image features are extracted by the base network and then input into three different branches. The uppermost branch performs forward/reverse classification of the text to determine whether the text arrangement direction is forward or reverse. The middle branch is the RNN_1 network, which receives the image features when the recognized text arrangement direction is forward. The remaining branch is the RNN_2 network, which receives the image features when the text arrangement direction is reverse and recognizes the text content directly from the image in the reverse arrangement direction.
Specifically, after the image features are extracted from the base network, the server inputs the image features into different branches, for example, the image features are input into a network branch of forward and backward text classification for predicting whether the text content of the input image is in a forward arrangement direction or a backward arrangement direction, where the network branch is specifically composed of a fully connected layer (FC) with a dimension of 2. In the output result, 0 represents that the text arrangement direction is a reverse arrangement direction, and 1 represents that the text arrangement direction is a positive arrangement direction.
In one embodiment, the loss function of the forward and backward text classification network is determined as a softmax cross entropy loss function, and the loss function is used for training classification of forward and backward arrangement directions of image texts. The loss function of the text content identification RNN part is CTC loss, the loss function is used for predicting the text content in the image, characters in the image do not need to be aligned manually, and the CTC can train a network model end to end.
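The softmax cross entropy loss of the two-class direction branch can be written out as a short sketch. It assumes the branch produces two raw scores (logits), with index 0 for the reverse direction and index 1 for the forward direction as in the output convention described above; the function names are illustrative.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, true_class):
    """Negative log-probability assigned to the true direction class."""
    return -math.log(softmax(logits)[true_class])
```

The loss is small when the branch assigns a high probability to the labeled direction and grows as that probability falls.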
In the network training process, training sample data is prepared first. Each sample image is marked with a group of labels. The first bit of the label indicates whether the text direction in the image is forward or reverse: the label is 1 when the text arrangement direction is forward and 0 when it is reverse. The remaining bits of the label are the character content of the text in the image. The loss of the forward/reverse classification branch is softmax_Loss. The loss of the text recognition RNN_1 branch is label[0] × CTC_Loss1, and the loss of the text recognition RNN_2 branch is (1 - label[0]) × CTC_Loss2, where softmax_Loss reads the first label bit and CTC_Loss reads the remaining label bits. When the text arrangement direction in the training sample image is forward, label[0] is 1 and (1 - label[0]) is 0, and the total loss of the network is Loss = 1 × CTC_Loss1 + softmax_Loss; the loss of the text content recognition branch RNN_2 is 0, and that branch does not take part in back propagation during training. When the text arrangement direction in the training sample image is reverse, label[0] is 0 and (1 - label[0]) is 1, and the total loss of the network is Loss = 1 × CTC_Loss2 + softmax_Loss. That is, CTC_Loss1 and softmax_Loss participate in the network computation when the text arrangement direction in the training sample image is forward, and CTC_Loss2 and softmax_Loss participate when it is reverse. The network is trained until the loss falls to a preset value, at which point training ends and the trained network model is obtained.
In the foregoing embodiment, during network training, when the text content arrangement direction is recognized as forward, the text content is input into the forward arrangement direction recognition network for text content recognition. The loss function used in this case is 1 × CTC_Loss1 + softmax_Loss, that is, the reverse arrangement direction network does not participate in the loss computation. The same holds, with the roles exchanged, when the arrangement direction is reverse.
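The label-gated combination of the three branch losses can be summarized in a few lines. This is a minimal sketch that takes already-computed loss values as plain numbers; the function name `total_loss` and its parameter names are illustrative.

```python
def total_loss(direction_label, ctc_loss1, ctc_loss2, softmax_loss):
    """Combine branch losses; direction_label is 1 for forward text, 0 for reverse.

    The label gates the two CTC terms so that only the branch matching the
    sample's direction contributes to the total.
    """
    return (
        direction_label * ctc_loss1
        + (1 - direction_label) * ctc_loss2
        + softmax_loss
    )
```

For instance, `total_loss(1, 2.0, 5.0, 0.5)` uses only the RNN_1 term and gives 2.5, while `total_loss(0, 2.0, 5.0, 0.5)` uses only the RNN_2 term and gives 5.5.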
In the process of specifically using the network to execute the identification of the text image to be identified, the text image to be identified is directly input, then the three branches respectively output corresponding results, and the forward and reverse arrangement direction of the text content is determined according to the content in the first branch, if the text content is in the forward arrangement direction, the text content obtained by identification is extracted from the network branch in the forward arrangement direction, otherwise, the text content obtained by identification is extracted from the network in the reverse arrangement direction. Specifically, a text image is input into the network, when the output result of forward and backward branching of the text is 1, the network judges that the arrangement direction of the text of the input image is positive, and at the moment, the recognition result of the branch of the text recognition RNN _1 is output. When the text forward-backward branch output result is 0, the network judges that the arrangement direction of the text of the input image is backward, and at the moment, the recognition result of the text recognition RNN _2 branch is output.
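The branch selection at recognition time can be sketched as follows. The example assumes the three branch outputs are already available: two direction scores (index 0 for reverse, index 1 for forward) and the decoded text from each RNN branch; the function name `select_result` is hypothetical.

```python
def select_result(direction_scores, rnn1_text, rnn2_text):
    """Pick the text from the branch that matches the predicted direction.

    direction_scores: [score_reverse, score_forward] from the classification branch.
    Returns (recognized_text, predicted_direction), with 1 = forward and 0 = reverse.
    """
    direction = max(range(2), key=lambda i: direction_scores[i])
    text = rnn1_text if direction == 1 else rnn2_text
    return text, direction
```

Both RNN branches produce an output for every image, and the classification branch only decides which of the two outputs is returned.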
According to the technical scheme, the forward and backward text recognition network is provided, the forward and backward text classification network branches are added behind the base network, the network can judge the forward and backward arrangement direction of characters in an input picture, and then the content of the text is recognized through the RNN. The text recognition network does not need to additionally increase a text arrangement direction model to judge the forward and backward arrangement direction of the input text, and is more efficient and convenient. And in the network training stage, the calculation mode of the loss is adjusted according to the forward and backward of the text in the input picture, and the parameters of the RNN and the text classification network branches can be effectively trained through different loss modes, so that the network convergence effect is better.
The forward and reverse text recognition network provided by the application enables the network to recognize forward and reverse text content directly. There is no need to first use a separate character arrangement direction judgment model to judge and correct the arrangement direction of the characters and then use a recognition model to recognize the text content. The method calculates the loss of the network according to the forward or reverse arrangement direction of the text in the image, and trains and optimizes the network accordingly.
It should be understood that although the steps in the flowcharts of fig. 2 and 6 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 and 6 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 9, there is provided a text recognition apparatus including:
an obtaining module 902, configured to obtain a text image to be recognized.
The recognition module 904 is configured to input the text image to be recognized into a pre-trained machine learning model, so as to recognize the text image to be recognized according to model parameters pre-trained in the machine learning model, so as to obtain text recognition contents, where the machine learning model is obtained by training a plurality of training text images, and text contents in each training text image are distributed in the same arrangement direction and text contents in different training text images are distributed in different arrangement directions.
In one embodiment, the recognition module 904 is further configured to recognize the text image to be recognized according to a model parameter pre-trained in the machine learning model to obtain a recognition result of the text arrangement direction and a plurality of recognition results of the text content; and extracting a text content identification result corresponding to the text arrangement direction identification result from the plurality of text content identification results, and taking the extracted text content identification result as the text identification content of the text image to be identified.
In one embodiment, the machine learning model includes a text arrangement direction recognition model and a text content recognition model; the text recognition device also comprises a training module, wherein the training module is used for acquiring a plurality of training text images; respectively inputting the training text images into an initial text arrangement direction recognition model, and recognizing the text arrangement direction of the text content in the training text images according to the initial arrangement direction recognition parameters in the initial text arrangement direction recognition model to obtain the initial recognition result of the text arrangement direction; determining an initial text content recognition model corresponding to the training text image according to the initial recognition result of the text arrangement direction, and recognizing the text content of the training text image by using the initial content recognition parameter corresponding to the determined initial text content recognition model to obtain an initial recognition result of the text content; determining a target loss function according to the initial recognition result of the text arrangement direction, the initial recognition result of the text content and the real label; and adjusting the initial arrangement direction identification parameters and the initial content identification parameters according to the target loss function until the training end conditions are met, acquiring the current arrangement direction identification parameters and the current content identification parameters, obtaining a text arrangement direction identification model according to the current arrangement direction identification parameters, and obtaining a text content identification model according to the current content identification parameters.
In one embodiment, the training module is further configured to extract image features of the training text image to obtain training text image features; and respectively inputting the training text image characteristics into the initial text arrangement direction recognition model.
In one embodiment, the initial text content recognition model comprises a forward arrangement direction text content recognition model and a reverse arrangement direction text content recognition model; the training module is also used for determining that the text content recognition model in the positive arrangement direction is an initial text content recognition model corresponding to the training text image when the initial recognition result of the text arrangement direction corresponds to the positive arrangement direction; and when the initial recognition result of the text arrangement direction corresponds to the inverted arrangement direction, determining that the text content recognition model in the inverted arrangement direction is the initial text content recognition model corresponding to the training text image.
In one embodiment, the training module is further configured to, when the initial text content recognition model corresponds to a forward arrangement direction text content recognition model, obtain a forward arrangement direction text content recognition result obtained by the forward arrangement direction text content recognition model recognizing the training text image, and determine a target loss function according to the forward arrangement direction text content recognition result, the text initial recognition result, and the real label; when the initial text content recognition model corresponds to the inverted arrangement direction text content recognition model, acquiring an inverted arrangement direction text content recognition result obtained by recognizing the training text image by the inverted arrangement direction text content recognition model, and determining a target loss function according to the inverted arrangement direction text content recognition result, the text initial recognition result and the real label.
In one embodiment, the training module is further configured to obtain a text image in a positive arrangement direction in which text content is distributed in the positive arrangement direction; converting the text images in the positive arrangement direction to obtain text images in the reverse arrangement direction; carrying out interference expansion processing on the text images in the positive arrangement direction and the text images in the reverse arrangement direction to obtain interference text images; and obtaining a training text image according to the text image in the positive arrangement direction, the text image in the reverse arrangement direction and the interference text image.
For the specific limitations of the text recognition device, reference may be made to the limitations of the text recognition method above, which are not repeated here. The modules in the text recognition device can be implemented wholly or partially in software, hardware, or a combination thereof. The modules can be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing text image data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a text recognition method.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: acquiring a text image to be recognized; the method comprises the steps of inputting a text image to be recognized into a pre-trained machine learning model, recognizing the text image to be recognized according to model parameters pre-trained in the machine learning model to obtain text recognition contents, wherein the machine learning model is obtained by training according to a plurality of training text images, the text contents in each training text image are distributed in the same arrangement direction, and the text contents in different training text images are distributed in different arrangement directions.
In one embodiment, the processor, when executing the computer program, further performs the steps of: recognizing a text image to be recognized according to model parameters pre-trained in a machine learning model to obtain a text arrangement direction recognition result and a plurality of text content recognition results; and extracting a text content identification result corresponding to the text arrangement direction identification result from the plurality of text content identification results, and taking the extracted text content identification result as the text identification content of the text image to be identified.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a plurality of training text images; respectively inputting the training text images into an initial text arrangement direction recognition model, and recognizing the text arrangement direction of the text content in the training text images according to the initial arrangement direction recognition parameters in the initial text arrangement direction recognition model to obtain the initial recognition result of the text arrangement direction; determining an initial text content recognition model corresponding to the training text image according to the initial recognition result of the text arrangement direction, and recognizing the text content of the training text image by using the initial content recognition parameter corresponding to the determined initial text content recognition model to obtain an initial recognition result of the text content; determining a target loss function according to the initial recognition result of the text arrangement direction, the initial recognition result of the text content and the real label; and adjusting the initial arrangement direction identification parameters and the initial content identification parameters according to the target loss function until the training end conditions are met, acquiring the current arrangement direction identification parameters and the current content identification parameters, obtaining a text arrangement direction identification model according to the current arrangement direction identification parameters, and obtaining a text content identification model according to the current content identification parameters.
In one embodiment, the processor, when executing the computer program, further performs the steps of: extracting image features of the training text image to obtain training text image features; and respectively inputting the training text image characteristics into the initial text arrangement direction recognition model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: when the initial recognition result of the text arrangement direction corresponds to the positive arrangement direction, determining that the text content recognition model in the positive arrangement direction is the initial text content recognition model corresponding to the training text image; and when the initial recognition result of the text arrangement direction corresponds to the inverted arrangement direction, determining that the text content recognition model in the inverted arrangement direction is the initial text content recognition model corresponding to the training text image.
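The branch selection described above amounts to a dispatch on the direction result. A minimal sketch follows, in which the two recognizers are hypothetical stand-ins that operate on strings rather than images, and the labels "positive" and "inverted" are assumed names for the two arrangement directions.

```python
from typing import Callable, Dict

Recognizer = Callable[[str], str]


def recognize_positive(image_text: str) -> str:
    # Hypothetical stand-in: the toy "image" is already readable as-is.
    return image_text


def recognize_inverted(image_text: str) -> str:
    # Hypothetical stand-in: undo the inversion by reading the toy "image" backwards.
    return image_text[::-1]


RECOGNIZERS: Dict[str, Recognizer] = {
    "positive": recognize_positive,
    "inverted": recognize_inverted,
}


def pick_content_model(direction_result: str) -> Recognizer:
    """Return the content recognizer that matches the recognized direction."""
    try:
        return RECOGNIZERS[direction_result]
    except KeyError:
        raise ValueError(f"unknown arrangement direction: {direction_result!r}") from None


model = pick_content_model("inverted")
print(model("CBA"))
```

Unlike selecting among results that have already been computed, this dispatch decides which content model runs at all, which matches the training-time description in this paragraph.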
In one embodiment, the processor, when executing the computer program, further performs the steps of: when the initial text content recognition model corresponds to the positive arrangement direction text content recognition model, acquiring a positive arrangement direction text content recognition result obtained by recognizing the training text image with the positive arrangement direction text content recognition model, and determining a target loss function according to the positive arrangement direction text content recognition result, the initial recognition result of the text arrangement direction and the real label; and when the initial text content recognition model corresponds to the inverted arrangement direction text content recognition model, acquiring an inverted arrangement direction text content recognition result obtained by recognizing the training text image with the inverted arrangement direction text content recognition model, and determining a target loss function according to the inverted arrangement direction text content recognition result, the initial recognition result of the text arrangement direction and the real label.
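This passage does not state the form of the target loss function. As one possible reading only, the sketch below assumes it is the sum of a cross-entropy term on the arrangement direction and an averaged per-character cross-entropy term on the content result of the selected branch. The function names, the `content_weight` parameter and the per-character formulation are all assumptions made for illustration; a sequence loss such as CTC could equally be used for the content term.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-probability assigned to the target class."""
    return -math.log(max(probs[target], 1e-12))  # clamp to avoid log(0)


def target_loss(direction_probs: Sequence[float], direction_label: int,
                char_probs: Sequence[Sequence[float]], char_labels: Sequence[int],
                content_weight: float = 1.0) -> float:
    """Assumed combined loss: direction term plus weighted mean content term."""
    if not char_labels or len(char_probs) != len(char_labels):
        raise ValueError("need one probability vector per labelled character")
    direction_term = cross_entropy(direction_probs, direction_label)
    content_term = sum(
        cross_entropy(p, t) for p, t in zip(char_probs, char_labels)
    ) / len(char_labels)
    return direction_term + content_weight * content_term


# Direction predicted as class 0 with probability 0.9; two characters predicted.
loss = target_loss([0.9, 0.1], 0, [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1])
print(round(loss, 4))
```

Because the content probabilities come from whichever branch was selected, the same function covers both the positive and the inverted case.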
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a text image in a positive arrangement direction, wherein the text content is distributed in the positive arrangement direction; converting the text images in the positive arrangement direction to obtain text images in the reverse arrangement direction; carrying out interference expansion processing on the text images in the positive arrangement direction and the text images in the reverse arrangement direction to obtain interference text images; and obtaining a training text image according to the text image in the positive arrangement direction, the text image in the reverse arrangement direction and the interference text image.
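The construction of the training set can be sketched as follows. The sketch assumes that "converting" a positive-direction image means rotating it by 180 degrees and that "interference expansion" means adding random pixel noise; neither assumption is confirmed by this passage. Images are modelled as nested lists of grayscale values, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels in [0, 255], row-major


def rotate_180(image: Image) -> Image:
    """Assumed conversion: flip rows and columns to invert the text direction."""
    return [row[::-1] for row in image[::-1]]


def add_noise(image: Image, rng: random.Random, amplitude: int = 20) -> Image:
    """Assumed interference: bounded random offset per pixel, clipped to [0, 255]."""
    return [
        [min(255, max(0, px + rng.randint(-amplitude, amplitude))) for px in row]
        for row in image
    ]


def build_training_set(positive_images: List[Image],
                       seed: int = 0) -> List[Tuple[Image, str]]:
    """Return (image, direction label) pairs: originals, inverted copies, noisy copies."""
    rng = random.Random(seed)
    samples: List[Tuple[Image, str]] = []
    for img in positive_images:
        inverted = rotate_180(img)
        samples.append((img, "positive"))
        samples.append((inverted, "inverted"))
        samples.append((add_noise(img, rng), "positive"))
        samples.append((add_noise(inverted, rng), "inverted"))
    return samples


dataset = build_training_set([[[10, 200], [30, 40]]])
print(len(dataset), [label for _, label in dataset])
```

Each source image yields four samples here, so the direction labels stay balanced between the two arrangement directions.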
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, performs the steps of: acquiring a text image to be recognized; and inputting the text image to be recognized into a pre-trained machine learning model, and recognizing the text image to be recognized according to model parameters pre-trained in the machine learning model to obtain text recognition content, wherein the machine learning model is obtained by training according to a plurality of training text images, the text content in each training text image is distributed in the same arrangement direction, and the text content in different training text images is distributed in different arrangement directions.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: recognizing the text image to be recognized according to the model parameters pre-trained in the machine learning model to obtain a text arrangement direction recognition result and a plurality of text content recognition results; and extracting, from the plurality of text content recognition results, the text content recognition result corresponding to the text arrangement direction recognition result, and taking the extracted text content recognition result as the text recognition content of the text image to be recognized.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: acquiring a plurality of training text images; respectively inputting the training text images into an initial text arrangement direction recognition model, and recognizing the text arrangement direction of the text content in the training text images according to the initial arrangement direction recognition parameters in the initial text arrangement direction recognition model to obtain an initial recognition result of the text arrangement direction; determining an initial text content recognition model corresponding to the training text image according to the initial recognition result of the text arrangement direction, and recognizing the text content of the training text image by using the initial content recognition parameters corresponding to the determined initial text content recognition model to obtain an initial recognition result of the text content; determining a target loss function according to the initial recognition result of the text arrangement direction, the initial recognition result of the text content and the real label; and adjusting the initial arrangement direction recognition parameters and the initial content recognition parameters according to the target loss function until a training end condition is met, acquiring the current arrangement direction recognition parameters and the current content recognition parameters, obtaining a text arrangement direction recognition model according to the current arrangement direction recognition parameters, and obtaining a text content recognition model according to the current content recognition parameters.
In one embodiment, the computer program when executed by the processor further performs the steps of: extracting image features of the training text image to obtain training text image features; and respectively inputting the training text image characteristics into the initial text arrangement direction recognition model.
In one embodiment, the computer program when executed by the processor further performs the steps of: when the initial recognition result of the text arrangement direction corresponds to the positive arrangement direction, determining that the text content recognition model in the positive arrangement direction is the initial text content recognition model corresponding to the training text image; and when the initial recognition result of the text arrangement direction corresponds to the inverted arrangement direction, determining that the text content recognition model in the inverted arrangement direction is the initial text content recognition model corresponding to the training text image.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: when the initial text content recognition model corresponds to the positive arrangement direction text content recognition model, acquiring a positive arrangement direction text content recognition result obtained by recognizing the training text image with the positive arrangement direction text content recognition model, and determining a target loss function according to the positive arrangement direction text content recognition result, the initial recognition result of the text arrangement direction and the real label; and when the initial text content recognition model corresponds to the inverted arrangement direction text content recognition model, acquiring an inverted arrangement direction text content recognition result obtained by recognizing the training text image with the inverted arrangement direction text content recognition model, and determining a target loss function according to the inverted arrangement direction text content recognition result, the initial recognition result of the text arrangement direction and the real label.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a text image in a positive arrangement direction, wherein the text content is distributed in the positive arrangement direction; converting the text images in the positive arrangement direction to obtain text images in the reverse arrangement direction; carrying out interference expansion processing on the text images in the positive arrangement direction and the text images in the reverse arrangement direction to obtain interference text images; and obtaining a training text image according to the text image in the positive arrangement direction, the text image in the reverse arrangement direction and the interference text image.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method of text recognition, the method comprising:
acquiring a text image to be recognized;
inputting the text image to be recognized into a pre-trained machine learning model, and recognizing the text image to be recognized according to model parameters pre-trained in the machine learning model to obtain text recognition content, wherein the machine learning model is obtained by training according to a plurality of training text images, the text content in each training text image is distributed in the same arrangement direction, and the text content in different training text images is distributed in different arrangement directions.
2. The method according to claim 1, wherein the recognizing the text image to be recognized according to the model parameters pre-trained in the machine learning model to obtain the text recognition content comprises:
recognizing the text image to be recognized according to model parameters pre-trained in the machine learning model to obtain a text arrangement direction recognition result and a plurality of text content recognition results;
and extracting, from the plurality of text content recognition results, the text content recognition result corresponding to the text arrangement direction recognition result, and taking the extracted text content recognition result as the text recognition content of the text image to be recognized.
3. The method of claim 1, wherein the machine learning model comprises a text arrangement direction recognition model and a text content recognition model; the training of the text arrangement direction recognition model and the text content recognition model comprises:
acquiring a plurality of training text images;
respectively inputting the training text images into an initial text arrangement direction recognition model, and recognizing the text arrangement direction of the text content in the training text images according to the initial arrangement direction recognition parameters in the initial text arrangement direction recognition model to obtain a text arrangement direction initial recognition result;
determining an initial text content recognition model corresponding to the training text image according to the initial recognition result of the text arrangement direction, and recognizing the text content of the training text image by using the determined initial content recognition parameter corresponding to the initial text content recognition model to obtain an initial text content recognition result;
determining a target loss function according to the initial recognition result of the text arrangement direction, the initial recognition result of the text content and the real label;
and adjusting the initial arrangement direction recognition parameters and the initial content recognition parameters according to the target loss function until a training end condition is met, acquiring current arrangement direction recognition parameters and current content recognition parameters, obtaining a text arrangement direction recognition model according to the current arrangement direction recognition parameters, and obtaining a text content recognition model according to the current content recognition parameters.
4. The method according to claim 3, wherein the inputting the training text images into the initial text arrangement direction recognition model respectively comprises:
extracting image features of the training text image to obtain training text image features;
and respectively inputting the training text image characteristics into an initial text arrangement direction recognition model.
5. The method of claim 3, wherein the initial text content recognition model comprises a positive arrangement direction text content recognition model and an inverted arrangement direction text content recognition model; the determining of the initial text content recognition model corresponding to the training text image according to the initial recognition result of the text arrangement direction includes:
when the initial recognition result of the text arrangement direction corresponds to a positive arrangement direction, determining that the text content recognition model in the positive arrangement direction is an initial text content recognition model corresponding to the training text image;
and when the initial recognition result of the text arrangement direction corresponds to the inverted arrangement direction, determining that the text content recognition model in the inverted arrangement direction is the initial text content recognition model corresponding to the training text image.
6. The method according to claim 5, wherein the determining a target loss function according to the initial recognition result of the text arrangement direction, the initial recognition result of the text content and the real label comprises:
when the initial text content recognition model corresponds to the positive arrangement direction text content recognition model, acquiring a positive arrangement direction text content recognition result obtained by recognizing the training text image with the positive arrangement direction text content recognition model, and determining a target loss function according to the positive arrangement direction text content recognition result, the initial recognition result of the text arrangement direction and the real label;
when the initial text content recognition model corresponds to the inverted arrangement direction text content recognition model, acquiring an inverted arrangement direction text content recognition result obtained by recognizing the training text image with the inverted arrangement direction text content recognition model, and determining a target loss function according to the inverted arrangement direction text content recognition result, the initial recognition result of the text arrangement direction and the real label.
7. The method of any one of claims 1 to 6, wherein the obtaining a plurality of training text images comprises:
acquiring a text image in a positive arrangement direction, wherein the text content is distributed in the positive arrangement direction;
converting the text images in the positive arrangement direction to obtain text images in the reverse arrangement direction;
performing interference expansion processing on the text images in the positive arrangement direction and the text images in the reverse arrangement direction to obtain interference text images;
and obtaining a training text image according to the text image in the positive arrangement direction, the text image in the reverse arrangement direction and the interference text image.
8. A text recognition apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a text image to be recognized; and
a recognition module, configured to input the text image to be recognized into a pre-trained machine learning model, and recognize the text image to be recognized according to model parameters pre-trained in the machine learning model to obtain text recognition content, wherein the machine learning model is obtained by training according to a plurality of training text images, the text content in each training text image is distributed in the same arrangement direction, and the text content in different training text images is distributed in different arrangement directions.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202011390409.0A 2020-12-02 2020-12-02 Text recognition method and device, computer equipment and storage medium Pending CN112560599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011390409.0A CN112560599A (en) 2020-12-02 2020-12-02 Text recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112560599A true CN112560599A (en) 2021-03-26

Family

ID=75047833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011390409.0A Pending CN112560599A (en) 2020-12-02 2020-12-02 Text recognition method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112560599A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004046528A (en) * 2002-07-11 2004-02-12 Fujitsu Ltd Document direction estimation method and document direction estimation program
CN103136523A (en) * 2012-11-29 2013-06-05 浙江大学 Arbitrary direction text line detection method in natural image
CN110443239A (en) * 2019-06-28 2019-11-12 平安科技(深圳)有限公司 The recognition methods of character image and its device
CN111353491A (en) * 2020-03-12 2020-06-30 中国建设银行股份有限公司 Character direction determining method, device, equipment and storage medium
CN111783541A (en) * 2020-06-01 2020-10-16 北京捷通华声科技股份有限公司 Text recognition method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780131A (en) * 2021-08-31 2021-12-10 众安在线财产保险股份有限公司 Text image orientation recognition method and text content recognition method, device and equipment
CN113780131B (en) * 2021-08-31 2024-04-12 众安在线财产保险股份有限公司 Text image orientation recognition method, text content recognition method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination