CN113837167A - Text image recognition method, device, equipment and storage medium - Google Patents


Info

Publication number: CN113837167A
Authority: CN (China)
Prior art keywords: text, language, branch, network, character
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202111012920.1A
Other languages: Chinese (zh)
Inventors: Gao Dashuai (高大帅), Li Jian (李健), Wu Weidong (武卫东), Chen Ming (陈明)
Current Assignee: Beijing Sinovoice Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Sinovoice Technology Co Ltd
Application filed by Beijing Sinovoice Technology Co Ltd
Priority to CN202111012920.1A; published as CN113837167A

Classifications

    • G06F18/214: Pattern recognition; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N3/08: Neural networks; learning methods

Abstract

Embodiments of this application relate to the field of data processing, and in particular to a text image recognition method, device, equipment and storage medium, aiming to shorten the development cycle of multi-language recognition tasks and improve multi-language recognition performance. The method comprises the following steps: performing feature extraction on the text image to be recognized through a shared backbone network to obtain a shared feature map; performing text line detection on the shared feature map through a text line detection branch to obtain position information of the text lines; performing feature extraction on the shared feature map according to the position information of the text lines through the shared backbone network to obtain text features; performing language classification on the text features through a language classification branch to obtain language category information corresponding to the text features; and recognizing the text features through the corresponding text line recognition branch according to the language category information to obtain a text recognition result.

Description

Text image recognition method, device, equipment and storage medium
Technical Field
Embodiments of this application relate to the field of data processing, and in particular to a text image recognition method, device, equipment and storage medium.
Background
In the field of optical character recognition, multi-language recognition is a frontier topic and a known difficulty. It aims to recognize text in different languages within images, and has wide application in production, daily life, and teaching and training scenarios, for example bilingual menus and bilingual teaching. Existing multi-language recognition methods fall into two categories. The first uses three independent stages: detect the text lines, classify them by language, and recognize the classified text. The second trains a unified multi-language recognition model: text is first detected and then fed directly into the unified model for recognition.
In the prior art, the three stages of the first method are independent of each other, trained separately, and do not interact, so the advantage of end-to-end learning is lost, resulting in poor recognition efficiency and accuracy. The second method requires training a model on multilingual samples in advance; when a new language is added, the whole model must be retrained, and the data across languages is unbalanced, so the development cycle is long and the recognition performance suffers.
Disclosure of Invention
Embodiments of this application provide a text image recognition method, device, equipment and storage medium, aiming to shorten the development cycle of multi-language recognition tasks and improve multi-language recognition performance.
A first aspect of the embodiments of the present application provides a text image recognition method, the method including:
performing feature extraction on the text image to be recognized through a shared backbone network to obtain a shared feature map;
performing text line detection on the shared feature map through a text line detection branch to obtain position information of the text lines;
performing feature extraction on the shared feature map according to the position information of the text lines through the shared backbone network to obtain text features;
performing language classification on the text features through a language classification branch to obtain language category information corresponding to the text features;
and recognizing the text features through the corresponding text line recognition branch according to the language category information to obtain a text recognition result.
Optionally, the method is implemented based on a multilingual text image recognition network, and constructing the multilingual text image recognition network includes:
using the shared backbone network as the feature extraction network of the multilingual text image recognition network;
using the text line detection branch and the text line recognition branches as the character recognition network of the multilingual text image recognition network;
adding the language classification branch to the character recognition network;
selecting a suitable loss function for each of the text line detection branch and the language classification branch, and constructing a multi-task loss function from these functions;
and training the multilingual text image recognition network as a whole with the multi-task loss function to obtain the trained multilingual text image recognition network.
Optionally, training the multilingual text image recognition network with the loss functions selected for the different branches includes:
collecting a number of pictures containing text in multiple languages, and placing these pictures into the same set to obtain a training set;
inputting the training set into the multilingual text image recognition network, where the position information, language information and text content of the text lines in the training pictures are known in advance;
having the multilingual text image recognition network predict the positions, language categories and text recognition results of the text lines in the training pictures;
and comparing the predicted positions, language categories and text recognition results against the known position information, language information and text content, feeding the resulting differences into the multi-task loss function, and adjusting the parameters of the multilingual text image recognition network to obtain the trained network.
Optionally, when a new language needs to be recognized, the method further includes:
adding a text line recognition branch corresponding to the new language to the character recognition network;
and, while keeping the existing model parameters unchanged, training the language classification branch and the new text line recognition branch with pictures containing the new language to obtain the updated multilingual text image recognition network.
Optionally, training the language classification branch and the new text line recognition branch with pictures containing the new language, while keeping the existing model parameters unchanged, includes:
when the new language has a larger character set, selecting a larger number of pictures to train the language classification branch and the text line recognition branch;
and when the new language has a smaller character set, selecting a smaller number of pictures to train the language classification branch and the text line recognition branch.
A second aspect of the embodiments of the present application provides a text image recognition device, the device including:
an image recognition module, configured to extract features from the text image to be recognized through a shared backbone network to obtain a shared feature map;
a text line detection module, configured to perform text line detection on the shared feature map through a text line detection branch to obtain position information of the text lines;
a feature extraction module, configured to extract features from the shared feature map according to the position information of the text lines through the shared backbone network to obtain text features;
a language classification module, configured to perform language classification on the text features through a language classification branch to obtain language category information corresponding to the text features;
and a text line recognition module, configured to recognize the text features through the corresponding text line recognition branch according to the language category information to obtain a text recognition result.
Optionally, the device is implemented based on a multilingual text image recognition network, and constructing the multilingual text image recognition network includes:
using the shared backbone network as the feature extraction network of the multilingual text image recognition network;
using the text line detection branch and the text line recognition branches as the character recognition network of the multilingual text image recognition network;
adding the language classification branch to the character recognition network;
selecting a suitable loss function for each of the text line detection branch and the language classification branch, and constructing a multi-task loss function from these functions;
and training the multilingual text image recognition network as a whole with the multi-task loss function to obtain the trained multilingual text image recognition network.
Optionally, training the multilingual text image recognition network as a whole with the multi-task loss function to obtain the trained network includes:
collecting a number of pictures containing text in multiple languages, and placing these pictures into the same set to obtain a training set;
inputting the training set into the multilingual text image recognition network, where the position information, language information and text content of the text lines in the training pictures are known in advance;
having the multilingual text image recognition network predict the positions, language categories and text recognition results of the text lines in the training pictures;
and comparing the predicted positions, language categories and text recognition results against the known position information, language information and text content, feeding the resulting differences into the multi-task loss function, and adjusting the parameters of the multilingual text image recognition network to obtain the trained network.
Optionally, the device further includes:
a text line recognition branch adding submodule, configured to add a text line recognition branch corresponding to the new language to the character recognition network;
and a model training submodule, configured to train the language classification branch and the new text line recognition branch with pictures containing the new language, while keeping the existing model parameters unchanged, to obtain the updated multilingual text image recognition network.
Optionally, the model training submodule includes:
a first model training submodule, configured to select a larger number of pictures to train the language classification branch and the text line recognition branch when the new language has a larger character set;
and a second model training submodule, configured to select a smaller number of pictures to train the language classification branch and the text line recognition branch when the new language has a smaller character set.
A third aspect of embodiments of the present application provides a readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps in the method according to the first aspect of the present application.
A fourth aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method according to the first aspect of the present application.
With the text image recognition method provided by this application, features are extracted from the text image to be recognized through a shared backbone network to obtain a shared feature map; text line detection is performed on the shared feature map through a text line detection branch to obtain position information of the text lines; features are extracted from the shared feature map according to the text line position information through the shared backbone network to obtain text features; the text features are classified by language through a language classification branch to obtain the corresponding language category information; and the text features are recognized through the corresponding text line recognition branch according to the language category information to obtain a text recognition result. By placing the text line detection branch, the language classification branch and the text line recognition branches in the same network for end-to-end learning, the method fully exploits the complementarity of end-to-end multi-task training. When a new language is added, only the corresponding text line recognition branch and the language classification branch need to be trained separately; the whole network does not need to be retrained. This shortens the development cycle of multi-language recognition tasks and improves multi-language recognition performance.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of a text image recognition method according to an embodiment of the present application;
FIG. 2 is a flow chart of a multilingual text-image recognition network according to an embodiment of the present application;
fig. 3 is a schematic diagram of a text image recognition apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments derived by those skilled in the art from these embodiments without inventive effort fall within the scope of the present disclosure.
Referring to fig. 1, fig. 1 is a flowchart of a text image recognition method according to an embodiment of the present application.
As shown in fig. 1, the method comprises the steps of:
S11: Perform feature extraction on the text image to be recognized through a shared backbone network to obtain a shared feature map.
In this embodiment, the text image to be recognized is an image containing text; the text may be multilingual, and text in several different languages may appear in one image at the same time. The shared backbone network extracts text features from the image to be recognized, and the shared feature map is the feature image extracted from that image, which accurately reflects the regions where text appears.
In this embodiment, the shared backbone network is a shared CNN (convolutional neural network) skeleton, which serves as the backbone of the entire multilingual text image recognition network. The shared backbone network extracts a shared feature map from the text image to be recognized; the feature map is then analyzed to determine the regions that contain text. The network can locate text of all languages in the picture and perform further feature extraction on it.
For example, the text image to be recognized may simultaneously contain Chinese, English, Russian and so on, and the shared CNN skeleton may be a network such as resnet50 or resnet101.
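As an illustration only, the role of the shared backbone can be sketched with a toy "feature extractor" that merely downsamples the image. This is not the patent's resnet50 backbone; the stride, shapes and averaging stand-in are invented for the sketch, while a real backbone learns convolutional features:

```python
import numpy as np

def shared_backbone(image, stride=4):
    """Toy stand-in for a shared CNN skeleton (e.g. resnet50):
    downsamples an H x W x C image into a shared feature map that every
    downstream branch (detection, classification, recognition) reads from.
    Here we simply average-pool stride x stride blocks."""
    h, w, c = image.shape
    fh, fw = h // stride, w // stride
    # Reshape into (fh, stride, fw, stride, c) blocks and average each block.
    fmap = image[:fh * stride, :fw * stride].reshape(fh, stride, fw, stride, c)
    return fmap.mean(axis=(1, 3))

image = np.random.rand(64, 128, 3)    # text image to be recognized
feature_map = shared_backbone(image)  # shared feature map
print(feature_map.shape)              # (16, 32, 3)
```

Every branch in the network reads this one feature map, which is what makes the computation shared rather than repeated per task.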
S12: and carrying out character line detection on the shared characteristic diagram through a character line detection branch to obtain the position information of the character line.
In this embodiment, the character line detection branch is to identify the position of the character line in the shared feature map as the position information of the character line by a character line recognition algorithm, which belongs to one of OCR (optical character) recognition algorithms.
For example, the literal line detection algorithm may use a bezier curve regression algorithm or the like used in ABCnet.
S13: and performing feature extraction on the shared feature graph according to the position information of the character line through the shared backbone network to obtain text features.
In this embodiment, the shared backbone network performs feature extraction on the text line at the position according to the position information of the text line detected by the text line detection branch to obtain the roi feature of the text line on the CNN skeleton, that is, the image feature of the text line selected from the picture frame, and obtain the feature vector of the text at this part.
For example, if 3 lines of characters are framed in one picture, the shared backbone network extracts the picture features of the framed 3 lines of characters to obtain the text feature vectors of the 3 lines of characters.
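A hedged sketch of this ROI step: given text line boxes from a detection branch (axis-aligned here for simplicity, whereas ABCNet regresses Bezier-bounded regions), the corresponding features are sliced out of the shared feature map. All shapes, the stride and the three boxes are illustrative:

```python
import numpy as np

def extract_roi_features(feature_map, boxes, stride=4):
    """Crop per-text-line ROI features out of the shared feature map.
    `boxes` are (x1, y1, x2, y2) in image coordinates, as produced by a
    text line detection branch; dividing by the backbone stride maps them
    onto feature-map coordinates."""
    rois = []
    for x1, y1, x2, y2 in boxes:
        fx1, fy1 = x1 // stride, y1 // stride
        fx2, fy2 = x2 // stride, y2 // stride
        rois.append(feature_map[fy1:fy2, fx1:fx2])  # rows = y, cols = x
    return rois

feature_map = np.random.rand(16, 32, 8)  # from the shared backbone
boxes = [(0, 0, 64, 16), (0, 16, 128, 32), (0, 32, 96, 48)]  # 3 detected lines
rois = extract_roi_features(feature_map, boxes)
print([r.shape for r in rois])  # [(4, 16, 8), (4, 32, 8), (4, 24, 8)]
```

Each ROI is then pooled or flattened into the per-line text feature vector that the classification and recognition branches consume.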
S14: and carrying out language classification on the text features through language classification branches to obtain language category information corresponding to the text features.
In this embodiment, the language classification branch classifies the text features by a language classification algorithm to obtain language classification information corresponding to the text features, where the language classification information shows which language the input text features belong to, and when the text features include multiple languages, the features can be classified according to the corresponding languages.
For example, the language classification algorithm may select a softmax cross entropy loss algorithm, and when the text corresponding to the text feature includes "hello" and "hello", the language classification branch obtains that "hello" belongs to chinese and "hello" belongs to english through the language classification algorithm.
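A minimal pure-Python sketch of a softmax classifier with cross-entropy loss, in the spirit of the language classification branch above. The toy weight matrix and feature vector are invented for illustration, and the real branch is a small network rather than one linear layer:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify_language(text_feature, weights, languages):
    """Linear layer + softmax: one logit per candidate language."""
    logits = [sum(w * x for w, x in zip(row, text_feature)) for row in weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return languages[best], probs

def cross_entropy(probs, true_idx):
    """Softmax cross-entropy loss used to train the classification branch."""
    return -math.log(probs[true_idx] + 1e-12)

languages = ["zh", "en", "ru"]
weights = [[0.9, -0.2], [0.1, 0.8], [-0.5, 0.3]]  # toy 3x2 weight matrix
feature = [1.0, 2.0]                              # pooled text line feature
lang, probs = classify_language(feature, weights, languages)
print(lang)  # en
```

During training the loss `cross_entropy(probs, true_idx)` is the `L_cls_loss` term that later enters the multi-task loss.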
S15: and identifying the text features through corresponding text line identification branches according to the language category information to obtain a text identification result.
In this embodiment, the text line recognition branch is to recognize text features through a text line recognition algorithm to obtain a text recognition result. The line recognition algorithm is also one of the OCR recognition algorithms. There are separate line recognition algorithms for different languages, i.e. there are separate branch lines for different languages.
For example, the text line recognition algorithm may employ the CRNN _ CTC algorithm. When the text features include Chinese "hello" and English "hello", the text line recognition algorithm uses the Chinese recognition branch to recognize the text image as "hello", and uses the English branch to recognize the text image as "hello".
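The routing from language category to per-language recognition branch can be sketched as a simple dispatch table. The branch functions below are stand-ins for trained CRNN_CTC models and return canned strings purely for illustration:

```python
# Hypothetical per-language recognition branches: each language gets its own
# text line recognition head; the language classification result selects
# which head a text feature is routed to.
def recognize_zh(feature):
    return "你好"   # stand-in for a trained Chinese CRNN_CTC branch

def recognize_en(feature):
    return "hello"  # stand-in for a trained English CRNN_CTC branch

recognition_branches = {"zh": recognize_zh, "en": recognize_en}

def recognize_text_line(feature, language):
    """Route a text line feature to the branch chosen by the
    language classification branch."""
    return recognition_branches[language](feature)

print(recognize_text_line(None, "en"))  # hello
```

Adding a new language under this scheme means registering one more entry in `recognition_branches`, which is exactly why the rest of the network does not need retraining.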
In this embodiment, feature extraction is performed by a suitable CNN skeleton, and the text positions, language information and text recognition results are then obtained through the text line detection branch, the language classification branch and the text line recognition branches. Using the CNN network as the skeleton and adding branch networks, multilingual text line detection, language classification and text line recognition are solved end to end within the same deep learning network, so the sub-networks complement each other and text recognition accuracy improves.
In another embodiment of the present application, the method is implemented based on a multilingual text image recognition network, which is constructed by the following steps:
S21: Use the shared backbone network as the feature extraction network of the multilingual text image recognition network.
In this embodiment, the multilingual text image recognition network is an end-to-end deep learning network: multilingual text images are input, and the positions, language categories and text line recognition results of the text lines in the images are output.
In this embodiment, input image data first passes through the shared backbone network, which is a sub-network of the overall network and extracts the feature map from the input image. The shared backbone network may be the resnet50 network.
S22: and taking the character line detection branch and the character line identification branch as a character identification network of the multilingual text image identification network.
In this embodiment, the text line detection branch and the text line recognition branch are disposed in the same network, that is, in a text recognition network, the text recognition network is a sub-network in a multilingual text-image recognition network, and the text recognition network performs text line detection and text line recognition on an input image.
Illustratively, the text line detection employs a Bezier curve regression algorithm in ABCnet, and the text recognition branch employs a CRNN _ CTC algorithm.
In this embodiment, when the network is constructed, the number of the text line recognition branches may be set according to the number of languages on the text image to be recognized, for example, if the number of languages on the text image to be recognized is 3, 3 text line recognition branches are added to the text recognition network, and each branch is responsible for the text line recognition of one language.
S23: and adding the language classification branch in the character recognition network.
In this embodiment, the language classification branch is added to the character recognition network, and the language classification branch may adopt softmax cross entropy loss including a residual module.
S24: and selecting proper loss functions for the character line detection branch and the language classification branch respectively, and constructing a multi-task loss function according to the functions.
In this embodiment, an appropriate loss function needs to be selected for each branch to achieve the best training effect for each branch, and the loss functions are summed by a certain coefficient to obtain a multitask loss function.
Illustratively, the line detection branch uses an L _ det _ loss function, the language classification branch uses an L _ cls _ loss function, and the line identification branch uses an L _ CTC _ loss function. The multitask loss function is total _ loss ═ 0.5 × L _ det _ loss (text line detection loss function) +0.1 × L _ cls _ loss (language classification loss function) +1.0 × L _ CTC _ loss (text line identification loss function). Wherein, 0.5,1,1.0 is a coefficient value obtained by experiment.
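The weighted sum above can be sketched directly. The branch loss values below are made-up placeholders; the weights are the experimentally obtained coefficients quoted in the text:

```python
def multitask_loss(det_loss, cls_loss, ctc_loss,
                   w_det=0.5, w_cls=0.1, w_ctc=1.0):
    """Weighted sum of the three branch losses:
    total_loss = 0.5*L_det_loss + 0.1*L_cls_loss + 1.0*L_ctc_loss.
    The weights are empirical and would be re-tuned for other tasks."""
    return w_det * det_loss + w_cls * cls_loss + w_ctc * ctc_loss

# Placeholder per-branch losses from one training step.
total = multitask_loss(det_loss=2.0, cls_loss=0.8, ctc_loss=1.5)
# 0.5*2.0 + 0.1*0.8 + 1.0*1.5 = 2.58
```

Because a single scalar is backpropagated, gradients from all three tasks flow into the shared backbone at once, which is the multi-task complementarity the patent relies on.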
S25: and integrally training the multilingual text image recognition network through the multitask loss function to obtain the trained multilingual text image recognition network.
In this embodiment, after the basic structure of the model is constructed, the multilingual text image recognition network needs to be trained integrally, and the multilingual text image recognition network is trained by using training samples in combination with a multitask loss function, so as to obtain the trained multilingual text image recognition network.
In the embodiment, the character line detection branch, the language classification branch and the character line identification branch are arranged in the same network for end-to-end learning, so that the character identification performance is improved, a proper loss function is selected for each branch, a multi-task loss function is constructed, a better training effect is achieved, and the model performance is improved.
As shown in fig. 2, fig. 2 is a flow chart of constructing a multilingual text image recognition network according to an embodiment of the present application. First, a suitable shared CNN skeleton is selected as the shared backbone network, and a suitable text line detection algorithm is chosen for the text line detection branch. The shared backbone network then extracts text line features according to the position information produced by the detection branch, yielding the ROI features of each text line on the CNN skeleton. The language classification branch is set up by adding a language classification algorithm, and the text line recognition branches are set up by adding a text line recognition algorithm.
In another embodiment of the present application, training the multilingual text image recognition network as a whole with the multi-task loss function to obtain the trained network includes:
S31: Collect a number of pictures containing text in multiple languages, and place these pictures into the same set to obtain a training set.
In this embodiment, pictures containing text in multiple languages may be collected from the web, and the collected data placed into one set as the training set. In addition, to verify the model's performance, a small number of samples may be placed into another set as a test set.
Illustratively, 100,000 lines of multilingual samples are generated with a text image simulation tool and stored as pictures, producing 100,000 pictures containing multilingual text. 95,000 pictures are randomly selected as the training set, and the remaining 5,000 pictures are used as the test set.
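The random 95,000 / 5,000 split described above can be sketched as follows; the file names are placeholders for the simulated pictures:

```python
import random

def split_dataset(samples, train_count, seed=0):
    """Shuffle the simulated samples and split them into a training set
    and a test set. A fixed seed keeps the split reproducible."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_count], shuffled[train_count:]

pictures = [f"pic_{i:06d}.png" for i in range(100_000)]
train_set, test_set = split_dataset(pictures, train_count=95_000)
print(len(train_set), len(test_set))  # 95000 5000
```

Shuffling before splitting matters here because the simulated samples are typically generated language by language, and an unshuffled split would leave some languages out of the test set.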
S32: Input the training set into the multilingual text image recognition network; the position information, language information and text content of the text lines in the training pictures have been obtained in advance.
In this embodiment, the training set is input into the multilingual text image recognition network; at this point the model has not been trained, and its parameters must be adjusted using the samples in the training set. The position information, language information and text content of the text lines in the training pictures were recorded in advance when the text images were simulated.
S33: The multilingual text image recognition network predicts the positions, language categories and text recognition results of the text lines in the training pictures.
S34: Compare the predicted positions, language categories and text recognition results against the known position information, language information and text content, feed the resulting differences into the multi-task loss function, and adjust the parameters of the multilingual text image recognition network to obtain the trained network.
In this embodiment, under the TensorFlow deep learning framework, the network parameters of the multilingual text image recognition network are adjusted through the multi-task loss function to obtain the trained multilingual text image recognition network.
After training is complete, the pictures in the test set are input into the trained multilingual text image recognition network for inference, and the recognition results are evaluated.
In another embodiment of the present application, when a new language needs to be identified, the method further includes:
S41: A character line recognition branch corresponding to the new language is added to the character recognition network.
In this embodiment, when a new language needs to be recognized, a character line recognition branch corresponding to the new language may be added to the character recognition network of the existing multilingual text image recognition network.
For example, when two new languages, such as Arabic and German, need to be recognized, a character line recognition branch is added to the character recognition network for each of the two new languages.
S42: Under the condition that the existing model parameters remain unchanged, the language classification branch and the character line recognition branches are trained using pictures containing the new languages, to obtain the trained multilingual text image recognition network.
In this embodiment, with the trained model parameters held fixed, the language classification branch and the two newly added character line recognition branches can be trained separately using pictures containing characters of the new languages. Once this training is complete, the update of the multilingual text image recognition network is finished, and the updated network can recognize the newly added languages.
For example, when the new languages are Arabic and German, pictures containing Arabic and German are input into the multilingual text image recognition network; the language classification branch is trained to distinguish Arabic from German, and the two new character line recognition branches are trained to recognize Arabic and German respectively.
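The freeze-and-train-new-branches scheme can be sketched with a toy parameter registry. The group names (`shared_backbone`, `recog_arabic`, and so on) are hypothetical; a real implementation would instead set the corresponding trainable flags on layers in the deep learning framework:

```python
def set_trainable(params, trainable_names):
    """Mark only the given parameter groups as trainable, freezing the rest.

    `params` maps a parameter-group name to a dict with a 'trainable' flag.
    All names here are illustrative; the patent does not specify layer names.
    """
    for name, group in params.items():
        group["trainable"] = name in trainable_names

model_params = {
    "shared_backbone":      {"trainable": True},
    "text_line_detection":  {"trainable": True},
    "language_classifier":  {"trainable": True},
    "recog_chinese":        {"trainable": True},
    "recog_english":        {"trainable": True},
    "recog_arabic":         {"trainable": True},  # newly added branch
    "recog_german":         {"trainable": True},  # newly added branch
}

# Train only the language classifier and the two new recognition branches;
# the backbone and all existing branches keep their trained parameters.
set_trainable(model_params,
              {"language_classifier", "recog_arabic", "recog_german"})
```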
In this embodiment, when a new language needs to be recognized, independent character line recognition branches are added to the network, and these branches and the language classification branch are trained separately on new training data. The original model parameters therefore remain unchanged, the whole model does not need to be retrained, and the development cycle of the multilingual recognition task is shortened.
In another embodiment of the present application, training the language classification branch and the character line recognition branch with pictures containing the new language, while keeping the existing model parameters unchanged, to obtain the trained multilingual text image recognition network includes:
S51: When the new language has a large character set, a larger number of pictures is selected to train the language classification branch and the character line recognition branch.
S52: When the new language has a small character set, a smaller number of pictures is selected to train the language classification branch and the character line recognition branch.
In this embodiment, when the new language has a large character set, more training samples need to be collected for training the character line recognition branch and the language classification branch, so that the model can learn as many characters as possible and the recognition quality is preserved. When the new language has a small character set, a small number of training samples is sufficient to ensure the training effect.
For example, when the new language is Chinese, which has a large character set, more training samples are collected to train the character line recognition branch and the language classification branch, for example 20,000 samples containing Chinese. When the new language is English, which contains only 26 letters and some symbols, a small number of samples suffices, for example 5,000 samples containing English.
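The sample-budget rule above can be expressed as a tiny helper. The threshold and the two budgets are assumptions loosely following the example in the text (20,000 samples for Chinese, 5,000 for English), not values given by the patent:

```python
def suggested_sample_count(charset_size,
                           small_charset_samples=5_000,
                           large_charset_samples=20_000,
                           threshold=100):
    """Pick a training-sample budget from the size of a language's charset.

    Languages with many characters (e.g. Chinese) get the larger budget so
    the new branch sees as many distinct characters as possible; languages
    with small alphabets (e.g. English) get the smaller budget.
    All numeric values here are illustrative assumptions.
    """
    if charset_size > threshold:
        return large_charset_samples
    return small_charset_samples

chinese_budget = suggested_sample_count(6763)  # commonly used Chinese characters
english_budget = suggested_sample_count(26)    # letters only, before symbols
```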
In this embodiment, each language uses its own character line recognition branch, so each branch can focus on learning the samples of its corresponding language. This preserves training efficiency, alleviates the problem of unbalanced samples across languages, and improves multilingual recognition performance.
Based on the same inventive concept, an embodiment of the present application provides a text image recognition apparatus. Referring to fig. 3, fig. 3 is a schematic diagram of a text image recognition apparatus 300 according to an embodiment of the present application. As shown in fig. 3, the apparatus includes:
the image identification module 301 is configured to perform feature extraction on a text image to be identified through a shared backbone network to obtain a shared feature map;
a text line detection module 302, configured to perform text line detection on the shared feature map through a text line detection branch to obtain position information of a text line;
a feature extraction module 303, configured to perform feature extraction on the shared feature map according to the position information of the text line through the shared backbone network to obtain a text feature;
a language classification module 304, configured to perform language classification on the text features through language classification branches to obtain language category information corresponding to the text features;
and a text line identification module 305, configured to identify the text features through corresponding text line identification branches according to the language category information, so as to obtain a text identification result.
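The data flow through the five modules above can be sketched as a small pipeline. All callables here are toy stand-ins for the real networks; the sketch only shows the order of operations: backbone, line detection, feature extraction, language classification, then per-language recognition:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class TextImageRecognizer:
    """Wires the five modules of the apparatus into one pipeline (a sketch)."""
    backbone: Callable            # image -> shared feature map
    detect_lines: Callable        # feature map -> list of line positions
    extract_features: Callable    # (feature map, position) -> text features
    classify_language: Callable   # text features -> language tag
    recognizers: Dict[str, Callable]  # language tag -> recognition branch

    def recognize(self, image) -> List[dict]:
        feature_map = self.backbone(image)
        results = []
        for pos in self.detect_lines(feature_map):
            feats = self.extract_features(feature_map, pos)
            lang = self.classify_language(feats)
            # Dispatch to the character line recognition branch
            # that matches the predicted language.
            text = self.recognizers[lang](feats)
            results.append({"position": pos, "language": lang, "text": text})
        return results

# Toy stand-ins demonstrating the data flow only.
recognizer = TextImageRecognizer(
    backbone=lambda img: img,
    detect_lines=lambda fmap: [(0, 0, 10, 2)],
    extract_features=lambda fmap, pos: "feat",
    classify_language=lambda feats: "zh",
    recognizers={"zh": lambda feats: "你好", "en": lambda feats: "hello"},
)
out = recognizer.recognize("image")
```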
Optionally, the method is implemented based on a multilingual text-image recognition network, and the construction of the multilingual text-image recognition network includes:
using the shared backbone network as a feature extraction network of the multilingual text-image recognition network;
using the character line detection branch and the character line identification branch as a character identification network of the multilingual text image identification network;
adding the language classification branch in the character recognition network;
respectively selecting proper loss functions for the character line detection branch and the language classification branch, and constructing a multi-task loss function according to the functions;
and integrally training the multilingual text image recognition network through the multitask loss function to obtain the trained multilingual text image recognition network.
Optionally, the overall training of the multilingual text-image recognition network through the multitask loss function is performed to obtain the trained multilingual text-image recognition network, including:
collecting a plurality of pictures containing various language characters, and putting the pictures containing the various language characters into the same set to obtain a training set;
inputting the training set into the multilingual text image recognition network, wherein the position information, language information and text information of the character lines on the pictures in the training set are obtained in advance;
the multilingual text image recognition network recognizes the position, language type and character recognition result of the character row on the picture in the training set;
and comparing the position, language type and character recognition result of the character line with the position information, language information and text information of the character line, transmitting the difference obtained by comparison into the multitask loss function, and adjusting the parameters of the multilingual text image recognition network to obtain the trained multilingual text image recognition network.
Optionally, the apparatus further comprises:
the character row recognition branch adding submodule is used for adding a character row recognition branch corresponding to the new language in the character recognition network;
and the model training submodule is used for training the language classification branch and the character line recognition branch by using the picture containing the new language under the condition of ensuring that the existing model parameters are not changed, so as to obtain the trained multilingual text image recognition network.
Optionally, the model training sub-module includes:
the first model training submodule is used for selecting a large number of pictures to train the language classification branch and the character row recognition branch when the new language is a language with more characters;
and the second model training submodule is used for selecting a small number of pictures to train the language classification branch and the character row recognition branch when the new language is a language with fewer characters.
Based on the same inventive concept, another embodiment of the present application provides a readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the text image recognition method according to any of the above embodiments of the present application.
Based on the same inventive concept, another embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and running on the processor, and when the processor executes the computer program, the electronic device implements the steps in the text image recognition method according to any of the above embodiments of the present application.
Since the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal that comprises the element.
The text image recognition method, device, equipment and storage medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementation of the application, and the description of the above embodiments is intended only to help understand the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (8)

1. A method for recognizing text images, the method comprising:
performing feature extraction on the text image to be recognized through a shared backbone network to obtain a shared feature map;
carrying out character line detection on the shared characteristic diagram through a character line detection branch to obtain position information of a character line;
performing feature extraction on the shared feature map according to the position information of the character line through the shared backbone network to obtain text features;
performing language classification on the text features through language classification branches to obtain language category information corresponding to the text features;
and identifying the text features through corresponding text line identification branches according to the language category information to obtain a text identification result.
2. The method of claim 1, wherein the method is implemented based on a multilingual text-image recognition network, and wherein the step of constructing the multilingual text-image recognition network comprises:
using the shared backbone network as a feature extraction network of the multilingual text-image recognition network;
using the character line detection branch and the character line identification branch as a character identification network of the multilingual text image identification network;
adding the language classification branch in the character recognition network;
respectively selecting proper loss functions for the character line detection branch and the language classification branch, and constructing a multi-task loss function according to the functions;
and integrally training the multilingual text image recognition network through the multitask loss function to obtain the trained multilingual text image recognition network.
3. The method of claim 2, wherein the training the multilingual text-image recognition network as a whole by the multitasking loss function to obtain the trained multilingual text-image recognition network comprises:
collecting a plurality of pictures containing various language characters, and putting the pictures containing the various language characters into the same set to obtain a training set;
inputting the training set into the multilingual text image recognition network, wherein the position information, language information and text information of the character lines on the pictures in the training set are obtained in advance;
the multilingual text image recognition network recognizes the position, language type and character recognition result of the character row on the picture in the training set;
and comparing the position, language type and character recognition result of the character line with the position information, language information and text information of the character line, transmitting the difference obtained by comparison into the multitask loss function, and adjusting the parameters of the multilingual text image recognition network to obtain the trained multilingual text image recognition network.
4. The method of claim 2, wherein when a new language needs to be identified, the method further comprises:
adding a character row identification branch corresponding to the new language in the character identification network;
and under the condition of ensuring that the parameters of the existing model are not changed, training the language classification branch and the character row recognition branch by using the picture containing the new language to obtain the trained multilingual text image recognition network.
5. The method of claim 4, wherein training the language classification branch and the character line recognition branch using the picture containing the new language, while ensuring the existing model parameters are unchanged, to obtain the trained multilingual text image recognition network comprises:
when the new language is a language with more characters, selecting a plurality of pictures to train the language classification branch and the character row recognition branch;
and when the new language is a language with fewer characters, selecting a fewer number of pictures to train the language classification branch and the character line recognition branch.
6. A text image recognition apparatus, characterized in that the apparatus comprises:
the image recognition module is used for extracting the features of the text image to be recognized through a shared backbone network to obtain a shared feature map;
the character line detection module is used for carrying out character line detection on the shared characteristic diagram through a character line detection branch to obtain position information of a character line;
the feature extraction module is used for extracting features of the shared feature graph according to the position information of the character line through the shared backbone network to obtain text features;
the language classification module is used for performing language classification on the text features through language classification branches to obtain language category information corresponding to the text features;
and the character line identification module is used for identifying the text characteristics through the corresponding character line identification branch according to the language category information to obtain a character identification result.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 5 are implemented when the computer program is executed by the processor.
CN202111012920.1A 2021-08-31 2021-08-31 Text image recognition method, device, equipment and storage medium Pending CN113837167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111012920.1A CN113837167A (en) 2021-08-31 2021-08-31 Text image recognition method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN113837167A true CN113837167A (en) 2021-12-24

Family

ID=78961765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111012920.1A Pending CN113837167A (en) 2021-08-31 2021-08-31 Text image recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113837167A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357174A (en) * 2022-03-18 2022-04-15 北京创新乐知网络技术有限公司 Code classification system and method based on OCR and machine learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination