CN111291672B - Combined image text recognition and fuzzy judgment method, device and storage medium

Info

Publication number
CN111291672B
CN111291672B CN202010077341.4A CN202010077341A CN111291672B CN 111291672 B CN111291672 B CN 111291672B CN 202010077341 A CN202010077341 A CN 202010077341A CN 111291672 B CN111291672 B CN 111291672B
Authority
CN
China
Prior art keywords
image
network
text
image set
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010077341.4A
Other languages
Chinese (zh)
Other versions
CN111291672A (en
Inventor
牟永强
范宝杰
谭磊
林凌帆
黄耀鸿
王芹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Imagedt Co ltd
Original Assignee
Imagedt Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Imagedt Co ltd
Priority to CN202010077341.4A
Publication of CN111291672A
Application granted
Publication of CN111291672B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a joint image text recognition and blur judgment method, device and storage medium. In the method, a target model is obtained and a text image to be detected is input into it. The model uses a convolutional neural network shared by an image sequence recognition network and an image blur judgment network: the high-dimensional feature images output by the convolutional neural network are fed to both networks, so that both obtain the features at the same time and image text recognition and image blur judgment are processed in parallel. Using the target model for this parallel processing further improves the recognition accuracy of text images.

Description

Combined image text recognition and fuzzy judgment method, device and storage medium
Technical Field
The present invention relates to the field of text image processing technologies, and in particular to a joint image text recognition and blur judgment method, device and storage medium.
Background
The text in a text image is relatively high-level semantic content within visual information and is important for understanding and acquiring visual content. Existing image text recognition techniques are often affected by the quality of the text image, so an image blur judgment technique usually has to be applied first as preprocessing to filter out low-quality text images. However, because image text recognition and image blur judgment each process the text image independently, resources are easily wasted, the two tasks cannot share each other's feature information, and it is difficult to further improve the recognition accuracy of text images.
Disclosure of Invention
The invention provides a combined image text recognition and fuzzy judgment method, a device and a storage medium, which are used for overcoming the defects of the prior art.
In order to solve the above technical problem, in a first aspect, an embodiment of the present invention provides a joint image text recognition and blur judgment method, including:
constructing an initial model; the initial model comprises a convolutional neural network, an image sequence recognition network and an image blur judgment network;
acquiring a text image set together with the real text information and real blur probability corresponding to the text image set, and inputting the text image set into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set;
inputting the high-dimensional feature image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set;
inputting the high-dimensional feature image set into the image blur judgment network, so that the image blur judgment network outputs the predicted blur probability corresponding to the text image set according to the high-dimensional feature image set;
calculating the recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculating the judgment error of the image blur judgment network according to the real blur probability and the predicted blur probability;
back-propagating the recognition error and the judgment error into the convolutional neural network, updating the parameters of the convolutional neural network, and ending training of the initial model when the convolutional neural network converges, to obtain a target model;
and inputting the text image to be detected into the target model to obtain the target blur probability and target text information.
Further, the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.
Further, before the acquiring of the text image set and the real text information and real blur probability corresponding to the text image set, the method further comprises:
collecting text images, and labeling each text image with the real text information and the real blur probability;
and grouping the labeled text images into the text image set.
Further, after the acquiring of the text image set and the corresponding real text information and real blur probability, and before the inputting of the text image set into the convolutional neural network, the method further comprises:
preprocessing the text image set; wherein the preprocessing includes data augmentation and data normalization.
Further, the image sequence recognition network outputting predicted text information corresponding to the text image set according to the high-dimensional feature image set comprises:
slicing the high-dimensional feature image set to obtain an input sequence;
inputting the input sequence into an LSTM network, so that the LSTM network outputs a feature sequence according to the input sequence;
and inputting the feature sequence into a decoding network provided with an attention mechanism, so that the decoding network outputs the predicted text information according to the feature sequence.
Further, the image blur judgment network outputting a predicted blur probability corresponding to the text image set according to the high-dimensional feature image set comprises:
performing dimension reduction on the high-dimensional feature image set to obtain a low-dimensional feature image set, and flattening the low-dimensional feature image set into an input vector;
inputting the input vector into a binary classification network, so that the binary classification network outputs a target vector according to the input vector;
and converting the target vector into the predicted blur probability by a softmax function.
Further, the binary classification network is composed of three fully connected layers.
In a second aspect, an embodiment of the present invention provides a joint image text recognition and blur determination apparatus, including:
the initial model building module is used for building an initial model; the initial model comprises a convolutional neural network, an image sequence recognition network and an image blurring judgment network;
the convolutional neural network training module is used for acquiring a text image set together with the real text information and real blur probability corresponding to the text image set, and inputting the text image set into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set;
the image sequence recognition network training module is used for inputting the high-dimensional feature image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set;
the image blur judgment network training module is used for inputting the high-dimensional feature image set into the image blur judgment network, so that the image blur judgment network outputs the predicted blur probability corresponding to the text image set according to the high-dimensional feature image set;
the network error calculation module is used for calculating the recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculating the judgment error of the image blur judgment network according to the real blur probability and the predicted blur probability;
the target model acquisition module is used for back-propagating the recognition error and the judgment error into the convolutional neural network, updating the parameters of the convolutional neural network, and ending training of the initial model when the convolutional neural network converges, to obtain a target model;
and the text image detection module is used for inputting the text image to be detected into the target model to obtain the target blur probability and the target text information.
In a third aspect, an embodiment of the present invention provides a computer readable storage medium that includes a stored computer program, where the computer program, when executed, controls the device in which the computer readable storage medium is located to perform the joint image text recognition and blur judgment method described above.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
A target model is obtained and the text image to be detected is input into it. The model uses a convolutional neural network shared by the image sequence recognition network and the image blur judgment network, and the high-dimensional feature images output by the convolutional neural network are fed to both networks, so that both obtain the features at the same time and image text recognition and image blur judgment are processed in parallel. Using the target model for this parallel processing further improves the recognition accuracy of text images.
Drawings
FIG. 1 is a flowchart of a combined image text recognition and blur determination method according to a first embodiment of the present invention;
FIG. 2 is a network configuration diagram of an initial model in a first embodiment of the present invention;
FIG. 3 is a flow chart of a preferred embodiment of the first embodiment of the present invention;
FIG. 4 is a schematic flow chart of another preferred embodiment of the first embodiment of the present invention;
FIG. 5 is a network configuration diagram of an image blur determination network in the first embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a combined image text recognition and blur determination apparatus according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and fully below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments that those skilled in the art can obtain from these embodiments without inventive effort fall within the scope of the invention.
It should be noted that, the step numbers herein are only for convenience of explanation of the specific embodiments, and are not used as limiting the order of execution of the steps. The method provided in this embodiment may be executed by a relevant server, and the following description will take the server as an execution body as an example.
Please refer to fig. 1-5.
As shown in fig. 1, a first embodiment provides a joint image text recognition and blur determination method, including steps S1 to S7:
S1, constructing an initial model; the initial model comprises a convolutional neural network, an image sequence recognition network and an image blur judgment network.
S2, acquiring a text image set together with the real text information and real blur probability corresponding to the text image set, and inputting the text image set into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set.
S3, inputting the high-dimensional feature image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set.
S4, inputting the high-dimensional feature image set into the image blur judgment network, so that the image blur judgment network outputs the predicted blur probability corresponding to the text image set according to the high-dimensional feature image set.
S5, calculating the recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculating the judgment error of the image blur judgment network according to the real blur probability and the predicted blur probability.
S6, back-propagating the recognition error and the judgment error into the convolutional neural network, updating the parameters of the convolutional neural network, and ending training of the initial model when the convolutional neural network converges, to obtain the target model.
S7, inputting the text image to be detected into the target model to obtain target text information and the target blur probability.
The recognition error is the relative error between the real text information and the predicted text information, and the judgment error is the relative error between the real blur probability and the predicted blur probability.
In a preferred implementation of this embodiment, the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.
In step S1, the initial model is constructed with the convolutional neural network as a network shared by the image sequence recognition network and the image blur judgment network, so that both networks can obtain the high-dimensional feature image set output by the convolutional neural network at the same time and process image text recognition and image blur judgment in parallel. The network structure of the initial model is shown in fig. 2.
In step S2, the text image set is input into the convolutional neural network, which outputs a high-dimensional feature image set according to the text image set. This trains the convolutional neural network, which passes the high-dimensional feature image set on to the image sequence recognition network and the image blur judgment network.
In step S3, the high-dimensional feature image set is input into the image sequence recognition network, which outputs the predicted text information corresponding to the text image set. This trains the image sequence recognition network and helps improve its text recognition accuracy.
In step S4, the high-dimensional feature image set is input into the image blur judgment network, which outputs the predicted blur probability corresponding to the text image set. This trains the image blur judgment network and helps improve its blur judgment accuracy.
In step S5, the recognition error of the image sequence recognition network is calculated from the real text information and the predicted text information, and the judgment error of the image blur judgment network is calculated from the real blur probability and the predicted blur probability. These errors are used to optimize the initial model into the target model, which improves the prediction accuracy of the target model.
In step S6, the recognition error and the judgment error are back-propagated into the convolutional neural network, its parameters are updated, and training of the initial model ends when the convolutional neural network converges, giving the target model. In this way the image sequence recognition network and the image blur judgment network jointly optimize the parameters of the convolutional neural network while learning their respective tasks, which helps improve the prediction accuracy of the target model.
In step S7, the text image to be detected is input into the target model to obtain target text information and the target blur probability, so that the target model processes image text recognition and image blur judgment in parallel and further improves the recognition accuracy of text images.
In this embodiment, the acquired text image set is first input into the convolutional neural network, which outputs a high-dimensional feature image set. The high-dimensional feature image set is then input into both the image sequence recognition network and the image blur judgment network, which output the predicted text information and the predicted blur probability corresponding to the text image set. Next, the recognition error of the image sequence recognition network and the judgment error of the image blur judgment network are calculated from the real text information, the predicted text information, the real blur probability and the predicted blur probability, and both errors are back-propagated into the convolutional neural network to update its parameters. Training of the initial model ends when the convolutional neural network converges, giving the target model. Finally, the text image to be detected is input into the target model to obtain the target text information and the target blur probability.
Because the convolutional neural network is shared by the image sequence recognition network and the image blur judgment network, and its high-dimensional feature images are fed to both, the two networks obtain the features at the same time and image text recognition and image blur judgment are processed in parallel.
Using the target model for this parallel processing further improves the recognition accuracy of text images.
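The joint update in steps S5 and S6 can be illustrated with a deliberately tiny stand-in. The sketch below is not the patent's model: a single scalar weight `w` plays the role of the shared convolutional neural network, a squared-error head stands in for the recognition error, and a logistic head with cross-entropy stands in for the blur judgment error. All names, data and hyperparameters are hypothetical. It only shows how the two errors add up in the gradient of the shared parameter.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_jointly(samples, lr=0.1, max_steps=500, tol=1e-6):
    """Gradient descent on loss = L_a + L_b with one shared weight."""
    w, u, v = 0.5, 0.5, 0.5   # shared weight, head-A weight, head-B weight
    history = []
    for _ in range(max_steps):
        gw = gu = gv = total = 0.0
        for x, t, y in samples:
            f = w * x                      # shared feature (stand-in for the CNN output)
            err_a = u * f - t              # head A error (stand-in for the recognition error)
            p = sigmoid(v * f)             # head B predicted blur probability
            total += 0.5 * err_a ** 2 - (y * math.log(p) + (1 - y) * math.log(1 - p))
            d_f = err_a * u + (p - y) * v  # both errors flow back into the shared feature
            gw += d_f * x
            gu += err_a * f
            gv += (p - y) * f
        n = len(samples)
        history.append(total / n)
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break                          # treat a flat loss as convergence
        w, u, v = w - lr * gw / n, u - lr * gu / n, v - lr * gv / n
    return (w, u, v), history

# Hypothetical data: (input, head-A target, blur label).
samples = [(1.0, 2.0, 1), (-1.0, -2.0, 0), (0.5, 1.0, 1), (-0.5, -1.0, 0)]
params, history = train_jointly(samples)
```

The line computing `d_f` is the point of the example: the shared parameter receives the sum of the gradients from both heads, which is what lets the two tasks shape the same features.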
In a preferred embodiment, before acquiring the text image set and the corresponding real text information and real blur probability, step S2 further includes: collecting text images, and labeling each text image with real text information and a real blur probability; and grouping the labeled text images into the text image set.
In a preferred implementation of this embodiment, the votes cast for the text information and the blur probability of each text image are counted; the text information with the most votes is taken as the corresponding real text information, and the blur probability with the most votes is taken as the corresponding real blur probability.
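As a minimal sketch of this vote counting, the snippet below picks the most frequent label with Python's `collections.Counter`. The annotation values, the 0/1 encoding of the blur label and the tie-breaking behaviour (first label seen wins) are assumptions for illustration; the text does not specify how ties are resolved.

```python
from collections import Counter

def majority_label(votes):
    """Return the label with the most votes (ties go to the first label seen)."""
    if not votes:
        raise ValueError("at least one vote is required")
    return Counter(votes).most_common(1)[0][0]

# Hypothetical annotations for one text image from three annotators.
text_votes = ["HELLO", "HELL0", "HELLO"]
blur_votes = [0, 0, 1]  # assumed encoding: 0 = clear, 1 = blurred

real_text = majority_label(text_votes)
real_blur = majority_label(blur_votes)
```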
Labeling each text image with real text information and a real blur probability before the text image set is acquired makes it possible to calculate the recognition error of the image sequence recognition network and the judgment error of the image blur judgment network against those labels, and thus to optimize the initial model into the target model.
In a preferred embodiment, after acquiring the text image set and the corresponding real text information and real blur probability, and before inputting the text image set into the convolutional neural network, step S2 further includes: preprocessing the text image set; wherein the preprocessing includes data augmentation and data normalization.
Preprocessing the text image set through data augmentation, data normalization and the like before it is input into the convolutional neural network helps further improve the recognition accuracy of text images.
As shown in fig. 3, in a preferred embodiment, step S3 includes steps S31 to S33:
S31, slicing the high-dimensional feature image set to obtain an input sequence.
S32, inputting the input sequence into the LSTM network, so that the LSTM network outputs the characteristic sequence according to the input sequence.
S33, inputting the feature sequence into a decoding network provided with an attention mechanism, so that the decoding network outputs predicted text information according to the feature sequence.
Take one high-dimensional feature map as an example.
After the convolutional neural network outputs a 12 x 25 x 512 high-dimensional feature map from an input text image, the feature map is sliced along its width. The resulting 25 x 6144 sequence (25 columns, each of 12 x 512 = 6144 values) is input into a bidirectional LSTM network (the recurrent layer), which outputs a 25 x 512 feature sequence. This feature sequence is then input into a decoding network provided with an attention mechanism.
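The slicing step can be sketched with plain nested lists. The function name and the ordering of values inside each column (row by row, then channel by channel) are assumptions; the text only fixes the shapes, 12 x 25 x 512 in and 25 x 6144 out.

```python
def slice_along_width(feature_map):
    """Turn an H x W x C feature map into W vectors of length H * C."""
    height = len(feature_map)
    width = len(feature_map[0])
    sequence = []
    for w in range(width):
        column = []
        for h in range(height):
            column.extend(feature_map[h][w])  # append the C channel values at (h, w)
        sequence.append(column)
    return sequence

# A zero-filled stand-in with the shape used in the text: 12 x 25 x 512.
H, W, C = 12, 25, 512
feature_map = [[[0.0] * C for _ in range(W)] for _ in range(H)]
sequence = slice_along_width(feature_map)
print(len(sequence), len(sequence[0]))  # 25 6144
```

Each of the 25 entries is one time step for the recurrent layer.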
The basic idea of the attention mechanism is to retain the intermediate outputs of the LSTM encoder over the input sequence, train the model to attend selectively to those outputs, and thereby associate each output with the relevant parts of the input sequence. It is implemented as follows:
Input: $c = \{c_1, c_2, \ldots, c_i, \ldots, c_L\},\ L = 25$ (1)
In formula (1), $c_i$ represents the feature at a given spatial position calculated by the LSTM network.
Process:
Contextual attention parameter $e$: $e_i = f_{ATT}(h, c_i)$ (2)
The weight parameter $a$ is obtained through softmax normalization:
$a_i = \dfrac{\exp(e_i)}{\sum_{k=1}^{L} \exp(e_k)}$ (3)
The resulting feature after applying the attention mechanism can be represented as $c_t$:
$c_t = \sum_{i=1}^{L} a_i c_i$ (4)
In formulas (2)-(4), $f_{ATT}$ is the function that computes the attention parameter, and $h$ represents a hidden state parameter of the multi-layer network.
Output: $c_t$
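A minimal sketch of this attention step follows, under stated assumptions: the scoring function is replaced by a plain dot product (the patent's $f_{ATT}$ is not specified beyond its inputs), the weights are a softmax over the scores, and the output $c_t$ is taken to be the weighted sum of the $c_i$. The function names and toy values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_score(h, c_i):
    """Stand-in for f_ATT: a plain dot product (an assumption, not the patent's network)."""
    return sum(hk * ck for hk, ck in zip(h, c_i))

def attend(h, c, score=dot_score):
    """Compute scores e_i, weights a_i and the attended feature c_t."""
    e = [score(h, c_i) for c_i in c]   # attention parameters, formula (2)
    a = softmax(e)                     # softmax-normalized weights
    dim = len(c[0])
    c_t = [sum(a_i * c_i[d] for a_i, c_i in zip(a, c)) for d in range(dim)]
    return a, c_t

# Toy input: L = 3 positions with 2-dimensional features (the text uses L = 25).
c = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = [0.0, 0.0]  # a zero hidden state makes every score equal
weights, context = attend(h, c)
```

With equal scores the weights are uniform, so `context` is simply the mean of the three feature vectors; a non-zero `h` shifts weight toward the positions it aligns with.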
The loss function $L$ of the image sequence recognition network is given by formula (5):
[Formula (5): equation image not recovered]
In formula (5), [inline formula image not recovered]; $M$ represents the maximum length of the output sequence, $N$ represents the number of samples involved in training, and $K$ represents the number of classification categories; [inline formula image not recovered] $b_{i,j}$ represents network parameters; $x$ is a feature vector of the network; and $s_{i,j}$ represents the softmax output for the $j$-th training sample at the $i$-th position.
As shown in fig. 4, in another preferred embodiment, step S4 includes steps S41 to S43:
S41, performing dimension reduction on the high-dimensional feature image set to obtain a low-dimensional feature image set, and flattening the low-dimensional feature image set into an input vector.
S42, inputting the input vector into a binary classification network, so that the binary classification network outputs a target vector according to the input vector.
S43, converting the target vector into the predicted blur probability through a softmax function.
In a preferred implementation of this embodiment, the binary classification network consists of three fully connected layers.
Take one high-dimensional feature map as an example. The network structure of the image blur judgment network is shown in fig. 5.
After the convolutional neural network outputs a 12 x 25 x 512 high-dimensional feature map from the input text image, the feature map is input into a 1 x 1 convolutional layer, which reduces it from 12 x 25 x 512 to 12 x 25 x 256. The 12 x 25 x 256 feature map is then flattened into a 1 x 76800 vector, which serves as the input vector.
The input vector is fed into a binary classification network consisting of three fully connected layers, whose output dimensions are 1 x 768 for the first layer, 1 x 128 for the second layer and 1 for the third layer, and which outputs the target vector. A target vector of 0 denotes a clear image, and a target vector of 1 denotes a blurred image.
The target vector is converted into a probability by a softmax function, and this probability is taken as the predicted blur probability. The greater the predicted blur probability, the more likely the corresponding text image is blurred.
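The layer sizes quoted above can be checked with a few lines of arithmetic. The sketch below traces the flattened length and counts parameters per layer; the helper names are hypothetical, and including one bias term per output unit is an assumption the text does not confirm.

```python
def conv1x1_params(c_in, c_out):
    """Weights plus biases of a 1 x 1 convolution (bias terms are an assumption)."""
    return c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer (bias terms are an assumption)."""
    return n_in * n_out + n_out

HEIGHT, WIDTH = 12, 25            # spatial size of the feature map in the text
flattened = HEIGHT * WIDTH * 256  # length of the input vector after the 1 x 1 convolution

layers = [
    ("1x1 conv, 512 -> 256", conv1x1_params(512, 256)),
    ("fc1, 76800 -> 768", dense_params(flattened, 768)),
    ("fc2, 768 -> 128", dense_params(768, 128)),
    ("fc3, 128 -> 1", dense_params(128, 1)),
]

for name, count in layers:
    print(f"{name}: {count:,} parameters")
```

Under these assumptions almost all of the head's parameters sit in the first fully connected layer, which is why the 1 x 1 convolution halves the channel count before flattening.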
The image blur judgment network is then trained by feeding back the error computed with a cross-entropy function.
The loss function $L_{blur}$ of the image blur judgment network is:
$L_{blur} = -\left(y \log(y_p) + (1 - y) \log(1 - y_p)\right)$ (6)
In formula (6), $y_p$ represents the predicted blur probability and $y$ represents the real blur probability.
The loss function of the initial model is:
$loss = L + L_{blur}$ (7)
In formula (7), $L$ represents the loss function of the image sequence recognition network and $L_{blur}$ represents the loss function of the image blur judgment network.
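Formulas (6) and (7) can be written out directly, reading formula (6) as the standard binary cross-entropy. In the sketch below the function names, the `eps` clamp that avoids `log(0)`, and the numeric values are assumptions added for illustration; the recognition loss is passed in as a number because its formula is not reproduced here.

```python
import math

def blur_loss(y, y_p, eps=1e-12):
    """Binary cross-entropy of formula (6); eps guards against log(0)."""
    y_p = min(max(y_p, eps), 1.0 - eps)
    return -(y * math.log(y_p) + (1.0 - y) * math.log(1.0 - y_p))

def joint_loss(recognition_loss, y, y_p):
    """Formula (7): sum of the recognition loss L and the blur loss L_blur."""
    return recognition_loss + blur_loss(y, y_p)

# Hypothetical values: a recognition loss of 0.8 and a blurred image (y = 1)
# that the network predicts as blurred with probability 0.9.
total = joint_loss(0.8, 1, 0.9)
```

A confident wrong prediction is penalized much more heavily than a confident right one: `blur_loss(0, 0.9)` is larger than `blur_loss(1, 0.9)`.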
The image sequence recognition network and the image blur determination network backpropagate the recognition error and the judgment error, respectively, into the convolutional neural network to update its parameters. Training of the initial model ends when the convolutional neural network converges, and the resulting optimal model is exported as the target model.
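The effect of backpropagating both errors into a shared network can be illustrated with a toy model. In the sketch below, a single shared weight `w` stands in for the convolutional neural network, a squared error stands in for the recognition loss, and all constants, function names, the learning rate, and the gradient-based stopping tolerance are assumptions chosen for illustration; the source only states that training ends on convergence.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def losses(w, x=0.8, a=1.5, b=-2.0, t=1.0, y=1.0):
    """Toy model: the shared feature f = w * x feeds two task heads."""
    f = w * x
    l_rec = (a * f - t) ** 2                      # stand-in for recognition loss L
    p = sigmoid(b * f)                            # predicted blur probability
    l_blur = -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return l_rec, l_blur

def shared_gradients(w, x=0.8, a=1.5, b=-2.0, t=1.0, y=1.0):
    """Analytic gradient of each loss with respect to the shared weight w."""
    f = w * x
    g_rec = 2.0 * (a * f - t) * a * x
    g_blur = (sigmoid(b * f) - y) * b * x
    return g_rec, g_blur

def train(w=0.0, lr=0.1, tol=1e-8, max_steps=10_000):
    """Gradient descent on loss = L + L_blur; stops when the gradient is tiny."""
    for step in range(max_steps):
        g_rec, g_blur = shared_gradients(w)
        g = g_rec + g_blur        # both errors flow into the same parameter
        if abs(g) < tol:
            break
        w -= lr * g
    return w, step

w_star, steps = train()
print("converged shared weight:", w_star, "after", steps, "steps")
```

The update uses the sum of the two gradients, which is exactly the gradient of the summed loss in formula (7), so the shared weight settles on a compromise between what each head alone would prefer.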
Please refer to fig. 6.
As shown in Fig. 6, a second embodiment provides a joint image text recognition and blur determination apparatus, including:
an initial model construction module 21, configured to construct an initial model, the initial model comprising a convolutional neural network, an image sequence recognition network, and an image blur determination network;
a convolutional neural network training module 22, configured to acquire a text image set together with the real text information and the real blur probability corresponding to the text image set, and to input the text image set into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set;
an image sequence recognition network training module 23, configured to input the high-dimensional feature image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set;
an image blur determination network training module 24, configured to input the high-dimensional feature image set into the image blur determination network, so that the image blur determination network outputs a predicted blur probability corresponding to the text image set according to the high-dimensional feature image set;
a network error calculation module 25, configured to calculate a recognition error of the image sequence recognition network according to the real text information and the predicted text information, and to calculate a judgment error of the image blur determination network according to the real blur probability and the predicted blur probability;
a target model acquisition module 26, configured to backpropagate the recognition error and the judgment error into the convolutional neural network, update the parameters of the convolutional neural network, and end training of the initial model when the convolutional neural network converges, to obtain a target model; and
a to-be-detected text image detection module 27, configured to input the text image to be detected into the target model to obtain the target blur probability and the target text information.
The recognition error is a relative error between the real text information and the predicted text information, and the judgment error is a relative error between the real blur probability and the predicted blur probability.
In a preferred implementation of this embodiment, the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.
The initial model construction module 21 builds the initial model with the convolutional neural network as a network shared by the image sequence recognition network and the image blur determination network, so that both networks can simultaneously obtain the high-dimensional feature image set output by the convolutional neural network and process image text recognition and image blur determination in parallel.
The convolutional neural network training module 22 inputs the text image set into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set. This trains the convolutional neural network and allows the high-dimensional feature image set to be passed on to the image sequence recognition network and the image blur determination network.
The image sequence recognition network training module 23 inputs the high-dimensional feature image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set. This trains the image sequence recognition network and helps improve its text recognition accuracy.
The image blur determination network training module 24 inputs the high-dimensional feature image set into the image blur determination network, so that the image blur determination network outputs the predicted blur probability corresponding to the text image set. This trains the image blur determination network and helps improve its blur determination accuracy.
The network error calculation module 25 calculates the recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculates the judgment error of the image blur determination network according to the real blur probability and the predicted blur probability. These errors are used to optimize the initial model into the target model, which further improves the prediction accuracy of the target model.
The target model acquisition module 26 backpropagates the recognition error and the judgment error into the convolutional neural network, updates the parameters of the convolutional neural network, and ends training of the initial model when the convolutional neural network converges, to obtain the target model. In this way the image sequence recognition network and the image blur determination network jointly optimize the parameters of the convolutional neural network while learning their respective tasks, which improves the prediction accuracy of the target model.
The to-be-detected text image detection module 27 inputs the text image to be detected into the target model to obtain the target text information and the target blur probability. The target model thus processes image text recognition and image blur determination in parallel, which further improves the recognition accuracy for text images.
After the initial model is built by the initial model construction module 21, the convolutional neural network training module 22 first inputs the acquired text image set into the convolutional neural network, and the convolutional neural network outputs a high-dimensional feature image set according to the text image set. The image sequence recognition network training module 23 and the image blur determination network training module 24 then input the high-dimensional feature image set into the image sequence recognition network and the image blur determination network, respectively, and the two networks output the predicted text information and the predicted blur probability corresponding to the text image set. Next, the network error calculation module 25 calculates the recognition error of the image sequence recognition network from the real text information and the predicted text information, and the judgment error of the image blur determination network from the real blur probability and the predicted blur probability. The target model acquisition module 26 backpropagates the recognition error and the judgment error into the convolutional neural network to update its parameters, and training of the initial model ends when the convolutional neural network converges, yielding the target model. Finally, the to-be-detected text image detection module 27 inputs the text image to be detected into the target model to obtain the target text information and the target blur probability.
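The inference path through the target model can be sketched structurally as follows. This is only an illustration of the shared-feature design: the class name `JointModel`, the stub callables, and their return values are hypothetical, and the two heads are called one after the other here, whereas the text describes them as processing in parallel.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class JointModel:
    """Shared backbone followed by a recognition head and a blur head."""
    backbone: Callable[[Any], Any]     # stands in for the convolutional neural network
    recognizer: Callable[[Any], str]   # stands in for the image sequence recognition network
    blur_head: Callable[[Any], float]  # stands in for the image blur determination network

    def predict(self, image: Any) -> Tuple[str, float]:
        # The features are computed once and reused by both heads.
        features = self.backbone(image)
        return self.recognizer(features), self.blur_head(features)

# Stub components (placeholders, not real networks) that record backbone usage.
calls = {"backbone": 0}

def stub_backbone(image):
    calls["backbone"] += 1
    return [len(image)]

model = JointModel(
    backbone=stub_backbone,
    recognizer=lambda feats: "ABC123",
    blur_head=lambda feats: 0.25,
)

text, blur_prob = model.predict("dummy-image")
print(text, blur_prob, "backbone calls:", calls["backbone"])
```

Computing the features once is what distinguishes this arrangement from running two independent models, where the convolutional stage would be evaluated twice per image.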
In this embodiment, the target model is obtained and the text image to be detected is input into it. Because the convolutional neural network is shared by the image sequence recognition network and the image blur determination network, the high-dimensional feature images it outputs are fed to both networks, so that both obtain the high-dimensional feature images at the same time and image text recognition and image blur determination are processed in parallel.
This embodiment can thus use the target model to process image text recognition and image blur determination in parallel, which further improves the recognition accuracy for text images.
A third embodiment provides a computer-readable storage medium that includes a stored computer program. When the computer program runs, the device in which the computer-readable storage medium is located is controlled to execute the joint image text recognition and blur determination method described in the first embodiment, achieving the same advantages as that method.
In summary, the embodiments of the invention have the following beneficial effects:
The target model is obtained and the text image to be detected is input into it. Because the convolutional neural network is shared by the image sequence recognition network and the image blur determination network, the high-dimensional feature images it outputs are fed to both networks, so that both obtain the high-dimensional feature images at the same time and image text recognition and image blur determination are processed in parallel. The embodiments can thus use the target model to process image text recognition and image blur determination in parallel, which further improves the recognition accuracy for text images.
The foregoing describes preferred embodiments of the present invention. Those skilled in the art will appreciate that changes and modifications may be made without departing from the principles of the invention, and such changes and modifications are also intended to fall within the scope of the invention.
Those skilled in the art will appreciate that all or part of the processes of the above-described embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored on a computer-readable storage medium and, when executed, may carry out the steps of the above-described embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

Claims (9)

1. A joint image text recognition and blur determination method, characterized by comprising the following steps:
constructing an initial model; the initial model comprises a convolutional neural network, an image sequence recognition network and an image blur determination network;
acquiring a text image set and the real text information and real blur probability corresponding to the text image set, and inputting the text image set into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set;
inputting the high-dimensional feature image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set;
inputting the high-dimensional feature image set into the image blur determination network, so that the image blur determination network outputs the predicted blur probability corresponding to the text image set according to the high-dimensional feature image set;
calculating the recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculating the judgment error of the image blur determination network according to the real blur probability and the predicted blur probability;
backpropagating the recognition error and the judgment error into the convolutional neural network, updating parameters of the convolutional neural network, and ending training of the initial model when the convolutional neural network converges, to obtain a target model;
and inputting the text image to be detected into the target model to obtain the target blur probability and target text information.
2. The joint image text recognition and blur determination method of claim 1 wherein the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.
3. The joint image text recognition and blur determination method according to claim 1, further comprising, before the acquiring of the text image set and the real text information and the real blur probability corresponding to the text image set:
collecting text images, and labeling each text image with the real text information and the real blur probability;
and dividing the labeled text images into the text image set.
4. The joint image text recognition and blur determination method according to claim 1, characterized by further comprising, after the acquiring of the text image set and the real text information and the real blur probability corresponding to the text image set, and before the inputting of the text image set into the convolutional neural network:
preprocessing the text image set; wherein the preprocessing includes data augmentation and data normalization.
5. The joint image text recognition and blur determination method according to claim 1, wherein the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set, comprising:
slicing the high-dimensional feature image set to obtain an input sequence;
inputting the input sequence into an LSTM network, so that the LSTM network outputs a feature sequence according to the input sequence;
and inputting the feature sequence into a decoding network provided with an attention mechanism, so that the decoding network outputs the predicted text information according to the feature sequence.
6. The joint image text recognition and blur determination method of claim 1, wherein the image blur determination network outputs a predicted blur probability corresponding to the text image set according to the high-dimensional feature image set, comprising:
performing dimension reduction on the high-dimensional feature image set to obtain a low-dimensional feature image set, and correspondingly flattening the low-dimensional feature image set into an input vector;
inputting the input vector into a two-class network, so that the two-class network outputs a target vector according to the input vector;
and converting the target vector into the predicted blur probability by a softmax function.
7. The joint image text recognition and blur determination method of claim 6, wherein the two-class network consists of three fully connected layers.
8. A joint image text recognition and blur determination apparatus, comprising:
the initial model construction module is used for constructing an initial model; the initial model comprises a convolutional neural network, an image sequence recognition network and an image blur determination network;
the convolutional neural network training module is used for acquiring a text image set and the real text information and real blur probability corresponding to the text image set, and inputting the text image set into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set;
the image sequence recognition network training module is used for inputting the high-dimensional feature image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set;
the image blur determination network training module is used for inputting the high-dimensional feature image set into the image blur determination network, so that the image blur determination network outputs the predicted blur probability corresponding to the text image set according to the high-dimensional feature image set;
the network error calculation module is used for calculating the recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculating the judgment error of the image blur determination network according to the real blur probability and the predicted blur probability;
the target model acquisition module is used for backpropagating the recognition error and the judgment error into the convolutional neural network, updating parameters of the convolutional neural network, and ending training of the initial model when the convolutional neural network converges, to obtain a target model;
and the to-be-detected text image detection module is used for inputting the text image to be detected into the target model to obtain the target blur probability and the target text information.
9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program when run controls a device in which the computer readable storage medium is located to perform the joint image text recognition and blur determination method according to any one of claims 1 to 7.
CN202010077341.4A 2020-01-22 2020-01-22 Combined image text recognition and fuzzy judgment method, device and storage medium Active CN111291672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010077341.4A CN111291672B (en) 2020-01-22 2020-01-22 Combined image text recognition and fuzzy judgment method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111291672A CN111291672A (en) 2020-06-16
CN111291672B true CN111291672B (en) 2023-05-12

Family

ID=71021436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010077341.4A Active CN111291672B (en) 2020-01-22 2020-01-22 Combined image text recognition and fuzzy judgment method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111291672B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881881A (en) * 2020-08-10 2020-11-03 晶璞(上海)人工智能科技有限公司 Machine intelligent text recognition credibility judgment method based on multiple dimensions
CN113486858B (en) * 2021-08-03 2024-01-23 济南博观智能科技有限公司 Face recognition model training method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019057169A1 (en) * 2017-09-25 2019-03-28 腾讯科技(深圳)有限公司 Text detection method, storage medium, and computer device
CN110188819A (en) * 2019-05-29 2019-08-30 电子科技大学 A kind of CNN and LSTM image high-level semantic understanding method based on information gain
CN110543844A (en) * 2019-08-26 2019-12-06 中电科大数据研究院有限公司 metadata extraction method for government affair metadata PDF file

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
林硕蕾 ; .基于参数模糊判断的海量信息挖掘模型.科技通报.2015,(第05期),全文. *

Also Published As

Publication number Publication date
CN111291672A (en) 2020-06-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant