CN111291672B

CN111291672B - Combined image text recognition and fuzzy judgment method, device and storage medium

Info

Publication number: CN111291672B
Application number: CN202010077341.4A
Authority: CN
Inventors: 牟永强; 范宝杰; 谭磊; 林凌帆; 黄耀鸿; 王芹
Original assignee: Imagedt Co ltd
Current assignee: Imagedt Co ltd
Priority date: 2020-01-22
Filing date: 2020-01-22
Publication date: 2023-05-12
Anticipated expiration: 2040-01-22
Also published as: CN111291672A

Abstract

The invention discloses a method, a device and a storage medium for joint image text recognition and blur determination. In the method, a target model is obtained and a text image to be detected is input into it. The target model uses a convolutional neural network shared by an image sequence recognition network and an image blur determination network: the high-dimensional feature images output by the convolutional neural network are input into both networks, so that both obtain the high-dimensional feature images at the same time and image text recognition and image blur determination can be processed in parallel. The target model thus performs text recognition and blur determination in parallel, which helps further improve the recognition accuracy for text images.

Description

Combined image text recognition and fuzzy judgment method, device and storage medium

Technical Field

The present invention relates to the field of text image processing technologies, and in particular to a joint image text recognition and blur determination method, device, and storage medium.

Background

Text in a text image is relatively high-level semantic content within visual information and is important for understanding and acquiring visual content. Existing image text recognition techniques are often affected by the quality of the text image, so an image blur determination technique is usually applied first to preprocess the images and filter out those of low quality. However, because text recognition and blur determination each process the text image independently, resources are easily wasted and the two tasks cannot share feature information with each other, which makes it difficult to further improve the recognition accuracy for text images.

Disclosure of Invention

The invention provides a combined image text recognition and fuzzy judgment method, a device and a storage medium, which are used for overcoming the defects of the prior art.

In order to solve the above technical problem, in a first aspect, an embodiment of the present invention provides a joint image text recognition and blur determination method, including:

constructing an initial model; the initial model comprises a convolutional neural network, an image sequence recognition network and an image blurring judgment network;

acquiring a text image set, real text information and real fuzzy probability corresponding to the text image set, and inputting the text image set into the convolutional neural network to enable the convolutional neural network to output a high-dimensional characteristic image set according to the text image set;

inputting the high-dimensional characteristic image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional characteristic image set;

inputting the high-dimensional characteristic image set into the image blurring judgment network, so that the image blurring judgment network outputs the prediction blurring probability corresponding to the text image set according to the high-dimensional characteristic image set;

calculating the recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;

back-propagating the recognition error and the judgment error into the convolutional neural network, updating parameters of the convolutional neural network, and ending training of the initial model when the convolutional neural network converges, to obtain a target model;

and inputting the text image to be detected into the target model to obtain the target fuzzy probability and target text information.

Further, the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.

Further, before the acquiring the text image set and the real text information and the real blur probability corresponding to the text image set, the method further comprises:

collecting text images, and labeling each text image with the real text information and the real fuzzy probability;

and dividing the marked text image into the text image set.

Further, after the acquiring the text image set and the real text information and the real blur probability corresponding to the text image set, before the inputting the text image set into the convolutional neural network, the method further comprises:

preprocessing the text image set; wherein the preprocessing includes data augmentation and data normalization.

Further, the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional characteristic image set, and the method comprises the following steps:

slicing the high-dimensional characteristic image set to obtain an input sequence;

inputting the input sequence into an LSTM network, and enabling the LSTM network to output a characteristic sequence according to the input sequence;

and inputting the characteristic sequence into a decoding network provided with an attention mechanism, and enabling the decoding network to output the predicted text information according to the characteristic sequence.

Further, the image blur determination network outputs a predicted blur probability corresponding to the text image set according to the high-dimensional feature image set, including:

performing dimension reduction processing on the high-dimensional characteristic image set to obtain a low-dimensional characteristic image set, and correspondingly stretching the low-dimensional characteristic image set into an input vector;

inputting the input vector into a binary classification network, so that the binary classification network outputs a target vector according to the input vector;

converting the target vector into the predicted blur probability by a softmax function.

Further, the binary classification network is composed of three fully connected layers.

In a second aspect, an embodiment of the present invention provides a joint image text recognition and blur determination apparatus, including:

the initial model building module is used for building an initial model; the initial model comprises a convolutional neural network, an image sequence recognition network and an image blurring judgment network;

the convolutional neural network training module is used for acquiring a text image set and real text information and real fuzzy probability corresponding to the text image set, inputting the text image set into the convolutional neural network, and enabling the convolutional neural network to output a high-dimensional characteristic image set according to the text image set;

the image sequence recognition network training module is used for inputting the high-dimensional characteristic image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional characteristic image set;

the image blurring judgment network training module is used for inputting the high-dimensional characteristic image set into the image blurring judgment network, so that the image blurring judgment network outputs the prediction blurring probability corresponding to the text image set according to the high-dimensional characteristic image set;

the network error calculation module is used for calculating the recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;

the target model acquisition module is used for back-propagating the recognition error and the judgment error into the convolutional neural network, updating parameters of the convolutional neural network, and ending training of the initial model when the convolutional neural network converges, to obtain a target model;

and the to-be-detected text image detection module is used for inputting the text image to be detected into the target model to obtain the target blur probability and the target text information.

In a third aspect, an embodiment of the present invention provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, where the computer program when executed controls a device in which the computer readable storage medium is located to perform a method for joint image text recognition and blur determination as described above.

Compared with the prior art, the embodiment of the invention has the following beneficial effects:

the method comprises the steps of inputting a text image to be detected into a target model by acquiring the target model, utilizing a convolutional neural network shared by an image sequence recognition network and an image blurring judgment network, and respectively inputting high-dimensional characteristic images output by the convolutional neural network into the image sequence recognition network and the image blurring judgment network, so that the image blurring judgment network and the image sequence recognition network can acquire the high-dimensional characteristic images at the same time so as to process image text recognition and image blurring judgment in parallel. The invention can utilize the target model to realize the parallel processing of the image text recognition and the image blurring judgment, thereby further improving the recognition precision of the text image.

Drawings

FIG. 1 is a flowchart of a combined image text recognition and blur determination method according to a first embodiment of the present invention;

FIG. 2 is a network configuration diagram of an initial model in a first embodiment of the present invention;

FIG. 3 is a flow chart of a preferred embodiment of the first embodiment of the present invention;

FIG. 4 is a schematic flow chart of another preferred embodiment of the first embodiment of the present invention;

FIG. 5 is a network configuration diagram of an image blur determination network in the first embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a combined image text recognition and blur determination apparatus according to a second embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.

It should be noted that, the step numbers herein are only for convenience of explanation of the specific embodiments, and are not used as limiting the order of execution of the steps. The method provided in this embodiment may be executed by a relevant server, and the following description will take the server as an execution body as an example.

Please refer to fig. 1-5.

As shown in fig. 1, a first embodiment provides a joint image text recognition and blur determination method, including steps S1 to S7:

S1, constructing an initial model; the initial model comprises a convolutional neural network, an image sequence recognition network and an image blurring judgment network.

S2, acquiring a text image set and real text information and real fuzzy probability corresponding to the text image set, inputting the text image set into a convolutional neural network, and enabling the convolutional neural network to output a high-dimensional characteristic image set according to the text image set.

S3, inputting the high-dimensional characteristic image set into an image sequence recognition network, and enabling the image sequence recognition network to output predictive text information corresponding to the text image set according to the high-dimensional characteristic image set.

S4, inputting the high-dimensional characteristic image set into an image blurring judging network, enabling the image blurring judging network to output prediction blurring probability corresponding to the text image set according to the high-dimensional characteristic image set.

S5, calculating the recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculating the judgment error of the image blurring judgment network according to the real blurring probability and the predicted blurring probability.

S6, back-propagating the recognition error and the judgment error into the convolutional neural network, updating parameters of the convolutional neural network, and ending training of the initial model when the convolutional neural network converges, to obtain the target model.

S7, inputting the text image to be detected into a target model to obtain target text information and target fuzzy probability.

The recognition error is a relative error between the real text information and the predicted text information, and the judgment error is a relative error between the real blur probability and the predicted blur probability.

In a preferred implementation of this embodiment, the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.

In step S1, by constructing an initial model, the convolutional neural network is used as a shared network between the image sequence recognition network and the image blur determination network, so that the image sequence recognition network and the image blur determination network can simultaneously acquire a high-dimensional characteristic image set output by the convolutional neural network, so as to process image text recognition and image blur determination in parallel. Wherein the network structure diagram of the initial model is shown in fig. 2.
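The shared-network arrangement can be illustrated with a minimal sketch. The class `JointModel` and the three toy functions below are hypothetical stand-ins, not the networks of this embodiment; the point is only that the shared network runs once and both task networks consume the same features.

```python
from typing import Callable, List, Tuple

FeatureMap = List[float]


class JointModel:
    """Toy model: one shared backbone feeding two task heads."""

    def __init__(
        self,
        backbone: Callable[[List[float]], FeatureMap],
        recognition_head: Callable[[FeatureMap], str],
        blur_head: Callable[[FeatureMap], float],
    ) -> None:
        self.backbone = backbone
        self.recognition_head = recognition_head
        self.blur_head = blur_head

    def forward(self, image: List[float]) -> Tuple[str, float]:
        # The backbone runs once; both heads receive the same features.
        features = self.backbone(image)
        return self.recognition_head(features), self.blur_head(features)


# Hypothetical stand-ins for the three sub-networks.
calls = {"backbone": 0}


def toy_backbone(image: List[float]) -> FeatureMap:
    calls["backbone"] += 1  # count how often features are computed
    return [2.0 * p for p in image]


def toy_recognition_head(features: FeatureMap) -> str:
    return "".join("A" if f > 1.0 else "B" for f in features)


def toy_blur_head(features: FeatureMap) -> float:
    # Fraction of weak responses, used here as a stand-in blur score.
    return sum(1 for f in features if f <= 1.0) / len(features)


model = JointModel(toy_backbone, toy_recognition_head, toy_blur_head)
text, blur_probability = model.forward([0.9, 0.2, 0.7, 0.6])
```

Because the features are computed a single time per image, the cost of the shared network is not paid twice, which is the resource saving the embodiment describes.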

In step S2, the text image set is input into the convolutional neural network so that it outputs a high-dimensional feature image set according to the text image set. This trains the convolutional neural network, which passes the high-dimensional feature image set to the image sequence recognition network and the image blur determination network.

In step S3, the high-dimensional feature image set is input into the image sequence recognition network so that it outputs the predicted text information corresponding to the text image set. This trains the image sequence recognition network and helps improve its text recognition accuracy.

In step S4, the high-dimensional feature image set is input into the image blur determination network so that it outputs the predicted blur probability corresponding to the text image set. This trains the image blur determination network and helps improve its blur determination accuracy.

In step S5, the recognition error of the image sequence recognition network is calculated according to the real text information and the predicted text information, and the judgment error of the image fuzzy judgment network is calculated according to the real fuzzy probability and the predicted fuzzy probability, so that the initial model is optimized to obtain the target model, and the prediction accuracy of the target model is further improved.

In step S6, the recognition error and the judgment error are back-propagated into the convolutional neural network, the parameters of the convolutional neural network are updated, and training of the initial model ends when the convolutional neural network converges, yielding the target model. In this way the image sequence recognition network and the image blur determination network jointly optimize the parameters of the convolutional neural network by learning their respective tasks, which helps improve the prediction accuracy of the target model.
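How two task errors jointly adjust a shared parameter can be sketched with a single scalar weight. This is an illustration under simplifying assumptions, not the training procedure of the embodiment: the shared network is reduced to one weight `w`, the two task networks to fixed multipliers, and squared error is swapped in for both losses. The function name `joint_step` is hypothetical.

```python
def joint_step(w, x, head_a, target_a, head_b, target_b, lr):
    """One gradient step on the shared weight w using the summed loss."""
    feature = w * x                      # shared "backbone" output
    err_a = head_a * feature - target_a  # recognition-style error
    err_b = head_b * feature - target_b  # blur-style error
    loss = err_a ** 2 + err_b ** 2
    # Chain rule: each head contributes a gradient to the shared weight.
    grad_w = 2 * err_a * head_a * x + 2 * err_b * head_b * x
    return w - lr * grad_w, loss


w = 0.0
losses = []
for _ in range(200):
    w, loss = joint_step(w, x=1.0, head_a=1.0, target_a=2.0,
                         head_b=0.5, target_b=1.0, lr=0.1)
    losses.append(loss)
```

In this toy setting both errors vanish at `w = 2`, so the summed gradient drives the shared weight toward a value that serves both tasks at once.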

In step S7, the text image to be detected is input into the target model to obtain target text information and target fuzzy probability, so that the target model can be utilized to realize parallel processing of image text recognition and image fuzzy judgment, thereby further improving recognition accuracy of the text image.

In this embodiment, the acquired text image set is first input into the convolutional neural network, which outputs a high-dimensional feature image set according to the text image set. The high-dimensional feature image set is then input into the image sequence recognition network and the image blur determination network, which respectively output the predicted text information and the predicted blur probability corresponding to the text image set. Next, the recognition error of the image sequence recognition network is calculated from the real and predicted text information, and the judgment error of the image blur determination network is calculated from the real and predicted blur probabilities. Both errors are back-propagated into the convolutional neural network to update its parameters, and training of the initial model ends when the convolutional neural network converges, yielding the target model. Finally, the text image to be detected is input into the target model to obtain the target text information and the target blur probability.

According to the embodiment, the target model is obtained, the text image to be detected is input into the target model, the convolutional neural network shared by the image sequence recognition network and the image blurring judgment network is utilized, and the high-dimensional characteristic image output by the convolutional neural network is respectively input into the image sequence recognition network and the image blurring judgment network, so that the image blurring judgment network and the image sequence recognition network can obtain the high-dimensional characteristic image at the same time, and the image text recognition and the image blurring judgment can be processed in parallel.

According to the embodiment, the target model can be utilized to realize parallel processing of image text recognition and image blurring judgment, so that the recognition accuracy of the text image is further improved.

In a preferred embodiment, step S2 further includes, before acquiring the text image set and the real text information and the real blur probability corresponding to the text image set: collecting text images, and labeling real text information and real fuzzy probability for each text image; and dividing the marked text image into the text image set.

In a preferred implementation manner of this embodiment, by counting the number of votes for the text information and the blur probability of each text image, the text information with the highest vote is taken as the corresponding real text information, and the blur probability with the highest vote is taken as the corresponding real blur probability.
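The voting rule can be sketched as a simple majority count. The helper `majority_label` and the sample annotations are hypothetical; the text does not say how ties are resolved, and this sketch simply keeps whichever label was seen first among those tied.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_label(votes: Sequence[Hashable]) -> Hashable:
    """Return the most frequently proposed label."""
    if not votes:
        raise ValueError("at least one vote is required")
    return Counter(votes).most_common(1)[0][0]


# Hypothetical annotations from three annotators for one text image.
text_votes = ["2020-01-22", "2020-01-22", "2020-01-27"]
blur_votes = [1, 0, 1]  # 1 = blurred, 0 = clear

real_text = majority_label(text_votes)
real_blur = majority_label(blur_votes)
```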

According to the method, before the text image set is acquired, the real text information and the real fuzzy probability are marked on each text image in the text image set, so that the identification error of the image sequence identification network and the judgment error of the image fuzzy judgment network are calculated according to the real text information and the real fuzzy probability, and the initial model is optimized to obtain the target model.

In a preferred embodiment, step S2 further includes, after acquiring the text image set and the corresponding real text information and real blur probability and before inputting the text image set into the convolutional neural network: preprocessing the text image set, wherein the preprocessing includes data augmentation and data normalization.

Preprocessing the text image set by data augmentation, data normalization and the like before it is input into the convolutional neural network helps further improve the recognition accuracy for text images.
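The text does not specify which normalization is used. One common choice, shown here purely as an assumption, scales 8-bit pixel values to [0, 1] and then shifts and scales them with a mean and standard deviation; the values 0.5 and 0.5 and the function name `normalize` are illustrative.

```python
def normalize(image, mean=0.5, std=0.5):
    """Scale 8-bit pixels to [0, 1], then map them to roughly [-1, 1]."""
    # mean and std are assumed values, not taken from the embodiment.
    return [[(p / 255.0 - mean) / std for p in row] for row in image]


image = [[0, 255], [51, 204]]  # a tiny 2 x 2 grayscale image
normalized = normalize(image)
```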

As shown in fig. 3, in a preferred embodiment, step S3 includes steps S31 to S33:

S31, slicing the high-dimensional characteristic image set to obtain an input sequence.

S32, inputting the input sequence into the LSTM network, so that the LSTM network outputs the characteristic sequence according to the input sequence.

S33, inputting the feature sequence into a decoding network provided with an attention mechanism, so that the decoding network outputs predicted text information according to the feature sequence.

Take a high-dimensional feature map as an example.

After the convolutional neural network outputs a 12 x 25 x 512 high-dimensional feature map from an input text image, the feature map is sliced along its width direction to give a 25 x 6144 sequence (each of the 25 slices holds 12 x 512 = 6144 values). This input sequence is fed into a bidirectional LSTM network (the recurrent layer), which outputs a 25 x 512 feature sequence, and the feature sequence is then input into a decoding network provided with an attention mechanism.
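The slicing step can be sketched as follows, treating the feature map as a height x width x channel nested list. The function name `slice_along_width` is hypothetical, and the order in which the values of one column are concatenated (row by row, channels together) is an assumption; the text only fixes the resulting sizes.

```python
from typing import List


def slice_along_width(feature_map: List[List[List[float]]]) -> List[List[float]]:
    """Turn an H x W x C feature map into W vectors of length H * C."""
    height = len(feature_map)
    width = len(feature_map[0])
    sequence = []
    for col in range(width):
        step = []
        for row in range(height):
            step.extend(feature_map[row][col])  # append the C channel values
        sequence.append(step)
    return sequence


H, W, C = 12, 25, 512
feature_map = [[[0.0] * C for _ in range(W)] for _ in range(H)]
input_sequence = slice_along_width(feature_map)
```

Each of the 25 sequence elements corresponds to one vertical strip of the image, which is what lets the recurrent layer read the text from left to right.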

The basic idea of the attention mechanism is to retain the intermediate outputs of the LSTM encoder over the input sequence, train a model to attend selectively to those outputs, and associate each element of the output sequence with the input sequence when the model produces its output. It is implemented as follows:

Input: $c = \{c_1, c_2, \dots, c_i, \dots, c_L\}$, $L = 25$ (1)

In formula (1), $c_i$ represents the feature at a given spatial position computed by the LSTM network.

Process:

The contextual attention parameter $e$: $e_i = f_{ATT}(h, c_i)$ (2)

The weight parameter $a$ is obtained through softmax normalization: $a_i = \exp(e_i) / \sum_{k=1}^{L} \exp(e_k)$ (3)

The resulting feature after applying the attention mechanism can be represented as $c^t$: $c^t = \sum_{i=1}^{L} a_i c_i$ (4)

In formulas (2) to (4), $f_{ATT}$ is the attention function and $h$ represents a hidden state parameter of the multi-layer network.

Output: $c^t$.
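The attention computation (scores, softmax-normalized weights, weighted sum) can be sketched in a few lines. The form of the function f_ATT is not given in the text, so a plain dot product between h and each c_i is used here as a hypothetical stand-in; the function names `softmax` and `attend` are likewise illustrative.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Normalize scores into weights that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(h: List[float], c: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return the attention weights a and the attended feature c^t."""
    # Hypothetical f_ATT: dot product of the hidden state with each feature.
    e = [sum(hk * ck for hk, ck in zip(h, ci)) for ci in c]
    a = softmax(e)
    dim = len(c[0])
    context = [sum(a[i] * c[i][d] for i in range(len(c))) for d in range(dim)]
    return a, context


h = [1.0, 0.0]
c = [[0.0, 1.0], [0.0, 3.0]]
weights, context = attend(h, c)
```

In this example both scores are equal, so the two positions receive equal weight and the attended feature is their average.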

The loss function $L$ of the image sequence recognition network is given by formula (5) (the formula itself is missing from this text).

In formula (5), $M$ represents the maximum length of the output sequence, $N$ represents the number of samples involved in training, and $K$ represents the number of classification categories; $b_{i,j}$ represents network parameters, $x$ is a feature vector of the network, and $s_{i,j}$ represents the softmax output for the $j$-th training sample at the $i$-th position.

As shown in fig. 4, in another preferred embodiment, step S4 includes steps S41 to S43:

S41, performing dimension reduction processing on the high-dimensional characteristic image set to obtain a low-dimensional characteristic image set, and correspondingly stretching the low-dimensional characteristic image set into an input vector.

S42, inputting the input vector into a binary classification network, so that the binary classification network outputs a target vector according to the input vector.

S43, converting the target vector into a predicted blur probability through a softmax function.

In a preferred implementation of this embodiment, the binary classification network consists of three fully connected layers.

Take a high-dimensional feature map as an example. The network structure diagram of the image blurring determination network is shown in fig. 5.

After the convolutional neural network outputs a 12 x 25 x 512 high-dimensional feature map from the input text image, the feature map is input into a 1 x 1 convolutional layer, which reduces its dimensionality from 12 x 25 x 512 to 12 x 25 x 256. The 12 x 25 x 256 feature map is then stretched (flattened) into a 1 x 76800 vector, which serves as the input vector.
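The dimension reduction and stretching can be sketched on a small feature map. A 1 x 1 convolution mixes the channels at each spatial position independently, so it changes the channel count but not the height or width. The functions `conv1x1` and `flatten`, the toy sizes and the weights below are hypothetical; bias terms are omitted for brevity.

```python
def conv1x1(feature_map, weights):
    """Apply a 1 x 1 convolution.

    feature_map: H x W x C_in nested lists; weights: C_out x C_in.
    """
    return [
        [
            [sum(w * v for w, v in zip(row_w, pixel)) for row_w in weights]
            for pixel in row
        ]
        for row in feature_map
    ]


def flatten(feature_map):
    """Stretch an H x W x C feature map into one vector."""
    return [v for row in feature_map for pixel in row for v in pixel]


# Toy example: reduce a 2 x 3 x 4 map to 2 x 3 x 2, then flatten it.
small = [[[1.0, 2.0, 3.0, 4.0] for _ in range(3)] for _ in range(2)]
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
reduced = conv1x1(small, weights)
vector = flatten(reduced)

# Same arithmetic at the sizes given in the text.
flattened_length = 12 * 25 * 256
```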

The input vector is input into a binary classification network consisting of three fully connected layers, where the output dimension of the first fully connected layer is 1 x 768, that of the second is 1 x 128, and that of the third is 1, which yields the target vector. A target vector of 0 represents a clear image and a target vector of 1 represents a blurred image.

The target vector is converted into a probability by a softmax function, and the output probability is taken as a prediction fuzzy probability. The greater the predicted blur probability, the greater the likelihood that the corresponding text image is a blurred image.

The image blur determination network is then trained by back-propagating a cross-entropy loss.

The loss function $L_{blur}$ of the image blur determination network is:

$L_{blur} = -\left(y \log(y_p) + (1 - y) \log(1 - y_p)\right)$ (6)

In formula (6), $y_p$ represents the predicted blur probability and $y$ represents the real blur probability.

The loss function of the initial model is:

$loss = L + L_{blur}$ (7)

In formula (7), $L$ represents the loss function of the image sequence recognition network and $L_{blur}$ represents the loss function of the image blur determination network.
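The cross-entropy loss of the blur branch and the summed loss of the initial model can be sketched directly from the two formulas. The clamping constant `eps` is an added safeguard against taking the logarithm of zero and is not part of the text; the recognition loss is passed in as a plain number because its formula is not reproduced here. The function names are illustrative.

```python
import math


def blur_loss(y: float, y_p: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy between real (y) and predicted (y_p) blur probability."""
    y_p = min(max(y_p, eps), 1.0 - eps)  # assumed safeguard, avoids log(0)
    return -(y * math.log(y_p) + (1.0 - y) * math.log(1.0 - y_p))


def total_loss(recognition_loss: float, y: float, y_p: float) -> float:
    """Loss of the joint model: recognition loss plus blur loss."""
    return recognition_loss + blur_loss(y, y_p)


confident = blur_loss(1.0, 0.9)   # blurred image, predicted as likely blurred
wrong = blur_loss(1.0, 0.1)       # blurred image, predicted as likely clear
joint = total_loss(0.5, 1.0, 0.9)
```

A prediction far from the label is penalized much more heavily than one close to it, and because the two losses are simply added, both tasks contribute to the gradient that reaches the shared convolutional neural network.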

The image sequence recognition network and the image blur determination network back-propagate the recognition error and the judgment error, respectively, into the convolutional neural network, the parameters of the convolutional neural network are updated, training of the initial model ends when the convolutional neural network converges, and the resulting optimal model is taken as the target model.

Please refer to fig. 6.

As shown in FIG. 6, a second embodiment provides a joint image text recognition and blur determination apparatus, including: an initial model construction module 21, configured to construct an initial model, where the initial model comprises a convolutional neural network, an image sequence recognition network and an image blur determination network; a convolutional neural network training module 22, configured to obtain a text image set and the real text information and real blur probability corresponding to the text image set, and input the text image set into the convolutional neural network so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set; an image sequence recognition network training module 23, configured to input the high-dimensional feature image set into the image sequence recognition network so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set; an image blur determination network training module 24, configured to input the high-dimensional feature image set into the image blur determination network so that the image blur determination network outputs a predicted blur probability corresponding to the text image set according to the high-dimensional feature image set; a network error calculation module 25, configured to calculate a recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculate a judgment error of the image blur determination network according to the real blur probability and the predicted blur probability; a target model acquisition module 26, configured to back-propagate the recognition error and the judgment error into the convolutional neural network, update parameters of the convolutional neural network, and end training of the initial model when the convolutional neural network converges, to obtain a target model; and a to-be-detected text image detection module 27, configured to input the text image to be detected into the target model to obtain the target blur probability and the target text information.

The initial model is built through the initial model building module 21, and the convolutional neural network is used as a shared network between the image sequence recognition network and the image fuzzy judgment network, so that the image sequence recognition network and the image fuzzy judgment network can simultaneously acquire a high-dimensional characteristic image set output by the convolutional neural network to process image text recognition and image fuzzy judgment in parallel.

The convolutional neural network training module 22 inputs the text image set into the convolutional neural network so that it outputs a high-dimensional feature image set according to the text image set. This trains the convolutional neural network, which passes the high-dimensional feature image set to the image sequence recognition network and the image blur determination network.

The image sequence recognition network training module 23 inputs the high-dimensional feature image set into the image sequence recognition network so that it outputs the predicted text information corresponding to the text image set. This trains the image sequence recognition network and helps improve its text recognition accuracy.

The image blur determination network training module 24 inputs the high-dimensional characteristic image set into the image blur determination network, so that the image blur determination network outputs the prediction blur probability corresponding to the text image set according to the high-dimensional characteristic image set, thereby realizing the training of the image blur determination network and being beneficial to improving the blur determination precision of the image blur determination network.

The network error calculation module 25 calculates the recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculates the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability so as to optimize the initial model to obtain the target model, thereby further improving the prediction precision of the target model.
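This passage does not name the error functions, so the following is only one plausible sketch of the two calculations. It assumes binary cross-entropy for the judgment error and, as a deliberate simplification, a per-character negative log-likelihood for the recognition error with exactly one prediction step per character. A real sequence recognizer would more likely use a sequence-level loss. The function names and the sample values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

EPS = 1e-7  # keeps log() away from zero


def judgment_error(true_probs: Sequence[float], pred_probs: Sequence[float]) -> float:
    """Mean binary cross-entropy between real and predicted blur probabilities."""
    total = 0.0
    for t, p in zip(true_probs, pred_probs):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(true_probs)


def recognition_error(true_text: str, step_probs: List[Dict[str, float]]) -> float:
    """Mean negative log-likelihood of the true characters.

    Assumes one prediction step per character (no alignment problem),
    which is a simplification made for this sketch.
    """
    if len(true_text) != len(step_probs):
        raise ValueError("this sketch assumes one prediction step per character")
    total = 0.0
    for char, probs in zip(true_text, step_probs):
        total += -math.log(max(probs.get(char, 0.0), EPS))
    return total / len(true_text)


# Hypothetical labels and predictions for illustration.
judgment_err = judgment_error([1.0, 0.0], [0.5, 0.5])
recognition_err = recognition_error(
    "ab",
    [{"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}],
)
print(judgment_err, recognition_err)
```

Both values fall as the predictions approach the labels, which is the property the optimization step relies on.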

The target model acquisition module 26 back-propagates the recognition error and the judgment error into the convolutional neural network and updates the parameters of the convolutional neural network; training of the initial model ends when the convolutional neural network converges, yielding the target model. In this way, the image sequence recognition network and the image fuzzy judgment network jointly optimize and adjust the parameters of the convolutional neural network by learning their respective tasks, which improves the prediction accuracy of the target model.
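The joint update can be illustrated with a deliberately tiny example in which the shared network is a single scalar weight `w`. The squared-error losses, the fixed branch weights `a` and `c`, the learning rate, and the gradient-magnitude convergence test are all assumptions made for this sketch. What it demonstrates is that the gradient applied to the shared parameter is the sum of the contributions from both task errors.

```python
from typing import Tuple


def train_shared_weight(
    x: float,
    a: float,          # fixed weight of the recognition branch (assumed)
    c: float,          # fixed weight of the blur judgment branch (assumed)
    y_text: float,     # target for the recognition branch
    y_blur: float,     # target for the blur judgment branch
    lr: float = 0.1,
    tol: float = 1e-6,
    max_steps: int = 1000,
) -> Tuple[float, int]:
    """Gradient descent on one shared weight driven by two task errors."""
    w = 0.0
    steps = 0
    while steps < max_steps:
        feature = w * x                               # shared feature
        recognition_residual = a * feature - y_text   # d(0.5 * err^2) / d(output)
        judgment_residual = c * feature - y_blur
        # Chain rule: both errors flow back into the same shared weight.
        grad = recognition_residual * a * x + judgment_residual * c * x
        if abs(grad) < tol:                           # assumed convergence criterion
            break
        w -= lr * grad
        steps += 1
    return w, steps


# Targets generated from a "true" shared weight of 1.5 (hypothetical data).
w, steps = train_shared_weight(x=1.0, a=2.0, c=1.0, y_text=3.0, y_blur=1.5)
print(f"shared weight after {steps} steps: {w:.6f}")
```

Because both targets are consistent with the same shared weight, the two gradients reinforce each other and `w` settles near 1.5.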

The text image to be detected is input into the target model through the text image detection module 27 to obtain target text information and target fuzzy probability, and the target model can be utilized to realize parallel processing of image text recognition and image fuzzy judgment, so that the recognition accuracy of the text image is further improved.

After the initial model is built by the initial model building module 21, the convolutional neural network training module 22 first inputs the acquired text image set into the convolutional neural network, and the convolutional neural network outputs a high-dimensional characteristic image set according to the text image set. The image sequence recognition network training module 23 and the image fuzzy judgment network training module 24 then input the high-dimensional characteristic image set into the image sequence recognition network and the image fuzzy judgment network respectively, which output the predicted text information and the predicted fuzzy probability corresponding to the text image set. Next, the network error calculation module 25 calculates the recognition error of the image sequence recognition network from the real text information and the predicted text information, and the judgment error of the image fuzzy judgment network from the real fuzzy probability and the predicted fuzzy probability. The target model acquisition module 26 back-propagates the recognition error and the judgment error into the convolutional neural network to update its parameters, and training of the initial model ends when the convolutional neural network converges, yielding the target model. Finally, the text image detection module 27 inputs the text image to be detected into the target model to obtain the target text information and the target fuzzy probability.

A third embodiment provides a computer-readable storage medium that includes a stored computer program, wherein, when the computer program runs, the device in which the computer-readable storage medium is located is controlled to execute the joint image text recognition and blur determination method described in the first embodiment, achieving the same advantages as that method.

In summary, the embodiment of the invention has the following beneficial effects:

By acquiring the target model and inputting the text image to be detected into it, the method uses a convolutional neural network shared by the image sequence recognition network and the image fuzzy judgment network, and inputs the high-dimensional characteristic images output by the convolutional neural network into both networks. The image fuzzy judgment network and the image sequence recognition network can therefore acquire the high-dimensional characteristic images at the same time and process image text recognition and image fuzzy judgment in parallel. In this way, the embodiment uses the target model to process image text recognition and image fuzzy judgment in parallel, thereby further improving the recognition accuracy of the text image.

While the foregoing is directed to the preferred embodiments of the present invention, those skilled in the art will appreciate that changes and modifications may be made without departing from the principles of the invention, and such changes and modifications are also intended to fall within the scope of the invention.

Those skilled in the art will appreciate that all or part of the processes of the above-described embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed, may include the processes of the above-described embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims

1. A joint image text recognition and blur determination method, characterized by comprising the following steps:

2. The joint image text recognition and blur determination method of claim 1 wherein the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.

3. The joint image text recognition and blur determination method according to claim 1, further comprising, before the acquiring of the text image set and the real text information and the real blur probability corresponding to the text image set:

and dividing the marked text image into the text image set.

4. The joint image text recognition and blur determination method according to claim 1, characterized by further comprising, after the acquiring of the text image set and the real text information and real blur probability corresponding to the text image set, and before the inputting of the text image set into the convolutional neural network:

5. The joint image text recognition and blur determination method according to claim 1, wherein the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set, comprising:

6. The joint image text recognition and blur determination method of claim 1 wherein the image blur determination network outputs a predicted blur probability corresponding to the text image set based on the high-dimensional feature image set, comprising:

7. The joint image text recognition and blur determination method of claim 6 wherein the classification network consists of three fully connected layers.

8. A joint image text recognition and blur determination apparatus, comprising:

9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program when run controls a device in which the computer readable storage medium is located to perform the joint image text recognition and blur determination method according to any one of claims 1 to 7.