CN114187593B - Image processing method and device - Google Patents


Info

Publication number
CN114187593B
CN114187593B (application CN202111526049.7A)
Authority
CN
China
Prior art keywords
image
recognition model
feature
characters
character recognition
Prior art date
Legal status
Active
Application number
CN202111526049.7A
Other languages
Chinese (zh)
Other versions
CN114187593A (en)
Inventor
张家鑫
黄灿
Current Assignee
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd
Priority to CN202111526049.7A
Publication of CN114187593A
Application granted
Publication of CN114187593B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/02 Computing arrangements based on specific mathematical models using fuzzy logic

Abstract

The application discloses an image processing method comprising: acquiring an image to be processed; and inputting the image into a pre-trained character recognition model to obtain the characters it contains. The character recognition model extracts image features from the image to be processed and derives the contained characters from those features. During training, a first feature of a training image is extracted, the features within it that correspond to characters are identified, and those character features are blurred to produce a second feature. A character prediction result is then obtained from the second feature, and the parameters of the character recognition model are updated based on the prediction result and the label of the training image. The model thereby gains the ability to predict the real characters behind blurred character features, so the method can accurately recognize the characters in an image to be processed even when they are unclear.

Description

Image processing method and device
Technical Field
The present invention relates to the field of image processing, and in particular, to an image processing method and apparatus.
Background
In some scenarios, characters in an image need to be recognized. However, current methods for recognizing characters in an image cannot always do so accurately.
Therefore, a scheme is urgently needed that can accurately recognize characters in an image.
Disclosure of Invention
The technical problem to be solved by the present application is how to accurately recognize characters in an image; to that end, an image processing method and an image processing apparatus are provided.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring an image to be processed comprising characters;
inputting the image to be processed into the character recognition model to obtain characters contained in the image to be processed; wherein:
the character recognition model is used for: extracting image features of the image to be processed, and obtaining characters included in the image to be processed according to the image features; wherein:
the character recognition model is obtained through training in the following mode:
acquiring a training image and a label corresponding to the training image, wherein the label corresponding to the training image is used for indicating characters included in the training image;
training a character recognition model based on the training image and a label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image; wherein:
the training character recognition model based on the training image and the label corresponding to the training image comprises:
extracting a first feature of the training image;
determining the features corresponding to characters in the first feature;
blurring the features corresponding to characters in the first feature to obtain a second feature;
obtaining a character prediction result according to the second feature;
and updating parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
Optionally, the blurring of the features corresponding to characters in the first feature includes one or both of the following:
removing some of the features corresponding to characters in the first feature; or
modifying some of the features corresponding to characters in the first feature.
Optionally, the character recognition model includes a decoder and N encoders;
the first i encoders in the N encoders are connected in series, and are used for obtaining the first characteristic according to the training image, wherein the first characteristic is the output of the ith encoder, and i is a positive integer smaller than N;
the rear (N-i) encoders are connected in series and are used for processing the second characteristic to obtain a third characteristic;
and the decoder is used for obtaining the character prediction result according to the third characteristic.
Optionally, when the character recognition model is used to recognize characters in the image to be processed, the N encoders are used to extract image features of the image to be processed, and the decoder is used to obtain characters included in the image to be processed according to the image features.
Optionally, the determining of the features corresponding to characters in the first feature includes:
determining the features corresponding to characters in the first feature by using a feature extraction module, wherein the feature extraction module determines those features from the first feature and is independent of the character recognition model.
Optionally, the feature extraction module is a Connectionist Temporal Classification (CTC) classification module.
Optionally, the blurring of the features corresponding to characters in the first feature to obtain a second feature includes:
blurring the features corresponding to characters in the first feature by using a feature processing module independent of the character recognition model, to obtain the second feature.
Optionally, the obtaining of a character prediction result according to the second feature includes:
obtaining the character prediction result according to the second feature and the features corresponding to background noise in the first feature.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
an acquisition unit configured to acquire an image to be processed including characters;
the processing unit is used for inputting the image to be processed into the character recognition model to obtain characters contained in the image to be processed; wherein:
the character recognition model is used for: extracting image features of the image to be processed, and obtaining characters included in the image to be processed according to the image features; wherein:
the character recognition model is obtained through training in the following mode:
acquiring a training image and a label corresponding to the training image, wherein the label corresponding to the training image is used for indicating characters included in the training image;
training a character recognition model based on the training image and a label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image; wherein:
the training character recognition model based on the training image and the label corresponding to the training image comprises:
extracting a first feature of the training image;
determining the features corresponding to characters in the first feature;
blurring the features corresponding to characters in the first feature to obtain a second feature;
obtaining a character prediction result according to the second feature;
and updating parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
Optionally, the blurring of the features corresponding to characters in the first feature includes one or both of the following:
removing some of the features corresponding to characters in the first feature; or
modifying some of the features corresponding to characters in the first feature.
Optionally, the character recognition model includes a decoder and N encoders;
the first i encoders in the N encoders are connected in series, and are used for obtaining the first characteristic according to the training image, wherein the first characteristic is the output of the ith encoder, and i is a positive integer smaller than N;
the rear (N-i) encoders are connected in series and are used for processing the second characteristic to obtain a third characteristic;
and the decoder is used for obtaining the character prediction result according to the third characteristic.
Optionally, when the character recognition model is used to recognize characters in the image to be processed, the N encoders are used to extract image features of the image to be processed, and the decoder is used to obtain characters included in the image to be processed according to the image features.
Optionally, the determining of the features corresponding to characters in the first feature includes:
determining the features corresponding to characters in the first feature by using a feature extraction module, wherein the feature extraction module determines those features from the first feature and is independent of the character recognition model.
Optionally, the feature extraction module is a Connectionist Temporal Classification (CTC) classification module.
Optionally, the blurring of the features corresponding to characters in the first feature to obtain a second feature includes:
blurring the features corresponding to characters in the first feature by using a feature processing module independent of the character recognition model, to obtain the second feature.
Optionally, the obtaining of a character prediction result according to the second feature includes:
obtaining the character prediction result according to the second feature and the features corresponding to background noise in the first feature.
In a third aspect, embodiments of the present application provide an apparatus comprising a processor and a memory;
the processor is configured to execute instructions stored in the memory to cause the apparatus to perform the method of any one of the first aspects above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium comprising instructions that, when executed, cause a device to perform the method of any one of the first aspects above.
In a fifth aspect, embodiments of the present application provide a computer program product which, when run on a computer, causes the computer to perform the method of any one of the first aspects above.
Compared with the prior art, the embodiment of the application has the following advantages:
the embodiment of the application provides an image processing method, which comprises the following steps: acquiring an image to be processed; inputting the image to be processed into a character recognition model obtained by training in advance to obtain characters contained in the image to be processed; wherein: the character recognition model is used for: extracting image features of the image to be processed, obtaining characters included in the image to be processed according to the image features, and acquiring a training image and a label corresponding to the training image when training the character recognition model, wherein the label corresponding to the training image is used for indicating the characters included in the training image; and training a character recognition model based on the training image and the label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image. When the character recognition model is trained, first features of the training image can be extracted, features of corresponding characters in the first features are determined, then fuzzy processing is carried out on the features of the corresponding characters in the first features to obtain second features, a character prediction result is obtained according to the second features, and parameters of the character recognition model are updated based on the character prediction result and labels corresponding to the training image. 
Since the character prediction result is obtained by performing the blurring process on the features of the corresponding character in the first features to obtain the second features when training the character recognition model, the character recognition model has the capability of predicting the character corresponding to the feature corresponding to the character subjected to the blurring process, and therefore, even if the character itself in the image to be processed is unclear, the character can be accurately recognized by using the character recognition model. Namely: by using the image processing method provided by the embodiment of the application, the characters in the image to be processed can be accurately identified.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a model training method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of an image processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a character recognition model according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The inventors of the present application have found through research that a machine learning model may be trained in advance and then used to recognize an image and determine the characters included in it.
In one example, the machine learning model may be a Transformer model that includes an encoder for encoding an image into image features and a decoder for decoding the encoder's output into the characters included in the image.
Even when the characters in an image are recognized using a Transformer model, recognition may be inaccurate, for example because the characters in the image are themselves unclear.
In order to solve the above problems, the embodiments of the present application provide a model training method and apparatus.
Various non-limiting embodiments of the present application are described in detail below with reference to the attached drawing figures.
Exemplary method
The embodiment of the application provides an image processing method in which the characters in an image to be processed are recognized by a character recognition model obtained through training in advance; the model can accurately recognize the characters even when they are unclear.
Next, a training process of the character recognition model will be described first.
Referring to fig. 1, the flow chart of a model training method provided in an embodiment of the present application is shown. In this embodiment, the method may be performed by a terminal or may be performed by a server, which is not specifically limited in this embodiment.
The method shown in fig. 1 may, for example, comprise the steps of: S101-S102.
It should be noted that model training is an iterative process: each iteration adjusts the parameters of the model, and the adjusted parameters participate in the next iteration.
Fig. 1 takes one training image as an example and describes one iteration of training the character recognition model. It will be appreciated that many groups of training images are used, and each group is processed in a similar manner when training the character recognition model. After training on multiple groups of training images, a character recognition model whose accuracy meets the requirement can be obtained.
S101: acquiring a training image and a label corresponding to the training image, wherein the label corresponding to the training image is used for indicating characters included in the training image.
In one example, the training image may be an image including characters. It may be captured by a photographing device, obtained from a network resource, or obtained in other ways; this is not specifically limited in the embodiments of the present application.
In one example, an original image may be acquired and then processed to obtain a training image. The original image may be processed, for example, by changing the size of the original image, and in one example, the width and height of the original image may be scaled equally such that the height of the processed image is a preset height (e.g., 32).
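The equal-ratio scaling described above can be sketched as follows; the helper name and the default height of 32 echo the example in the text but are otherwise illustrative:

```python
def resize_to_height(width: int, height: int, target_height: int = 32) -> tuple[int, int]:
    """Scale width and height by the same factor so the result has the
    preset target height, preserving the aspect ratio."""
    scale = target_height / height
    return max(1, round(width * scale)), target_height

# A 128x64 original scales to 64x32; the aspect ratio is preserved.
```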
In one example, the labels corresponding to the training images may be manually labeled.
S102: and training a character recognition model based on the training image and the label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image.
S102, in a specific implementation, may include the following S1021-S1025.
S1021: a first feature of the training image is extracted.
In one example, the character recognition model may include a feature extraction module to extract a first feature of the training image.
In one example, the feature extraction module may include i encoders, where i is an integer greater than or equal to 1 and the first feature is the output of the i-th encoder. When i is greater than 1, the i encoders are connected in series: the first encoder processes the training image, and the output of the j-th encoder is the input of the (j+1)-th encoder, where (j+1) is less than or equal to i. The encoder referred to here may be a native encoder of a conventional Transformer model and is not described in detail.
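The serial wiring of the first i encoders can be sketched as follows; the toy "encoders" are stand-ins for Transformer encoder layers, and all names are illustrative:

```python
def first_feature(x, encoders, i):
    """Run the first i encoders in series: the output of encoder j is
    the input of encoder j+1, and the i-th output is the "first feature"."""
    for enc in encoders[:i]:
        x = enc(x)
    return x

# Toy "encoders" that each append their own index make the data flow visible.
encoders = [lambda x, k=k: x + [k] for k in range(4)]
```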
S1022: determining the features corresponding to characters in the first feature.
S1023: blurring the features corresponding to characters in the first feature to obtain a second feature.
S1024: obtaining a character prediction result according to the second feature.
In the embodiment of the application, the goal is that the trained character recognition model can recognize a character even when that character is unclear in the image to be processed. To that end, the features corresponding to characters in the first feature are identified and blurred to obtain the second feature, and the character prediction result is obtained from the second feature. Because the prediction is made from the second feature, the model in effect learns to predict, from a blurred feature, the character that the feature corresponded to before blurring. The trained character recognition model can therefore accurately recognize a character even if it is unclear in the image to be processed.
With respect to S1022, it should be noted that, in one example, a feature extraction module may be used to determine the features corresponding to characters in the first feature: the module classifies the first feature to identify those features. The feature extraction module may be independent of the character recognition model; it processes the first feature only while the character recognition model is being trained. After training is finished, the feature extraction module does not participate in computation when the model is used to recognize characters in an image to be processed.
In one example, the Connectionist Temporal Classification (CTC) algorithm can determine whether a feature corresponds to a character or to background noise, so the feature extraction module may be a CTC module. In other words, S1022 may be implemented by using the CTC module to classify the first feature and thereby determine the features corresponding to characters.
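A minimal sketch of separating character features from background using per-frame CTC predictions; the blank index, array shapes, and function name are assumptions for illustration, not details from the patent:

```python
import numpy as np

BLANK = 0  # assumed index of the CTC blank class

def character_frame_mask(ctc_logits: np.ndarray) -> np.ndarray:
    """Return True at time steps where the CTC head predicts a
    character class, False where it predicts blank (background)."""
    return ctc_logits.argmax(axis=-1) != BLANK

# 4 time steps over 3 classes (blank + two characters):
logits = np.array([[9., 1., 1.],   # blank  -> background
                   [1., 9., 1.],   # char 1 -> character feature
                   [1., 1., 9.],   # char 2 -> character feature
                   [9., 1., 1.]])  # blank  -> background
```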
In a specific implementation of S1023, a part of the first feature may be blurred to obtain the second feature; for example, 15% of the features may be blurred.
Blurring a part of the first feature may be implemented in several ways. In one example, some of the features corresponding to characters in the first feature are removed. In another example, some of them are modified, e.g., a feature that originally corresponded to a first character is changed into a feature corresponding to a different character.
For example, of the selected 15% of the features, a portion (e.g., 80%) may be removed, another portion (e.g., 10%) may be modified to correspond to other characters, and the remaining portion (e.g., 10%) may be left unchanged.
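The 15% selection and the 80/10/10 split above can be sketched as follows (a BERT-style masking scheme over feature frames; the function signature and randomization details are assumptions):

```python
import numpy as np

def blur_character_features(feats, char_idx, rng, select_p=0.15):
    """Blur roughly select_p of the character frames: 80% are removed
    (zeroed), 10% are modified into another character's feature, and
    10% are left unchanged. Non-character frames are never touched."""
    feats = feats.copy()
    chosen = char_idx[rng.random(char_idx.size) < select_p]
    for t in chosen:
        r = rng.random()
        if r < 0.8:
            feats[t] = 0.0                                           # remove
        elif r < 0.9:
            feats[t] = feats[char_idx[rng.integers(char_idx.size)]]  # modify
        # else: keep unchanged
    return feats
```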
In one example, S1023 may be implemented by a feature processing module that is independent of the character recognition model. In other words, in a specific implementation, S1023 may use a feature processing module independent of the character recognition model to perform fuzzy processing on the feature of the corresponding character in the first feature, so as to obtain the second feature.
With respect to S1024, it should be noted that, in one example, the character recognition model may include a character recognition module configured to obtain the character prediction result according to the second feature. In one example, the character recognition module may be a decoder, which may be a native decoder of a conventional Transformer model and is not described in detail.
The phrase "obtaining a character prediction result according to the second feature" used herein should be understood to mean that the prediction result is obtained by processing a third feature, which is itself obtained by further processing the second feature.
In one example, in addition to the i encoders that extract the first feature, the character recognition model may include another (N-i) encoders, for a total of N encoders. When (N-i) is greater than 1, the last (N-i) encoders are connected in series and process the second feature to obtain the third feature. It will be appreciated that, during training, the input of the (i+1)-th encoder is no longer the first feature output by the i-th encoder but the second feature output by the feature processing module. In other words, in the training phase of the character recognition model, a feature extraction module and a feature processing module independent of the model sit between the i-th and the (i+1)-th encoder.
Although the input of the (i+1)-th encoder is the second feature in the training phase, in the application phase its input is simply the output of the i-th encoder.
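The difference between the two phases can be sketched as a single forward pass in which the blur step is active only during training; all component names are illustrative stand-ins for the patent's encoders, feature processing module, and decoder:

```python
def forward(x, encoders, decoder, i, blur=None, training=False):
    """Encoders 1..i produce the first feature; in training the blur
    step turns it into the second feature before encoders i+1..N
    produce the third feature that the decoder consumes."""
    for enc in encoders[:i]:
        x = enc(x)
    if training and blur is not None:
        x = blur(x)            # second feature (training phase only)
    for enc in encoders[i:]:
        x = enc(x)             # third feature
    return decoder(x)

# Toy components make the two wirings visible:
encoders = [lambda x: x + 1] * 3   # N = 3 "encoders"
decoder = lambda x: x * 10
```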
S1025: and updating parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
Because the label corresponding to the training image is used for indicating the characters included in the training image, and the character prediction result is the characters in the training image recognized by the character recognition model, the parameters of the character recognition model can be updated based on the character prediction result and the label corresponding to the training image. In the subsequent training process, the character prediction result of the character recognition model after parameter adjustment can be more similar to the label corresponding to the training image.
In one example, when obtaining the character prediction result, the features corresponding to background noise in the first feature may be considered in addition to the second feature. In this case, S1024 may be implemented as: obtaining the character prediction result according to the second feature and the features corresponding to background noise in the first feature, which may be understood as using the second feature together with some or all of the background-noise features in the first feature.
In this case, the last (N-i) encoders of the character recognition model may process the second feature together with the features corresponding to background noise in the first feature to obtain a fourth feature, and the decoder obtains the character prediction result from the fourth feature.
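Merging the blurred character features with the background-noise features kept from the first feature might look like the following sketch; the masking convention and function name are assumptions for illustration:

```python
import numpy as np

def combine_with_noise(second_feat, first_feat, noise_mask):
    """Build the input of the later encoders from the blurred character
    features (second feature) plus the background-noise frames taken
    unchanged from the first feature."""
    out = second_feat.copy()
    out[noise_mask] = first_feat[noise_mask]
    return out
```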
In one example, the feature of the first feature corresponding to the background noise may also be determined by the feature extraction module.
Next, an image processing method provided in the embodiment of the present application will be described. Referring to fig. 2, the flow chart of an image processing method according to an embodiment of the present application is shown. The image processing method shown in fig. 2 may include the following S201 to S202.
S201: an image to be processed including characters is acquired.
The image to be processed may be captured by a photographing device, obtained from a network resource, or obtained in other ways; this is not specifically limited in the embodiments of the present application.
S202: inputting the image to be processed into the character recognition model to obtain characters contained in the image to be processed; wherein: the character recognition model is used for: extracting image features of the image to be processed, and obtaining characters included in the image to be processed according to the image features.
After the image to be processed is acquired, the image to be processed can be input into a character recognition model obtained through training, and the character recognition model can output characters included in the image to be processed.
The character recognition model referred to herein refers to a model trained using the method shown in fig. 1.
As is apparent from the above description of fig. 1, the character recognition model includes N encoders for extracting image features of an image to be processed when characters in the image to be processed are recognized using the character recognition model, and a decoder for obtaining the characters included in the image to be processed according to the image features.
With respect to the character recognition model, description will now be made with reference to fig. 3. Fig. 3 is a schematic structural diagram of a character recognition model according to an embodiment of the present application.
As shown in fig. 3, the character recognition model 300 includes N encoders and a decoder 330, the N encoders consisting of i encoders 310 followed by (N-i) encoders 320. Wherein:
the encoder 310 and the encoder 320 have the same structure.
In the model training stage:
the output of the i-th encoder is the first feature. The feature extraction module 400 and the feature processing module 500, which are independent of the character recognition model, process the first feature to obtain the second feature, which serves as the input of the (i+1)-th encoder. The (N-i) encoders 320 obtain the third feature from the second feature, and the decoder 330 obtains the character prediction result from the third feature.
In a specific implementation, the feature extraction module 400 and the feature processing module 500 obtain the second feature as follows: the first feature is the input of the feature extraction module 400, the output of the feature extraction module 400 is the input of the feature processing module 500, and the feature processing module 500 outputs the second feature.
In one example, N = 7 and i = 2.
In the model use stage:
the N encoders are used for processing the image to be processed to obtain image characteristics of the image to be processed;
the decoder 330 is configured to obtain, according to the image feature, a character included in the image to be processed.
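The chaining of the split encoder stack in fig. 3 can be sketched as follows. This is a minimal toy with hypothetical shapes and a placeholder encoder layer (a real implementation would use e.g. Transformer encoder blocks); it only illustrates how the first i encoders, the intermediate processing hook, and the remaining (N-i) encoders connect in series.

```python
import numpy as np

# Hypothetical sketch of the encoder split in Fig. 3: N encoders in series,
# with a hook after the i-th one where, during training, the external
# feature extraction / processing modules would blur the first feature.
N, i, T, DIM = 7, 2, 5, 8          # e.g. N = 7 encoders, split after i = 2
rng = np.random.default_rng(0)
weights = [0.1 * rng.standard_normal((DIM, DIM)) for _ in range(N)]

def encoder_layer(x, w):
    # Placeholder encoder: residual nonlinearity, not a real Transformer block.
    return x + np.tanh(x @ w)

def run(x, ws):
    for w in ws:
        x = encoder_layer(x, w)
    return x

image_feature = rng.standard_normal((T, DIM))
first_feature = run(image_feature, weights[:i])    # output of the i-th encoder
# (training only) the blurred second feature would be computed here
third_feature = run(first_feature, weights[i:])    # output of the N-th encoder
print(third_feature.shape)                         # (5, 8)
```

At inference the hook does nothing, so the N encoders behave as one ordinary stack, matching the model-use stage described above.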
Exemplary apparatus
Based on the method provided by the foregoing embodiment, an embodiment of the present application further provides an apparatus, which is described below with reference to the accompanying drawings.
Referring to fig. 4, a schematic structural diagram of an image processing apparatus according to an embodiment of the present application is shown. The apparatus 600 may include: an acquisition unit 601 and a processing unit 602.
An acquisition unit 601 for acquiring an image to be processed including characters;
a processing unit 602, configured to input the image to be processed into the character recognition model, to obtain characters included in the image to be processed; wherein:
the character recognition model is used for: extracting image features of the image to be processed, and obtaining characters included in the image to be processed according to the image features; wherein:
the character recognition model is obtained through training in the following mode:
acquiring a training image and a label corresponding to the training image, wherein the label corresponding to the training image is used for indicating characters included in the training image;
training a character recognition model based on the training image and a label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image; wherein:
the training character recognition model based on the training image and the label corresponding to the training image comprises:
extracting a first feature of the training image;
determining the characteristics of corresponding characters in the first characteristics;
performing fuzzy processing on the characteristics of the corresponding characters in the first characteristics to obtain second characteristics;
obtaining a character prediction result according to the second characteristic;
and updating parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
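The training steps above can be sketched end to end as a toy, assuming numpy arrays for features and stand-in functions for each module (the function names and the norm-based character test are illustrative assumptions, not the patent's actual method):

```python
import numpy as np

rng = np.random.default_rng(1)
T, DIM, VOCAB = 6, 4, 10

def extract_first_feature(image):
    # Stands in for the first i encoders of the character recognition model.
    return image

def locate_character_features(feat):
    # Stands in for the external feature extraction module: returns a boolean
    # mask over time steps marking frames assumed to correspond to characters.
    norms = np.linalg.norm(feat, axis=1)
    return norms > np.median(norms)

def blur(feat, mask):
    # Stands in for the external feature processing module: zero out the
    # character frames (one of the blurring options described below).
    out = feat.copy()
    out[mask] = 0.0
    return out

def predict(feat, w):
    # Stands in for the remaining (N - i) encoders plus the decoder.
    return (feat @ w).argmax(axis=1)

image = rng.standard_normal((T, DIM))
w = rng.standard_normal((DIM, VOCAB))

first = extract_first_feature(image)       # step 1: first feature
mask = locate_character_features(first)    # step 2: character features
second = blur(first, mask)                 # step 3: blurred second feature
pred = predict(second, w)                  # step 4: character prediction
# step 5: a loss between `pred` and the label would drive the parameter update
print(pred.shape)                          # (6,)
```

Blurring the character features forces the later encoders and the decoder to predict from degraded evidence, which is the source of the robustness the training scheme aims for.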
Optionally, the blurring processing performed on the features of the corresponding characters in the first feature includes any one or more of the following:
removing part of the features of the corresponding characters in the first feature; or
modifying part of the features of the corresponding characters in the first feature.
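Both options can be illustrated with numpy. This is a hedged sketch; which dimensions are removed or how they are perturbed is an assumption, since the patent does not fix a concrete strategy at this level of detail:

```python
import numpy as np

rng = np.random.default_rng(2)
char_feat = rng.standard_normal((3, 4))   # features of three character frames

# Option 1: remove part of the features (zero out some dimensions).
removed = char_feat.copy()
removed[:, :2] = 0.0

# Option 2: modify part of the features (perturb them with random noise).
modified = char_feat.copy()
modified[:, :2] += 0.5 * rng.standard_normal((3, 2))

print(removed[:, :2].any())   # False: those features are gone
```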
Optionally, the character recognition model includes a decoder and N encoders;
the first i encoders in the N encoders are connected in series, and are used for obtaining the first characteristic according to the training image, wherein the first characteristic is the output of the ith encoder, and i is a positive integer smaller than N;
the rear (N-i) encoders are connected in series and are used for processing the second characteristic to obtain a third characteristic;
and the decoder is used for obtaining the character prediction result according to the third characteristic.
Optionally, when the character recognition model is used to recognize characters in the image to be processed, the N encoders are used to extract image features of the image to be processed, and the decoder is used to obtain characters included in the image to be processed according to the image features.
Optionally, the determining the feature of the corresponding character in the first feature includes:
and determining the feature of the corresponding character in the first feature by using a feature extraction module, wherein the feature extraction module is used for determining the feature of the corresponding character in the first feature according to the first feature, and the feature extraction module is independent of the character recognition model.
Optionally, the feature extraction module is a connectionist temporal classification (CTC) module.
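Assuming the CTC module outputs per-frame class scores in which index 0 is the blank symbol (a standard CTC convention), character-bearing frames can be read off with an argmax. The scores below are invented for illustration; a real module would produce them from the first feature:

```python
import numpy as np

BLANK = 0
# Hypothetical per-frame CTC scores for 4 frames over {blank, 'a', 'b'}.
logits = np.array([
    [5.0, 0.1, 0.2],   # blank -> background / noise frame
    [0.1, 4.0, 0.2],   # 'a'   -> character frame
    [4.0, 0.3, 0.1],   # blank
    [0.2, 0.1, 3.0],   # 'b'   -> character frame
])
is_character_frame = logits.argmax(axis=1) != BLANK
print(is_character_frame)   # [False  True False  True]
```

Frames whose best class is not blank are taken as the features of the corresponding characters; the blank frames are the background-noise features.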
Optionally, the blurring processing is performed on the feature of the corresponding character in the first feature to obtain a second feature, including:
and carrying out fuzzy processing on the characteristics of the corresponding characters in the first characteristics by utilizing a characteristic processing module independent of the character recognition model to obtain second characteristics.
Optionally, according to the second feature, obtaining a character prediction result includes:
and obtaining a character prediction result according to the second characteristic and the characteristic corresponding to the background noise in the first characteristic.
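A minimal sketch of how the second feature and the background-noise features of the first feature could be combined before prediction (the mask, shapes, and zeroing are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
first = rng.standard_normal((4, 2))            # first feature (4 frames)
mask = np.array([False, True, False, True])    # character frames from CTC

# Second feature: character frames blurred (zeroed here), background untouched.
second = np.where(mask[:, None], 0.0, first)

# Prediction input: blurred character features from the second feature plus
# the unchanged background-noise features of the first feature.
combined = np.where(mask[:, None], second, first)
print(np.allclose(combined[~mask], first[~mask]))   # True
```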
Since the apparatus 600 corresponds to the image processing method provided in the foregoing method embodiment, each unit of the apparatus 600 is implemented in the same way as described for that method. For details, reference may be made to the relevant parts of the method embodiment above, which are not repeated here.
The embodiment of the application also provides equipment, which comprises a processor and a memory;
the processor is configured to execute the instructions stored in the memory, so that the apparatus executes the image processing method provided in the above method embodiment.
Embodiments of the present application provide a computer-readable storage medium including instructions that instruct a device to perform the image processing method provided in the above method embodiments.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the image processing method provided by the above method embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The foregoing description covers only preferred embodiments of the present application and is not intended to limit the invention to those particular embodiments.

Claims (9)

1. An image processing method, the method comprising:
acquiring an image to be processed comprising characters;
inputting the image to be processed into a character recognition model to obtain characters contained in the image to be processed; wherein:
the character recognition model is used for: extracting image features of the image to be processed, and obtaining characters included in the image to be processed according to the image features; wherein:
the character recognition model is obtained through training in the following mode:
acquiring a training image and a label corresponding to the training image, wherein the label corresponding to the training image is used for indicating characters included in the training image;
training a character recognition model based on the training image and a label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image; wherein:
the training character recognition model based on the training image and the label corresponding to the training image comprises:
extracting a first feature of the training image;
determining the feature of the corresponding character in the first feature, wherein the determining the feature of the corresponding character in the first feature comprises the following steps: determining the characteristics of the corresponding characters in the first characteristics by utilizing a characteristic extraction module, wherein the characteristic extraction module is used for determining the characteristics of the corresponding characters in the first characteristics according to the first characteristics, and is independent of the character recognition model and is a connectionist temporal classification (CTC) module;
performing fuzzy processing on the characteristics of the corresponding characters in the first characteristics to obtain second characteristics;
obtaining a character prediction result according to the second characteristic;
and updating parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
2. The method of claim 1, wherein blurring the feature of the corresponding character in the first feature comprises any one or more of:
removing part of the features in the features of the corresponding characters in the first features; or,
and modifying part of the characteristics of the corresponding characters in the first characteristics.
3. The method of claim 1, wherein the character recognition model comprises a decoder and N encoders;
the first i encoders in the N encoders are connected in series, and are used for obtaining the first characteristic according to the training image, wherein the first characteristic is the output of the ith encoder, and i is a positive integer smaller than N;
the rear (N-i) encoders are connected in series and are used for processing the second characteristic to obtain a third characteristic;
and the decoder is used for obtaining the character prediction result according to the third characteristic.
4. A method according to claim 3, wherein the N encoders are used to extract image features of the image to be processed when recognizing characters in the image to be processed using the character recognition model, and the decoder is used to obtain the characters included in the image to be processed from the image features.
5. The method of claim 1, wherein the blurring the feature of the corresponding character in the first feature to obtain a second feature includes:
and carrying out fuzzy processing on the characteristics of the corresponding characters in the first characteristics by utilizing a characteristic processing module independent of the character recognition model to obtain second characteristics.
6. The method according to any one of claims 1-5, wherein obtaining a character prediction result according to the second feature comprises:
and obtaining a character prediction result according to the second characteristic and the characteristic corresponding to the background noise in the first characteristic.
7. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition unit configured to acquire an image to be processed including characters;
the processing unit is used for inputting the image to be processed into a character recognition model to obtain characters contained in the image to be processed; wherein:
the character recognition model is used for: extracting image features of the image to be processed, and obtaining characters included in the image to be processed according to the image features; wherein:
the character recognition model is obtained through training in the following mode:
acquiring a training image and a label corresponding to the training image, wherein the label corresponding to the training image is used for indicating characters included in the training image;
training a character recognition model based on the training image and a label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image; wherein:
the training character recognition model based on the training image and the label corresponding to the training image comprises:
extracting a first feature of the training image;
determining the feature of the corresponding character in the first feature, wherein the determining the feature of the corresponding character in the first feature comprises the following steps: determining the characteristics of the corresponding characters in the first characteristics by utilizing a characteristic extraction module, wherein the characteristic extraction module is used for determining the characteristics of the corresponding characters in the first characteristics according to the first characteristics, and is independent of the character recognition model and is a connectionist temporal classification (CTC) module;
performing fuzzy processing on the characteristics of the corresponding characters in the first characteristics to obtain second characteristics;
obtaining a character prediction result according to the second characteristic;
and updating parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
8. An electronic device comprising a processor and a memory;
the processor is configured to execute instructions stored in the memory to cause the apparatus to perform the method of any one of claims 1 to 6.
9. A computer readable storage medium comprising instructions that instruct a device to perform the method of any one of claims 1 to 6.
CN202111526049.7A 2021-12-14 2021-12-14 Image processing method and device Active CN114187593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111526049.7A CN114187593B (en) 2021-12-14 2021-12-14 Image processing method and device


Publications (2)

Publication Number Publication Date
CN114187593A CN114187593A (en) 2022-03-15
CN114187593B true CN114187593B (en) 2024-01-30

Family

ID=80543747


Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108288078A (en) * 2017-12-07 2018-07-17 腾讯科技(深圳)有限公司 Character identifying method, device and medium in a kind of image
WO2018223994A1 (en) * 2017-06-07 2018-12-13 众安信息技术服务有限公司 Method and device for synthesizing chinese printed character image
CN110516577A (en) * 2019-08-20 2019-11-29 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and storage medium
CN112163435A (en) * 2020-10-20 2021-01-01 腾讯科技(深圳)有限公司 Machine translation method, machine translation model training method, device and equipment
CN112215221A (en) * 2020-09-22 2021-01-12 国交空间信息技术(北京)有限公司 Automatic vehicle frame number identification method
CN112883967A (en) * 2021-02-24 2021-06-01 北京有竹居网络技术有限公司 Image character recognition method, device, medium and electronic equipment
CN112883966A (en) * 2021-02-24 2021-06-01 北京有竹居网络技术有限公司 Image character recognition method, device, medium and electronic equipment
CN112883968A (en) * 2021-02-24 2021-06-01 北京有竹居网络技术有限公司 Image character recognition method, device, medium and electronic equipment
CN113011420A (en) * 2021-03-10 2021-06-22 北京百度网讯科技有限公司 Character recognition method, model training method, related device and electronic equipment
CN113313064A (en) * 2021-06-23 2021-08-27 北京有竹居网络技术有限公司 Character recognition method and device, readable medium and electronic equipment
CN113436137A (en) * 2021-03-12 2021-09-24 北京世纪好未来教育科技有限公司 Image definition recognition method, device, equipment and medium
CN113642583A (en) * 2021-08-13 2021-11-12 北京百度网讯科技有限公司 Deep learning model training method for text detection and text detection method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11664108B2 (en) * 2018-11-29 2023-05-30 January, Inc. Systems, methods, and devices for biophysical modeling and response prediction


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition; Zhi Qiao et al.; arXiv; pp. 1-10 *
Application of Deep Learning in Handwritten Digit Recognition; Xing Yuan; China Master's Theses Full-text Database, Information Science and Technology (No. 05); I138-439 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant