CN114187593A - Image processing method and device - Google Patents

Image processing method and device Download PDF

Info

Publication number
CN114187593A
CN114187593A (application CN202111526049.7A)
Authority
CN
China
Prior art keywords
image
training
feature
recognition model
character recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111526049.7A
Other languages
Chinese (zh)
Other versions
CN114187593B (en)
Inventor
Zhang Jiaxin (张家鑫)
Huang Can (黄灿)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202111526049.7A priority Critical patent/CN114187593B/en
Publication of CN114187593A publication Critical patent/CN114187593A/en
Application granted granted Critical
Publication of CN114187593B publication Critical patent/CN114187593B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 - Computing arrangements based on specific mathematical models
    • G06N7/02 - Computing arrangements based on specific mathematical models using fuzzy logic

Abstract

The application discloses an image processing method, which includes: acquiring an image to be processed; and inputting the image to be processed into a pre-trained character recognition model to obtain the characters included in the image to be processed. The character recognition model is used to extract image features of the image to be processed and obtain the characters included in the image to be processed according to the image features. When the character recognition model is trained, a first feature of the training image can be extracted, the features of the corresponding characters in the first feature are determined and then blurred to obtain a second feature, a character prediction result is obtained according to the second feature, and the parameters of the character recognition model are updated based on the character prediction result and the label corresponding to the training image. The character recognition model thereby acquires the ability to predict the real characters behind the blurred character features, so the method can accurately recognize the characters in the image to be processed.

Description

Image processing method and device
Technical Field
The present application relates to the field of image processing, and in particular, to an image processing method and apparatus.
Background
In some scenarios, it is desirable to identify characters in an image. However, the current methods for recognizing characters in images cannot accurately recognize the characters in the images.
Therefore, a solution that can accurately recognize characters in an image is urgently needed.
Disclosure of Invention
The technical problem to be solved by the application is how to accurately identify characters in an image, and an image processing method and device are provided.
In a first aspect, an embodiment of the present application provides an image processing method, where the method includes:
acquiring an image to be processed including characters;
inputting the image to be processed into the character recognition model to obtain characters included in the image to be processed; wherein:
the character recognition model is used for: extracting image features of the image to be processed, and obtaining characters included in the image to be processed according to the image features; wherein:
the character recognition model is obtained by training in the following way:
acquiring a training image and a label corresponding to the training image, wherein the label corresponding to the training image is used for indicating characters included in the training image;
training a character recognition model based on the training image and a label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image; wherein:
the training of the character recognition model based on the training images and the labels corresponding to the training images comprises:
extracting a first feature of the training image;
determining the characteristics of the corresponding characters in the first characteristics;
fuzzy processing is carried out on the characteristics of the corresponding characters in the first characteristics to obtain second characteristics;
obtaining a character prediction result according to the second characteristic;
and updating the parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
Optionally, the blurring processing on the feature of the corresponding character in the first feature includes any one or more of the following:
removing part of the characteristics of the corresponding characters in the first characteristics; or,
and modifying part of the characteristics of the corresponding characters in the first characteristics.
Optionally, the character recognition model includes a decoder and N encoders;
the first i encoders in the N encoders are connected in series, the first i encoders are used for obtaining the first characteristic according to the training image, the first characteristic is the output of the ith encoder, and i is a positive integer smaller than N;
the last (N-i) encoders are connected in series, and the last (N-i) encoders are used for processing the second characteristic to obtain a third characteristic;
and the decoder is used for obtaining the character prediction result according to the third characteristic.
Optionally, when the character recognition model is used to recognize characters in the image to be processed, the N encoders are configured to extract image features of the image to be processed, and the decoder is configured to obtain characters included in the image to be processed according to the image features.
Optionally, the determining the characteristics of the corresponding characters in the first characteristics includes:
and determining the characteristics of the corresponding characters in the first characteristics by using a characteristic extraction module, wherein the characteristic extraction module is used for determining the characteristics of the corresponding characters in the first characteristics according to the first characteristics, and the characteristic extraction module is independent of the character recognition model.
Optionally, the feature extraction module is a classification module using a connectionist temporal classification (CTC) algorithm.
Optionally, the performing fuzzy processing on the feature of the corresponding character in the first feature to obtain a second feature includes:
and carrying out fuzzy processing on the characteristics of the corresponding characters in the first characteristics by using a characteristic processing module independent of the character recognition model to obtain second characteristics.
Optionally, obtaining a character prediction result according to the second feature includes:
and obtaining a character prediction result according to the second characteristic and the characteristic of the corresponding background noise in the first characteristic.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
an acquisition unit configured to acquire an image to be processed including characters;
the processing unit is used for inputting the image to be processed into the character recognition model to obtain characters included in the image to be processed; wherein:
the character recognition model is used for: extracting image features of the image to be processed, and obtaining characters included in the image to be processed according to the image features; wherein:
the character recognition model is obtained by training in the following way:
acquiring a training image and a label corresponding to the training image, wherein the label corresponding to the training image is used for indicating characters included in the training image;
training a character recognition model based on the training image and a label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image; wherein:
the training of the character recognition model based on the training images and the labels corresponding to the training images comprises:
extracting a first feature of the training image;
determining the characteristics of the corresponding characters in the first characteristics;
fuzzy processing is carried out on the characteristics of the corresponding characters in the first characteristics to obtain second characteristics;
obtaining a character prediction result according to the second characteristic;
and updating the parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
Optionally, the blurring processing on the feature of the corresponding character in the first feature includes any one or more of the following:
removing part of the characteristics of the corresponding characters in the first characteristics; or,
and modifying part of the characteristics of the corresponding characters in the first characteristics.
Optionally, the character recognition model includes a decoder and N encoders;
the first i encoders in the N encoders are connected in series, the first i encoders are used for obtaining the first characteristic according to the training image, the first characteristic is the output of the ith encoder, and i is a positive integer smaller than N;
the last (N-i) encoders are connected in series, and the last (N-i) encoders are used for processing the second characteristic to obtain a third characteristic;
and the decoder is used for obtaining the character prediction result according to the third characteristic.
Optionally, when the character recognition model is used to recognize characters in the image to be processed, the N encoders are configured to extract image features of the image to be processed, and the decoder is configured to obtain characters included in the image to be processed according to the image features.
Optionally, the determining the characteristics of the corresponding characters in the first characteristics includes:
and determining the characteristics of the corresponding characters in the first characteristics by using a characteristic extraction module, wherein the characteristic extraction module is used for determining the characteristics of the corresponding characters in the first characteristics according to the first characteristics, and the characteristic extraction module is independent of the character recognition model.
Optionally, the feature extraction module is a classification module using a connectionist temporal classification (CTC) algorithm.
Optionally, the performing fuzzy processing on the feature of the corresponding character in the first feature to obtain a second feature includes:
and carrying out fuzzy processing on the characteristics of the corresponding characters in the first characteristics by using a characteristic processing module independent of the character recognition model to obtain second characteristics.
Optionally, obtaining a character prediction result according to the second feature includes:
and obtaining a character prediction result according to the second characteristic and the characteristic of the corresponding background noise in the first characteristic.
In a third aspect, an embodiment of the present application provides an apparatus, which includes a processor and a memory;
the processor is configured to execute instructions stored in the memory to cause the apparatus to perform the method of any of the first aspects above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium comprising instructions that instruct a device to perform the method according to any one of the above first aspects.
In a fifth aspect, embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to perform the method of any of the above first aspects.
Compared with the prior art, the embodiment of the application has the following advantages:
the embodiment of the application provides an image processing method, which comprises the following steps: acquiring an image to be processed; inputting the image to be processed into a character recognition model obtained by pre-training to obtain characters included in the image to be processed; wherein: the character recognition model is used for: extracting image features of the image to be processed, obtaining characters included in the image to be processed according to the image features, and acquiring a training image and a label corresponding to the training image when training the character recognition model, wherein the label corresponding to the training image is used for indicating the characters included in the training image; and training a character recognition model based on the training image and the label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image. When the character recognition model is trained, the first feature of the training image can be extracted, the feature of the corresponding character in the first feature is determined, then the feature of the corresponding character in the first feature is subjected to fuzzy processing to obtain a second feature, a character prediction result is obtained according to the second feature, and the parameter of the character recognition model is updated based on the character prediction result and the label corresponding to the training image. In the training of the character recognition model, the character prediction result is obtained from the second feature obtained by blurring the feature of the corresponding character in the first feature, so that the character recognition model has the capability of predicting the character corresponding to the feature corresponding to the character subjected to blurring, and therefore, the character can be accurately recognized by the character recognition model even if the character itself in the image to be processed is unclear. Namely: by the image processing method provided by the embodiment of the application, the characters in the image to be processed can be accurately identified.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments described in the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flowchart of a model training method according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a character recognition model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The inventor of the present application has found through research that a machine learning model can be trained in advance, and the trained machine learning model can then be used to recognize an image so as to determine the characters included in the image.
In one example, the machine learning model may be a Transformer model, where the Transformer model includes an encoder (encoder) and a decoder (decoder), the encoder is configured to encode an image to obtain image features, and the decoder is configured to decode the features output by the encoder to obtain characters included in the image.
However, even when characters in an image are recognized using a Transformer model, recognition may be inaccurate; for example, the recognition result may be wrong because the characters themselves in the image are unclear.
In order to solve the above problem, an embodiment of the present application provides a model training method and apparatus.
Various non-limiting embodiments of the present application are described in detail below with reference to the accompanying drawings.
Exemplary method
The embodiment of the application provides an image processing method in which a character recognition model obtained by pre-training is used to recognize characters in an image to be processed. When this character recognition model recognizes the characters in the image to be processed, the characters can be recognized accurately even if they are unclear in the image.
Next, the training process of the character recognition model will be described first.
Referring to fig. 1, the figure is a schematic flow chart of a model training method provided in the embodiment of the present application. In this embodiment, the method may be executed by a terminal or a server, and the embodiment of the present application is not particularly limited.
The method shown in fig. 1, for example, may comprise the steps of: S101-S102.
It should be noted that the process of model training is a process of multiple iterative computations, each iteration can adjust the parameters of the model, and the adjusted parameters participate in the next iterative computation.
Fig. 1 illustrates one iteration in training the character recognition model, taking one training image as an example. It can be understood that many groups of training images are used to train the character recognition model, and each group of training images is processed similarly when the character recognition model is trained. After training on multiple groups of training images, a character recognition model whose accuracy meets the requirement can be obtained.
S101: Acquire a training image and a label corresponding to the training image, where the label corresponding to the training image is used to indicate the characters included in the training image.
In one example, the training image may be an image that includes characters. The training image may be obtained by shooting with a shooting device, may also be obtained from a network resource, and may also be obtained in other manners, which is not specifically limited in the embodiments of the present application.
In one example, a raw image may be acquired and then processed to obtain the training image. The raw image may be processed, for example, by resizing it: in one example, the width and height of the raw image are scaled proportionally so that the height of the processed image equals a preset height (for example, 32).
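For illustration only, the following is a minimal sketch of this proportional scaling, assuming Python with Pillow (the patent names no implementation; the function name and the bilinear filter are assumptions):

```python
from PIL import Image

TARGET_HEIGHT = 32  # the preset height mentioned above

def resize_to_height(image: Image.Image, target_height: int = TARGET_HEIGHT) -> Image.Image:
    """Scale width and height proportionally so the result has the preset height."""
    scale = target_height / image.height
    new_width = max(1, round(image.width * scale))
    return image.resize((new_width, target_height), Image.BILINEAR)
```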
In one example, the labels corresponding to the training images may be manually labeled.
S102: and training a character recognition model based on the training image and the label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image.
S102, in a particular implementation, may include the following S1021-S1025.
S1021: extracting a first feature of the training image.
In one example, the character recognition model may include a feature extraction module to extract a first feature of the training image.
In one example, the feature extraction module may include i encoders, where i is an integer greater than or equal to 1, and the first feature is the output of the i-th encoder. When i is greater than 1, the i encoders are connected in series: the first of the i encoders processes the training image, and the output of the j-th encoder is the input of the (j+1)-th encoder, where (j+1) is less than or equal to i. The encoder mentioned here may be a native encoder of a conventional Transformer model and is not described in detail here.
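As a hedged sketch of this serial arrangement, assuming PyTorch and standard Transformer encoder layers (the patent does not fix the layer internals, the feature width, or the head count):

```python
import torch
import torch.nn as nn

D_MODEL = 256  # assumed feature width

def make_encoders(num_layers: int) -> nn.ModuleList:
    return nn.ModuleList(
        nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8, batch_first=True)
        for _ in range(num_layers)
    )

def run_serial(encoders: nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    # The output of the j-th encoder is the input of the (j+1)-th encoder.
    for enc in encoders:
        x = enc(x)
    return x  # for the first i encoders, this is the "first feature"
```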
S1022: and determining the characteristics of the corresponding characters in the first characteristics.
S1023: and carrying out fuzzy processing on the characteristics of the corresponding characters in the first characteristics to obtain second characteristics.
S1024: and obtaining a character prediction result according to the second characteristic.
In the embodiment of the present application, the character recognition model is trained so that characters can be recognized even when the characters included in the image to be processed are unclear. To this end, the features of the corresponding characters in the first feature may be identified and blurred to obtain a second feature, and a character prediction result is obtained according to the second feature. Since the character prediction result is obtained from the second feature, the character recognition model in effect acquires the ability to predict, from a blurred feature, the character that the feature corresponded to before blurring. Thus, the trained character recognition model can accurately recognize a character even if the character itself in the image to be processed is unclear.
With respect to S1022, it should be noted that, in an example, the feature extraction module may be used to determine the feature of the corresponding character in the first feature. The feature extraction module is configured to classify the first feature to determine a feature of a corresponding character in the first feature. Wherein the feature extraction module may be a module independent of the character recognition model. In training the character recognition model, the first features are processed by means of the feature extraction module. After the training of the character recognition model is completed, the feature extraction module does not need to participate in calculation when the character recognition model is used for recognizing characters in the image to be processed.
In one example, a Connectionist Temporal Classification (CTC) algorithm can identify whether a feature corresponds to a character or to background noise. Thus, the feature extraction module may be a CTC module. In other words, in a specific implementation, S1022 may utilize the CTC module to classify the first feature so as to determine the features of the corresponding characters in the first feature.
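A minimal sketch of such a CTC-style feature extraction module, assuming a per-frame linear classifier whose blank class is read as background noise (the class layout and all names are assumptions, not the patent's design):

```python
import torch
import torch.nn as nn

class CTCFeatureExtractor(nn.Module):
    """Classifies each frame of the first feature as a character or as background noise."""

    def __init__(self, d_model: int, num_characters: int):
        super().__init__()
        # index 0 is the CTC blank, interpreted here as background noise
        self.proj = nn.Linear(d_model, num_characters + 1)

    def character_mask(self, first_feature: torch.Tensor) -> torch.Tensor:
        logits = self.proj(first_feature)  # (batch, frames, num_characters + 1)
        pred = logits.argmax(dim=-1)       # per-frame class decision
        return pred != 0                   # True for character frames, False for noise
```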
Regarding S1023, it should be noted that, in a specific implementation, part of the first feature may be blurred to obtain the second feature. For example, 15% of the first feature is blurred.
There are many ways to blur part of the first feature. In one example, some of the features of the corresponding characters in the first feature may be removed. In another example, some of the features of the corresponding characters in the first feature may be modified, e.g., a feature originally corresponding to a first character is modified into a feature corresponding to another character different from the first character.
As an example, of the selected 15% of features, one part (for example, 80%) may be removed, another part (for example, 10%) may be modified into features corresponding to other characters, and the remaining part (for example, 10%) may be left unchanged.
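The following is a hedged sketch of this blurring step under the proportions given above (15% selected; of those, roughly 80% removed, 10% modified, 10% unchanged). Zeroing a frame as "removal" and swapping in another frame's feature as "modification" are assumptions:

```python
import torch

def blur_character_features(first_feature: torch.Tensor,
                            char_mask: torch.Tensor,
                            p_select: float = 0.15) -> torch.Tensor:
    """first_feature: (batch, frames, d); char_mask: (batch, frames) bool."""
    out = first_feature.clone()
    # select about 15% of the character frames at random
    selected = char_mask & (torch.rand_like(char_mask, dtype=torch.float) < p_select)
    roll = torch.rand_like(selected, dtype=torch.float)
    remove = selected & (roll < 0.8)                  # ~80%: remove the feature (zero it)
    modify = selected & (roll >= 0.8) & (roll < 0.9)  # ~10%: replace with another frame's feature
    out[remove] = 0.0
    if modify.any():
        perm = torch.randperm(first_feature.size(1))
        out[modify] = first_feature[:, perm][modify]  # feature of a different position
    return out  # the "second feature"; the remaining ~10% stay unchanged
```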
In one example, S1023 may be implemented by a feature processing module independent of the character recognition model. In other words, in a specific implementation, the S1023 may use a feature processing module independent of the character recognition model to perform fuzzy processing on the feature of the corresponding character in the first feature to obtain the second feature.
Regarding S1024, it should be noted that, in one example, the character recognition model may include a character recognition module configured to obtain the character prediction result according to the second feature. In one example, the character recognition module may be a decoder, which may be a native decoder of a conventional Transformer model and is not described in detail here.
"Obtaining the character prediction result according to the second feature" may here be understood as processing a third feature, obtained by further processing the second feature, so as to obtain the character prediction result.
In one example, the character recognition model may include another (N-i) encoders in addition to the aforementioned i encoders that extract the first feature; in other words, the character recognition model may include N encoders. When (N-i) is greater than 1, the last (N-i) encoders are connected in series and are used to process the second feature to obtain a third feature. It can be understood that, for the (i+1)-th of the N encoders, its input is no longer the first feature output by the i-th encoder but the second feature output by the feature processing module. In other words, in the training phase of the character recognition model, a feature extraction module and a feature processing module independent of the character recognition model sit between the i-th encoder and the (i+1)-th encoder.
It should be noted here that although the input of the (i+1)-th encoder is the second feature in the training stage of the character recognition model, its input is the output of the i-th encoder in the application stage.
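Putting the pieces together, a hedged sketch of this encoder wiring, with the blurring applied between the i-th and (i+1)-th encoders only during training (image embedding is omitted; `extractor` and `blur_fn` refer to the sketches above, not to anything named in the patent):

```python
import torch

def encode(first_encoders, last_encoders, extractor, blur_fn,
           feats: torch.Tensor, training: bool) -> torch.Tensor:
    x = feats
    for enc in first_encoders:                   # first i encoders -> first feature
        x = enc(x)
    if training:
        char_mask = extractor.character_mask(x)  # module independent of the model
        x = blur_fn(x, char_mask)                # second feature
    # at inference, the (i+1)-th encoder consumes the i-th encoder's output directly
    for enc in last_encoders:                    # last (N-i) encoders -> third feature
        x = enc(x)
    return x
```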
S1025: and updating the parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
Since the label corresponding to the training image indicates the characters included in the training image, and the character prediction result is the characters in the training image as recognized by the character recognition model, the parameters of the character recognition model may be updated based on the character prediction result and the label. In subsequent training iterations, the character prediction result of the parameter-adjusted character recognition model can then move closer to the label corresponding to the training image.
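A minimal sketch of this update step, assuming a cross-entropy loss and a standard gradient-based optimizer (the patent states only that the parameters are updated from the prediction and the label; the loss and optimizer choices are assumptions):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, image_feats, label_ids):
    logits = model(image_feats)  # character prediction result: (batch, seq, vocab)
    loss = F.cross_entropy(logits.transpose(1, 2), label_ids)  # compare with the label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()             # adjusted parameters join the next iteration
    return loss.item()
```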
In one example, when the character prediction result is obtained, in addition to the second feature, a feature corresponding to background noise in the first feature may be considered. For this case, S1024, when implemented specifically, may be: and obtaining a character prediction result according to the second characteristic and the characteristic of the corresponding background noise in the first characteristic. The "obtaining the character prediction result according to the second feature and the feature of the corresponding background noise in the first feature" may be understood as: "obtaining a character prediction result according to the second feature and a part or all of the features of the first feature corresponding to the background noise".
For this case, the last (N-i) encoders of the character recognition model may process the second feature and the feature corresponding to the background noise in the first feature to obtain a fourth feature. Accordingly, the decoder may obtain a character prediction result for the fourth feature.
In one example, the feature of the first feature corresponding to the background noise may also be determined by the aforementioned feature extraction module.
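For this variant, a short sketch of how the second feature could be combined with the background-noise features of the first feature before entering the last (N-i) encoders (the merge rule below is an assumption):

```python
import torch

def merge_with_noise(first_feature: torch.Tensor,
                     second_feature: torch.Tensor,
                     char_mask: torch.Tensor) -> torch.Tensor:
    # character frames carry the blurred second feature;
    # background-noise frames keep their original first-feature values
    return torch.where(char_mask.unsqueeze(-1), second_feature, first_feature)
```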
Next, an image processing method provided in an embodiment of the present application will be described. Referring to fig. 2, the figure is a schematic flowchart of an image processing method according to an embodiment of the present application. The image processing method shown in fig. 2 may include the following S201-S202.
S201: an image to be processed including characters is acquired.
The image to be processed may be obtained by shooting with a shooting device, may also be obtained from a network resource, and may also be obtained in other manners, which is not specifically limited in the embodiment of the present application.
S202: inputting the image to be processed into the character recognition model to obtain characters included in the image to be processed; wherein: the character recognition model is used for: and extracting the image characteristics of the image to be processed, and obtaining characters included in the image to be processed according to the image characteristics.
After the image to be processed is obtained, the image to be processed may be input to a trained character recognition model, and the character recognition model may output characters included in the image to be processed.
The character recognition model mentioned here refers to a model obtained by training using the method shown in fig. 1.
As can be seen from the above description of fig. 1, the character recognition model includes N encoders and a decoder. When the character recognition model is used to recognize characters in an image to be processed, the N encoders extract image features of the image to be processed, and the decoder obtains the characters included in the image to be processed according to the image features.
The character recognition model will now be described with reference to fig. 3. Fig. 3 is a schematic structural diagram of a character recognition model according to an embodiment of the present application.
As shown in fig. 3, the character recognition model 300 includes N encoders and a decoder 330, the N encoders including: i encoders 310 and (N-i) encoders 320. Wherein:
The encoders 310 and the encoders 320 have the same structure.
In the model training phase:
the output of the ith encoder is the first feature, and then the feature extraction module 400 and the feature processing module 500 independent of the character recognition model process the first feature to obtain the second feature, which is used as the input of the (i +1) th encoder, and the (N-i) encoders 320 obtain the third feature according to the second feature. The decoder 330 obtains a character prediction result according to the third characteristic.
The feature extraction module 400 and the feature processing module 500 process the first feature to obtain a second feature, when the second feature is specifically implemented: the first feature is used as an input of the feature extraction module 400, an output of the feature extraction module 400 is used as an input of the feature processing module 500, and the feature processing module 500 outputs a second feature.
In one example, N = 7 and i = 2.
In the model use stage:
the N encoders are used for processing the image to be processed to obtain the image characteristics of the image to be processed;
the decoder 330 is configured to obtain characters included in the image to be processed according to the image features.
Exemplary device
Based on the method provided by the above embodiment, the embodiment of the present application further provides an apparatus, which is described below with reference to the accompanying drawings.
Referring to fig. 4, the figure is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The apparatus 600 may specifically include, for example: an acquisition unit 601 and a processing unit 602.
An acquisition unit 601 configured to acquire an image to be processed including characters;
a processing unit 602, configured to input the image to be processed into the character recognition model, so as to obtain characters included in the image to be processed; wherein:
the character recognition model is used for: extracting image features of the image to be processed, and obtaining characters included in the image to be processed according to the image features; wherein:
the character recognition model is obtained by training in the following way:
acquiring a training image and a label corresponding to the training image, wherein the label corresponding to the training image is used for indicating characters included in the training image;
training a character recognition model based on the training image and a label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image; wherein:
the training of the character recognition model based on the training images and the labels corresponding to the training images comprises:
extracting a first feature of the training image;
determining the characteristics of the corresponding characters in the first characteristics;
fuzzy processing is carried out on the characteristics of the corresponding characters in the first characteristics to obtain second characteristics;
obtaining a character prediction result according to the second characteristic;
and updating the parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
Optionally, the blurring processing on the feature of the corresponding character in the first feature includes any one or more of the following:
removing part of the characteristics of the corresponding characters in the first characteristics; or,
and modifying part of the characteristics of the corresponding characters in the first characteristics.
Optionally, the character recognition model includes a decoder and N encoders;
the first i encoders in the N encoders are connected in series, the first i encoders are used for obtaining the first characteristic according to the training image, the first characteristic is the output of the ith encoder, and i is a positive integer smaller than N;
the last (N-i) encoders are connected in series, and the last (N-i) encoders are used for processing the second characteristic to obtain a third characteristic;
and the decoder is used for obtaining the character prediction result according to the third characteristic.
Optionally, when the character recognition model is used to recognize characters in the image to be processed, the N encoders are configured to extract image features of the image to be processed, and the decoder is configured to obtain characters included in the image to be processed according to the image features.
Optionally, the determining the characteristics of the corresponding characters in the first characteristics includes:
and determining the characteristics of the corresponding characters in the first characteristics by using a characteristic extraction module, wherein the characteristic extraction module is used for determining the characteristics of the corresponding characters in the first characteristics according to the first characteristics, and the characteristic extraction module is independent of the character recognition model.
Optionally, the feature extraction module is a classification module using a connectionist temporal classification (CTC) algorithm.
Optionally, the performing fuzzy processing on the feature of the corresponding character in the first feature to obtain a second feature includes:
and carrying out fuzzy processing on the characteristics of the corresponding characters in the first characteristics by using a characteristic processing module independent of the character recognition model to obtain second characteristics.
Optionally, obtaining a character prediction result according to the second feature includes:
and obtaining a character prediction result according to the second characteristic and the characteristic of the corresponding background noise in the first characteristic.
Since the apparatus 600 is a device corresponding to the image processing method provided in the above method embodiment, and the specific implementation of each unit of the apparatus 600 is the same as the image processing method described in the above method embodiment, reference may be made to the relevant description part of the above method embodiment for the specific implementation of each unit of the apparatus 600, and details are not repeated here.
An embodiment of the present application further provides an apparatus, which includes a processor and a memory;
the processor is used for executing the instructions stored in the memory so as to cause the equipment to execute the image processing method provided by the above method embodiment.
The embodiment of the application provides a computer-readable storage medium which comprises instructions for instructing equipment to execute the image processing method provided by the method embodiment.
The embodiment of the present application further provides a computer program product, which when running on a computer, causes the computer to execute the image processing method provided by the above method embodiment.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (11)

1. An image processing method, characterized in that the method comprises:
acquiring an image to be processed including characters;
inputting the image to be processed into the character recognition model to obtain characters included in the image to be processed; wherein:
the character recognition model is used for: extracting image features of the image to be processed, and obtaining characters included in the image to be processed according to the image features; wherein:
the character recognition model is obtained by training in the following way:
acquiring a training image and a label corresponding to the training image, wherein the label corresponding to the training image is used for indicating characters included in the training image;
training a character recognition model based on the training image and a label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image; wherein:
the training of the character recognition model based on the training images and the labels corresponding to the training images comprises:
extracting a first feature of the training image;
determining the characteristics of the corresponding characters in the first characteristics;
fuzzy processing is carried out on the characteristics of the corresponding characters in the first characteristics to obtain second characteristics;
obtaining a character prediction result according to the second characteristic;
and updating the parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
2. The method according to claim 1, wherein the blurring the feature of the corresponding character in the first feature includes any one or more of:
removing part of the characteristics of the corresponding characters in the first characteristics; or,
and modifying part of the characteristics of the corresponding characters in the first characteristics.
3. The method of claim 1, wherein the character recognition model comprises a decoder and N encoders;
the first i encoders in the N encoders are connected in series, the first i encoders are used for obtaining the first characteristic according to the training image, the first characteristic is the output of the ith encoder, and i is a positive integer smaller than N;
the last (N-i) encoders are connected in series, and the last (N-i) encoders are used for processing the second characteristic to obtain a third characteristic;
and the decoder is used for obtaining the character prediction result according to the third characteristic.
4. The method according to claim 3, wherein the N encoders are configured to extract image features of the image to be processed when the character recognition model is used to recognize characters in the image to be processed, and the decoder is configured to obtain characters included in the image to be processed according to the image features.
5. The method of claim 1, wherein determining the feature of the first feature for the corresponding character comprises:
and determining the characteristics of the corresponding characters in the first characteristics by using a characteristic extraction module, wherein the characteristic extraction module is used for determining the characteristics of the corresponding characters in the first characteristics according to the first characteristics, and the characteristic extraction module is independent of the character recognition model.
6. The method of claim 5, wherein the feature extraction module is a connectionist temporal classification (CTC) classification module.
7. The method according to claim 1, wherein the blurring the feature of the corresponding character in the first feature to obtain a second feature comprises:
and carrying out fuzzy processing on the characteristics of the corresponding characters in the first characteristics by using a characteristic processing module independent of the character recognition model to obtain second characteristics.
8. The method according to any one of claims 1-7, wherein obtaining a character prediction result according to the second feature comprises:
and obtaining a character prediction result according to the second characteristic and the characteristic of the corresponding background noise in the first characteristic.
9. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition unit configured to acquire an image to be processed including characters;
the processing unit is used for inputting the image to be processed into the character recognition model to obtain characters included in the image to be processed; wherein:
the character recognition model is used for: extracting image features of the image to be processed, and obtaining characters included in the image to be processed according to the image features; wherein:
the character recognition model is obtained by training in the following way:
acquiring a training image and a label corresponding to the training image, wherein the label corresponding to the training image is used for indicating characters included in the training image;
training a character recognition model based on the training image and a label corresponding to the training image, wherein the character recognition model is used for recognizing characters in the image; wherein:
the training of the character recognition model based on the training images and the labels corresponding to the training images comprises:
extracting a first feature of the training image;
determining the characteristics of the corresponding characters in the first characteristics;
fuzzy processing is carried out on the characteristics of the corresponding characters in the first characteristics to obtain second characteristics;
obtaining a character prediction result according to the second characteristic;
and updating the parameters of the character recognition model based on the character prediction result and the label corresponding to the training image.
10. An apparatus, comprising a processor and a memory;
the processor is to execute instructions stored in the memory to cause the device to perform the method of any of claims 1 to 8.
11. A computer-readable storage medium comprising instructions that direct a device to perform the method of any of claims 1-8.
CN202111526049.7A 2021-12-14 2021-12-14 Image processing method and device Active CN114187593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111526049.7A CN114187593B (en) 2021-12-14 2021-12-14 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111526049.7A CN114187593B (en) 2021-12-14 2021-12-14 Image processing method and device

Publications (2)

Publication Number Publication Date
CN114187593A (zh) 2022-03-15
CN114187593B CN114187593B (en) 2024-01-30

Family

ID=80543747

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111526049.7A Active CN114187593B (en) 2021-12-14 2021-12-14 Image processing method and device

Country Status (1)

Country Link
CN (1) CN114187593B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108288078A (en) * 2017-12-07 2018-07-17 腾讯科技(深圳)有限公司 Character identifying method, device and medium in a kind of image
WO2018223994A1 (en) * 2017-06-07 2018-12-13 众安信息技术服务有限公司 Method and device for synthesizing chinese printed character image
CN110516577A (en) * 2019-08-20 2019-11-29 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and storage medium
US20200176121A1 (en) * 2018-11-29 2020-06-04 January, Inc. Systems, methods, and devices for biophysical modeling and response prediction
CN112163435A (en) * 2020-10-20 2021-01-01 腾讯科技(深圳)有限公司 Machine translation method, machine translation model training method, device and equipment
CN112215221A (en) * 2020-09-22 2021-01-12 国交空间信息技术(北京)有限公司 Automatic vehicle frame number identification method
CN112883967A (en) * 2021-02-24 2021-06-01 北京有竹居网络技术有限公司 Image character recognition method, device, medium and electronic equipment
CN112883968A (en) * 2021-02-24 2021-06-01 北京有竹居网络技术有限公司 Image character recognition method, device, medium and electronic equipment
CN112883966A (en) * 2021-02-24 2021-06-01 北京有竹居网络技术有限公司 Image character recognition method, device, medium and electronic equipment
CN113011420A (en) * 2021-03-10 2021-06-22 北京百度网讯科技有限公司 Character recognition method, model training method, related device and electronic equipment
CN113313064A (en) * 2021-06-23 2021-08-27 北京有竹居网络技术有限公司 Character recognition method and device, readable medium and electronic equipment
CN113436137A (en) * 2021-03-12 2021-09-24 北京世纪好未来教育科技有限公司 Image definition recognition method, device, equipment and medium
CN113642583A (en) * 2021-08-13 2021-11-12 北京百度网讯科技有限公司 Deep learning model training method for text detection and text detection method

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018223994A1 (en) * 2017-06-07 2018-12-13 众安信息技术服务有限公司 Method and device for synthesizing chinese printed character image
CN108288078A (en) * 2017-12-07 2018-07-17 腾讯科技(深圳)有限公司 Character identifying method, device and medium in a kind of image
US20200176121A1 (en) * 2018-11-29 2020-06-04 January, Inc. Systems, methods, and devices for biophysical modeling and response prediction
CN110516577A (en) * 2019-08-20 2019-11-29 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and storage medium
CN112215221A (en) * 2020-09-22 2021-01-12 国交空间信息技术(北京)有限公司 Automatic vehicle frame number identification method
CN112163435A (en) * 2020-10-20 2021-01-01 腾讯科技(深圳)有限公司 Machine translation method, machine translation model training method, device and equipment
CN112883967A (en) * 2021-02-24 2021-06-01 北京有竹居网络技术有限公司 Image character recognition method, device, medium and electronic equipment
CN112883968A (en) * 2021-02-24 2021-06-01 北京有竹居网络技术有限公司 Image character recognition method, device, medium and electronic equipment
CN112883966A (en) * 2021-02-24 2021-06-01 北京有竹居网络技术有限公司 Image character recognition method, device, medium and electronic equipment
CN113011420A (en) * 2021-03-10 2021-06-22 北京百度网讯科技有限公司 Character recognition method, model training method, related device and electronic equipment
CN113436137A (en) * 2021-03-12 2021-09-24 北京世纪好未来教育科技有限公司 Image definition recognition method, device, equipment and medium
CN113313064A (en) * 2021-06-23 2021-08-27 北京有竹居网络技术有限公司 Character recognition method and device, readable medium and electronic equipment
CN113642583A (en) * 2021-08-13 2021-11-12 北京百度网讯科技有限公司 Deep learning model training method for text detection and text detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHI QIAO et al.: "SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition", arXiv, pages 1-10 *
XING YUAN: "Application of Deep Learning in Handwritten Digit Recognition" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology, no. 05, pages 138-439 *

Also Published As

Publication number Publication date
CN114187593B (en) 2024-01-30

Similar Documents

Publication Publication Date Title
EP3340129A1 (en) Artificial neural network class-based pruning
CN110211119B (en) Image quality evaluation method and device, electronic equipment and readable storage medium
CN112950581A (en) Quality evaluation method and device and electronic equipment
CN110956615B (en) Image quality evaluation model training method and device, electronic equipment and storage medium
CN110827297A (en) Insulator segmentation method for generating countermeasure network based on improved conditions
US20220036167A1 (en) Sorting method, operation method and operation apparatus for convolutional neural network
CN109685805B (en) Image segmentation method and device
CN113221601A (en) Character recognition method, device and computer readable storage medium
CN113327584A (en) Language identification method, device, equipment and storage medium
CN110796003B (en) Lane line detection method and device and electronic equipment
CN112183224A (en) Model training method for image recognition, image recognition method and device
CN114187593B (en) Image processing method and device
CN111327946A (en) Video quality evaluation and feature dictionary training method, device and medium
CN110704678A (en) Evaluation sorting method, evaluation sorting system, computer device and storage medium
CN116128044A (en) Model pruning method, image processing method and related devices
CN116129417A (en) Digital instrument reading detection method based on low-quality image
CN115982965A (en) Carbon fiber material damage detection method and device for denoising diffusion sample increment learning
CN115358410A (en) Method, device and equipment for enhancing field of pre-training model and storage medium
CN111340329B (en) Actor evaluation method and device and electronic equipment
CN114724144A (en) Text recognition method, model training method, device, equipment and medium
CN108021918B (en) Character recognition method and device
CN114220106A (en) Image processing method and device
CN114220107A (en) Image processing method and device
CN115204381A (en) Weak supervision model training method and device and electronic equipment
CN114004974A (en) Method and device for optimizing images shot in low-light environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant