CN111932500A - Image processing method and device

Info

Publication number: CN111932500A
Authority: CN (China)
Prior art keywords: layer, image, quality, sample image, convolution
Legal status: Granted; currently Active
Application number: CN202010664624.9A
Other languages: Chinese (zh)
Other versions: CN111932500B (en)
Inventors: 张秋晖, 喻庐军, 刘岩
Assignee (current and original): Taikang Insurance Group Co Ltd
Events: application filed by Taikang Insurance Group Co Ltd; priority to CN202010664624.9A; publication of CN111932500A; application granted; publication of CN111932500B

Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06N 3/045: Neural networks; architectures; combinations of networks
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide detection or recognition
    • G06T 2207/10004: Image acquisition modality; still image; photographic image
    • G06T 2207/20081: Special algorithmic details; training; learning
    • G06T 2207/20084: Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/30168: Subject of image; image quality inspection
    • G06V 30/10: Character recognition

Abstract

The application provides an image processing method and device. An image quality evaluation model is obtained by training an initialization model on sample images and the labeling quality of those sample images, where the labeling quality of a sample image is obtained by the electronic device based on the accuracy of OCR recognition on the sample image and the brightness of the sample image. In this way, the evaluation standard of image quality can be unified according to the actual situation; because the standard is derived by technicians from analysis of the actual situation, image quality can be determined accurately against it. Therefore, when the electronic device obtains the quality of an image with the image quality evaluation model, this evaluation standard is applied, so the determined quality is accurate and consistent, errors are reduced, and no manual participation is needed in obtaining image quality, which reduces labor costs.

Description

Image processing method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus.
Background
Currently, techniques for recognizing text in images, such as OCR (Optical Character Recognition), are widely used in various fields, for example, recognition of certificates such as identity cards, and recognition of financial instruments and insurance documents.
For example, in one application, in an insurance intelligent claims settlement service, a user may take a picture of an insurance document or a medical bill and upload the picture to a server of an insurance company; the server performs OCR recognition on the picture to obtain its content and then calculates and pays the claim amount according to that content.
However, the picture may sometimes be distorted during shooting, during transmission to the server, or during compression before transmission to the server.
Disclosure of Invention
The application discloses an image processing method and device.
In a first aspect, the present application provides an image processing method, the method comprising:
receiving an image uploaded by a user;
acquiring the quality of the image based on an image quality evaluation model; the image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy of OCR (optical character recognition) on the sample image and the brightness of the sample image;
when the quality is greater than or equal to a preset threshold value, performing OCR recognition on the image;
and when the quality is less than the preset threshold value, outputting prompt information, wherein the prompt information is used for prompting the user to input the image again.
In an optional implementation manner, the training manner of the image quality evaluation model includes:
acquiring a sample image and acquiring the labeling quality of the sample image;
constructing a network structure of an initialization model;
and training the initialization model based on the sample image and the labeling quality until network parameters in the initialization model are converged to obtain the image quality evaluation model.
In an optional implementation manner, the obtaining of the annotation quality of the sample image includes:
obtaining the accuracy rate of OCR recognition on the sample image;
acquiring the brightness of the sample image;
and acquiring the labeling quality of the sample image based on the brightness and the accuracy.
In an optional implementation manner, the obtaining an accuracy of OCR recognition on the sample image includes:
acquiring an annotation text in the sample image;
performing OCR recognition on the sample image to obtain a recognition text in the sample image;
and acquiring the accuracy rate based on the labeling text and the identification text.
In an optional implementation manner, the obtaining the accuracy based on the annotation text and the recognition text includes:
determining, based on the annotation text, the accurately recognized texts among the recognition texts obtained by OCR;
and calculating the ratio of the number of accurately recognized texts to the number of annotation texts to obtain the accuracy.
In an optional implementation manner, the accuracy includes an accuracy of performing OCR recognition on a plurality of randomly selected fields in the sample image, and the brightness includes brightness of each field in the plurality of fields in the sample image;
the obtaining of the labeling quality of the sample image based on the brightness and the accuracy includes:
for any field in the fields, acquiring the labeling quality of the field according to the accuracy of OCR recognition on the field and the brightness of the field;
and calculating the average value of the labeling quality of each field to obtain the labeling quality of the sample image.
In an alternative implementation, the network structure includes at least: a plurality of convolution pooling layers, a plurality of upsampling layers, a plurality of superposition layers, a fully connected layer and a logistic regression layer;
the convolution pooling layers comprise at least one convolution layer and at least one pooling layer, the pooling layer comprising a max pooling layer; the convolution layer is used for acquiring a feature matrix of the image, and the pooling layer is used for reducing the dimension of the feature matrix;
the upsampling layer is used for increasing the dimension of the feature matrix;
the superposition layer is used for superposing the feature matrix output by a convolution pooling layer and the feature matrix output by an upsampling layer into one feature matrix;
the fully connected layer is used for converting the feature matrix output by the superposition layer into a feature value;
the logistic regression layer is used for converting the feature value into the quality of an image.
In an alternative implementation, the convolution pooling layers include a first convolution pooling layer, a second convolution pooling layer, a third convolution pooling layer, a fourth convolution pooling layer and a fifth convolution pooling layer;
the upsampling layers comprise a first upsampling layer, a second upsampling layer and a third upsampling layer;
the superposition layers comprise a first superposition layer, a second superposition layer and a third superposition layer;
the input end of the first convolution pooling layer is used for inputting an image;
the output end of the first convolution pooling layer is connected with the input end of the second convolution pooling layer;
the output end of the second convolution pooling layer is connected with the input end of the third convolution pooling layer, and also with the input end of the third superposition layer;
the output end of the third convolution pooling layer is connected with the input end of the fourth convolution pooling layer, and also with the input end of the second superposition layer;
the output end of the fourth convolution pooling layer is connected with the input end of the fifth convolution pooling layer, and also with the input end of the first superposition layer;
the output end of the fifth convolution pooling layer is connected with the input end of the first upsampling layer;
the output end of the first upsampling layer is connected with the input end of the first superposition layer;
the output end of the first superposition layer is connected with the input end of the second upsampling layer;
the output end of the second upsampling layer is connected with the input end of the second superposition layer;
the output end of the second superposition layer is connected with the input end of the third upsampling layer;
the output end of the third upsampling layer is connected with the input end of the third superposition layer;
the output end of the third superposition layer is connected with the input end of the fully connected layer;
and the output end of the fully connected layer is connected with the input end of the logistic regression layer.
In a second aspect, the present application provides an image processing apparatus, the apparatus comprising:
the receiving module is used for receiving the image uploaded by the user;
the first acquisition module is used for acquiring the quality of the image based on an image quality evaluation model; the image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy of OCR (optical character recognition) on the sample image and the brightness of the sample image;
the recognition module is used for performing OCR recognition on the image when the quality is greater than or equal to a preset threshold value;
and the output module is used for outputting prompt information when the quality is smaller than a preset threshold value, wherein the prompt information is used for prompting the user to input the image again.
In an optional implementation, the apparatus further comprises:
the first acquisition module is used for acquiring a sample image; the second acquisition module is used for acquiring the labeling quality of the sample image;
the building module is used for building a network structure of the initialization model;
and the training module is used for training the initialization model based on the sample image and the labeling quality until network parameters in the initialization model are converged to obtain the image quality evaluation model.
In an optional implementation manner, the second obtaining module includes:
the first acquisition unit is used for acquiring the accuracy of OCR recognition on the sample image;
a second acquisition unit configured to acquire brightness of the sample image;
and the third acquisition unit is used for acquiring the labeling quality of the sample image based on the brightness and the accuracy.
In an optional implementation manner, the first obtaining unit includes:
the first obtaining subunit is used for obtaining the annotation text in the sample image;
the identification subunit is used for performing OCR (optical character recognition) on the sample image to obtain an identification text in the sample image;
and the second acquiring subunit is used for acquiring the accuracy rate based on the labeling text and the identification text.
In an optional implementation manner, the second obtaining subunit is specifically configured to: determine, based on the annotation text, the accurately recognized texts among the recognition texts obtained by OCR; and calculate the ratio of the number of accurately recognized texts to the number of annotation texts to obtain the accuracy.
In an optional implementation manner, the accuracy includes an accuracy of performing OCR recognition on a plurality of randomly selected fields in the sample image, and the brightness includes brightness of each field in the plurality of fields in the sample image;
the third acquisition unit includes:
a third obtaining subunit, configured to, for any one of the fields, obtain, according to the accuracy of OCR recognition on the field and the brightness of the field, a labeling quality of the field;
and the computing subunit is used for computing the average value of the labeling quality of each field to obtain the labeling quality of the sample image.
In an alternative implementation, the network structure includes at least: a plurality of convolution pooling layers, a plurality of upsampling layers, a plurality of superposition layers, a fully connected layer and a logistic regression layer;
the convolution pooling layers comprise at least one convolution layer and at least one pooling layer, the pooling layer comprising a max pooling layer; the convolution layer is used for acquiring a feature matrix of the image, and the pooling layer is used for reducing the dimension of the feature matrix;
the upsampling layer is used for increasing the dimension of the feature matrix;
the superposition layer is used for superposing the feature matrix output by a convolution pooling layer and the feature matrix output by an upsampling layer into one feature matrix;
the fully connected layer is used for converting the feature matrix output by the superposition layer into a feature value;
the logistic regression layer is used for converting the feature value into the quality of an image.
In an alternative implementation, the convolution pooling layers include a first convolution pooling layer, a second convolution pooling layer, a third convolution pooling layer, a fourth convolution pooling layer and a fifth convolution pooling layer;
the upsampling layers comprise a first upsampling layer, a second upsampling layer and a third upsampling layer;
the superposition layers comprise a first superposition layer, a second superposition layer and a third superposition layer;
the input end of the first convolution pooling layer is used for inputting an image;
the output end of the first convolution pooling layer is connected with the input end of the second convolution pooling layer;
the output end of the second convolution pooling layer is connected with the input end of the third convolution pooling layer, and also with the input end of the third superposition layer;
the output end of the third convolution pooling layer is connected with the input end of the fourth convolution pooling layer, and also with the input end of the second superposition layer;
the output end of the fourth convolution pooling layer is connected with the input end of the fifth convolution pooling layer, and also with the input end of the first superposition layer;
the output end of the fifth convolution pooling layer is connected with the input end of the first upsampling layer;
the output end of the first upsampling layer is connected with the input end of the first superposition layer;
the output end of the first superposition layer is connected with the input end of the second upsampling layer;
the output end of the second upsampling layer is connected with the input end of the second superposition layer;
the output end of the second superposition layer is connected with the input end of the third upsampling layer;
the output end of the third upsampling layer is connected with the input end of the third superposition layer;
the output end of the third superposition layer is connected with the input end of the fully connected layer;
and the output end of the fully connected layer is connected with the input end of the logistic regression layer.
In a third aspect, the present application provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the image processing method according to the first aspect when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method according to the first aspect.
Compared with the prior art, the method has the following advantages:
the inventors have found that the accuracy of OCR on an image is proportional to the quality of the image. Thus, if the image is distorted, it may cause inaccuracy in the contents of the image recognized based on the OCR.
In order to avoid this, in one possible implementation, after the server receives the image uploaded by the user, the staff may manually check the quality of the image, and if the staff subjectively thinks that the quality of the image meets the requirement, the staff may indicate that the quality of the image meets the requirement to the server, so that the server may perform OCR recognition on the image later.
However, the inventors found that: in the implementation mode, the quality of the image needs to be checked manually by workers, so that the labor cost is high.
Secondly, in many cases, the number of images to be subjected to OCR recognition is large, which results in a large amount of review work for the quality of the images, and therefore, many workers are often arranged to perform manual review.
However, in the above method, there is no objective quality assessment standard, and the assessment of the image quality is greatly influenced by the subjective consciousness of the staff.
For example, some auditors consider that the accuracy of OCR recognition performed on an image with a quality meeting the requirement is low, while other auditors consider that the accuracy of OCR recognition performed on an image with a quality not meeting the requirement is high, which makes the result of manual audit be wrong easily.
In the application, an image uploaded by a user is received; acquiring the quality of the image based on the image quality evaluation model; judging whether the quality of the image is greater than or equal to a preset threshold value; when the quality of the image is greater than or equal to a preset threshold value, performing OCR recognition on the image; and when the quality of the image is less than the preset threshold value, outputting prompt information, wherein the prompt information is used for prompting the user to input the image again.
The image quality evaluation model is obtained by training the initialization model based on the sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained by the electronic equipment based on the accuracy rate of OCR recognition on the sample image and the brightness of the sample image.
According to the method and the device, the evaluation standard of the image quality can be unified according to the actual situation. And the evaluation standard of the image quality is obtained by analyzing according to the actual situation by technicians, and the image quality can be accurately determined according to the evaluation standard of the image quality.
Therefore, in the application, in the process of acquiring the quality of the image by the electronic equipment according to the image quality evaluation model, the evaluation standard of the quality of the image is used, so that the quality of the image is determined to be accurate and uniform, the error condition is reduced, and the process of acquiring the quality of the image does not need manual participation, so that the labor cost can be reduced.
Drawings
Fig. 1 is a flowchart of the steps of an image processing method of the present application.
FIG. 2 is a flow chart of the steps of a method of training an image quality assessment model of the present application.
FIG. 3 is a flowchart illustrating steps of a method for obtaining annotation quality of a sample image according to the present application.
Fig. 4 is a schematic diagram of a network structure of an image quality evaluation model of the present application.
Fig. 5 is a block diagram of an image processing apparatus according to the present application.
Fig. 6 is a block diagram of an electronic device shown in the present application.
Fig. 7 is a block diagram of an electronic device shown in the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1, a flowchart illustrating steps of an image processing method according to the present application is shown, where the method is applied to an electronic device, where the electronic device includes a server or a terminal, and the method may specifically include the following steps:
in step S101, an image uploaded by a user is received;
in the application, the image uploaded by the user can be an image obtained by photographing a certificate, a document, a bill or the like.
For example, in an insurance business scenario, in the insurance claims settlement phase, a user may upload insurance documents, medical bills, and the like to a server of an insurance company. Specifically, the user may take a picture of an insurance document or a medical bill to obtain an image containing it and upload that image to the server of the insurance company, so that the insurance company's server can perform OCR recognition on the image to obtain the content of the insurance document, the medical bill, and so on.
However, some uploaded images are of low quality, so the electronic device cannot accurately recognize the content in them; therefore, step S102 may be performed before OCR recognition is performed on the image.
In step S102, the quality of the image is acquired based on the image quality evaluation model; the image quality evaluation model is obtained by training the initialization model based on the sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy rate of OCR recognition on the sample image and the brightness of the sample image;
in the present application, the image may be input to a trained image quality evaluation model to obtain the quality of the image output by the image quality evaluation model.
The training mode of the image quality evaluation model may refer to an embodiment shown in fig. 2, which will not be described in detail here.
In step S103, it is determined whether the quality of the image is greater than or equal to a preset threshold;
when the quality of the image is greater than or equal to the preset threshold, performing OCR recognition on the image in step S104;
When the quality of the image is less than the preset threshold, in step S105, prompt information is output to prompt the user to re-input the image, until the quality of the image uploaded by the user is greater than or equal to the preset threshold.
Alternatively, when the quality of the image is less than the preset threshold, the image uploaded by the user may instead be discarded.
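To make the flow of steps S101 to S105 concrete, the following is a minimal Python sketch; the threshold value, the score_quality model wrapper and the run_ocr function are illustrative placeholders assumed for this example, not names defined by the application.

```python
QUALITY_THRESHOLD = 0.8  # hypothetical preset threshold; the application does not fix a value


def handle_upload(image, score_quality, run_ocr):
    quality = score_quality(image)    # step S102: image quality evaluation model
    if quality >= QUALITY_THRESHOLD:  # steps S103/S104: quality acceptable, recognize
        return {"status": "ok", "text": run_ocr(image)}
    # step S105: prompt the user to re-input the image (or, alternatively, discard it)
    return {"status": "retry", "message": "Image quality too low, please upload it again."}
```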
The inventors have found that the accuracy of OCR recognition on an image is positively correlated with the quality of the image. Thus, if an image is distorted, the content recognized in it by OCR may be inaccurate.
In order to avoid this, in one possible implementation, after the server receives the image uploaded by the user, a staff member may manually review the quality of the image; if the staff member subjectively judges that the quality meets the requirement, they indicate this to the server, and the server may then perform OCR recognition on the image.
However, the inventors found that in this implementation the quality of every image must be reviewed manually by staff, so the labor cost is high.
Secondly, in many cases the number of images to be recognized by OCR is large, which creates a heavy quality-review workload, so many staff members often have to be assigned to manual review.
Moreover, this approach has no objective quality evaluation standard, and the assessment of image quality is strongly influenced by the subjective judgment of the staff.
For example, one reviewer may judge that an image meets the quality requirement even though the accuracy of OCR recognition on it is low, while another may judge that an image fails the requirement even though the accuracy of OCR recognition on it is high; as a result, manual review is prone to error.
In the present application, an image uploaded by a user is received; the quality of the image is obtained based on the image quality evaluation model; whether the quality is greater than or equal to a preset threshold is determined; when it is, OCR recognition is performed on the image; and when it is less than the preset threshold, prompt information is output to prompt the user to input the image again.
The image quality evaluation model is obtained by training an initialization model on sample images and their labeling quality, and the labeling quality of a sample image is obtained by the electronic device based on the accuracy of OCR recognition on the sample image and the brightness of the sample image.
In this way, the evaluation standard of image quality can be unified according to the actual situation. Because the standard is derived by technicians from analysis of the actual situation, image quality can be determined accurately against it.
Therefore, when the electronic device obtains the quality of an image with the image quality evaluation model, this evaluation standard is applied, so the determined quality is accurate and consistent, errors are reduced, and no manual participation is needed in obtaining image quality, which reduces labor costs.
In an alternative implementation, referring to fig. 2, the training mode of the image quality evaluation model includes:
in step S201, a sample image is obtained and the labeling quality of the sample image is obtained;
in the present application, the sample image may be manually collected by a technician or automatically collected by an electronic device on a network.
In one embodiment, in order to improve the performance of the trained image quality evaluation model, and to keep the model simple while still ensuring the accuracy of the quality it predicts, the sample images, when there are several of them, may be unified to a specific size. For example, the sample images are unified to 512 × 512, that is, each unified sample image includes 512 × 512 pixels.
In this way, after the image quality evaluation model goes online, in the embodiment shown in fig. 1, after the image uploaded by the user is received in step S101 and before step S102, it may be determined whether the size of the uploaded image is the specific size; if not, the uploaded image may be converted into an image of the specific size, and step S102 may then be performed on the converted image, for example as sketched below.
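A minimal sketch of this size check and conversion, assuming Pillow as the image library and bilinear interpolation for the conversion; neither choice is specified by the application.

```python
from PIL import Image

SPECIFIC_SIZE = (512, 512)  # the example "specific size" used for the sample images


def to_specific_size(img: Image.Image) -> Image.Image:
    # Convert an uploaded image to the specific size expected by the model.
    if img.size != SPECIFIC_SIZE:
        img = img.resize(SPECIFIC_SIZE, Image.BILINEAR)  # interpolation mode is an assumption
    return img
```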
The labeling quality of the sample image may be directly and manually labeled by a technician according to the standard definition of the image quality, or may be obtained in another way, which is specifically referred to as the embodiment shown in fig. 3 and will not be described in detail herein.
In step S202, a network structure of the initialization model is constructed;
the network structure comprises at least: specific structural examples of the convolution pooling layer, the sampling layer, the stacking layer, the full connection layer, the logistic regression layer, and the like can be referred to the embodiment shown in fig. 4 later, and are not described in detail here.
In step S203, the initialization model is trained based on the sample image and the annotation quality until the network parameters in the initialization model converge, so as to obtain an image quality evaluation model.
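A condensed sketch of this training step, assuming a PyTorch regression setup; the loss function, optimizer and convergence test are not specified by the application, so MSE, Adam and a loss-change tolerance are assumptions.

```python
import torch


def train(model, images, labeled_quality, lr=1e-4, tol=1e-6, max_epochs=100):
    # Train the initialization model on (sample image, labeling quality) pairs
    # until the loss stabilizes, a crude stand-in for parameter convergence.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    prev = float("inf")
    for _ in range(max_epochs):
        opt.zero_grad()
        pred = model(images)                   # predicted quality in (0, 1)
        loss = loss_fn(pred, labeled_quality)  # compare with the labeling quality
        loss.backward()
        opt.step()
        if abs(prev - loss.item()) < tol:      # treat convergence as loss stabilizing
            break
        prev = loss.item()
    return model
```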
In another embodiment of the present application, referring to fig. 3, step S201 includes the following steps:
in step S301, obtaining an accuracy rate of performing OCR recognition on the sample image;
in an embodiment of the present application, the step may be implemented by the following process, including:
3011. acquiring an annotation text in a sample image;
in the application, a technician can manually identify the text in the sample text to obtain the text in the sample text, and the text is uploaded to the electronic device as the labeled text in the sample image, and the electronic device receives the labeled text uploaded by the technician in the sample image.
3012. Performing OCR recognition on the sample image to obtain a recognition text in the sample image;
3013. and acquiring the accuracy of OCR recognition on the sample image based on the labeling text and the recognition text.
In the present application, the recognition texts obtained by performing OCR recognition on a sample image are not necessarily all correct, since recognition errors may occur. Therefore, the accurately recognized texts among the recognition texts can be determined based on the annotation texts; for example, the annotation texts are compared with the recognition texts to determine which recognition texts are accurate. The ratio of the number of accurately recognized texts to the number of annotation texts can then be calculated, giving the accuracy of OCR recognition on the sample image, as in the sketch below.
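The computation can be sketched as follows; pairing annotation texts with recognition texts by position is an assumption about how the accurate texts are determined.

```python
def ocr_accuracy(annotation_texts, recognition_texts):
    # Ratio of correctly recognized texts to the number of annotation texts.
    correct = sum(1 for a, r in zip(annotation_texts, recognition_texts) if a == r)
    return correct / len(annotation_texts)


# For example, if one of two (hypothetical) annotated fields is recognized exactly:
# ocr_accuracy(["invoice", "2020-07-10"], ["invoice", "2020-07-1O"]) returns 0.5
```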
In step S302, the brightness of the sample image is acquired;
in step S303, the labeling quality of the sample image is obtained based on the brightness of the sample image and the accuracy of OCR recognition on the sample image.
In one embodiment of the present application, a first product between an accuracy rate of OCR recognition on a sample image and a first preset weight coefficient may be calculated, then a second product between a brightness of the sample image and a second preset weight coefficient may be calculated, and then a sum of the first product and the second product may be calculated, so as to obtain an annotation quality of the sample image.
In one example, the sum of the first preset weight coefficient and the second preset weight coefficient may be a specific value, for example, a value of 1, the first preset weight coefficient may be 0.6, etc., and the second preset weight coefficient may be 0.4, etc.
The possible brightness of an image lies in an interval, for example (0, X) with X greater than 0, and thus the brightness obtained for the sample image lies in the interval (0, X).
The accuracy of OCR recognition on the sample image obtained in step S301 lies in the interval (0, 1), and obtaining the labeling quality of the sample image in the manner described above requires combining that accuracy with the brightness of the sample image.
Therefore, if the brightness of the sample image is large, the computed labeling quality would be dominated by the brightness, and the effect of the accuracy of OCR recognition on the sample image would be diluted, resulting in low accuracy of the obtained labeling quality.
Therefore, in order to improve the accuracy of the labeling quality of the acquired sample image, in another embodiment of the present application, the brightness of the acquired sample image needs to be normalized to be within the interval (0, 1).
Specifically, the brightness of the sample image may be divided by the right endpoint X of the interval (0, X), normalizing it into the interval (0, 1). The labeling quality of the sample image is then obtained based on the accuracy of OCR recognition on the sample image and the normalized brightness, as in the sketch below.
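Combining the weighted sum and the normalization, the labeling-quality computation can be sketched as follows; the weights 0.6 and 0.4 are the example values given above, and x_max stands for the right endpoint X of the brightness interval (0, X).

```python
def labeling_quality(accuracy, brightness, x_max, w_acc=0.6, w_bri=0.4):
    # Weighted sum of OCR accuracy and normalized brightness; the two
    # weights sum to 1, as in the example above.
    norm_brightness = brightness / x_max  # map brightness from (0, X) into (0, 1)
    return w_acc * accuracy + w_bri * norm_brightness
```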
In another embodiment of the present application, considering that the quality of different regions of an image may be uneven, a plurality of fields may be randomly selected in the sample image, where a field includes, but is not limited to, the region where a piece of text is located, the region where a line of text is located, the region where a paragraph of text is located, or a region of another form.
As such, the accuracy of OCR recognition of the sample image may include the accuracy of OCR recognition of randomly selected fields in the sample image, and the brightness includes the brightness of each of the fields in the sample image.
In this way, when the labeling quality of the sample image is obtained based on the brightness and the accuracy, for any one of the fields, the labeling quality of the field can be obtained according to the accuracy of OCR recognition on the field and the brightness of the field; in one embodiment of the present application, a first product between an accuracy rate of OCR recognition of the field and a first preset weight coefficient may be calculated, then a second product between a brightness of the field and a second preset weight coefficient may be calculated, and then a sum of the first product and the second product may be calculated, so as to obtain an annotation quality of the field. The same is true for each of the others of the plurality of fields.
In one example, the sum of the first preset weight coefficient and the second preset weight coefficient may be a specific value, for example, a value of 1, the first preset weight coefficient may be 0.6, etc., and the second preset weight coefficient may be 0.4, etc.
Then, the average of the labeling qualities of the fields may be calculated to obtain the labeling quality of the sample image, as in the sketch below.
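A sketch of this per-field variant, reusing the labeling_quality helper from the previous sketch; representing each randomly selected field as an (accuracy, brightness) pair is an assumption made for illustration.

```python
def sample_labeling_quality(fields, x_max):
    # Average the labeling quality over the randomly selected fields;
    # `fields` is a list of (accuracy, brightness) pairs, one per field.
    per_field = [labeling_quality(acc, bri, x_max) for acc, bri in fields]
    return sum(per_field) / len(per_field)
```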
In one embodiment of the present application, the network structure of the initialization model includes at least: a plurality of convolution pooling layers, a plurality of upsampling layers, a plurality of superposition layers, a fully connected layer and a logistic regression layer.
The convolution pooling layers include at least one convolution layer and at least one pooling layer, the pooling layer including a max pooling layer. The convolution layer is used for acquiring a feature matrix of the image, and the pooling layer is used for reducing the dimension of the feature matrix.
The upsampling layer is used for increasing the dimension of the feature matrix.
The superposition layer is used for superposing the feature matrix output by a convolution pooling layer and the feature matrix output by an upsampling layer into one feature matrix.
The fully connected layer is used for converting the feature matrix output by the superposition layer into a feature value.
The logistic regression layer is used for converting the feature value into the quality of the image.
Referring to fig. 4, in one example, the convolution pooling layers include a first convolution pooling layer, a second convolution pooling layer, a third convolution pooling layer, a fourth convolution pooling layer, and a fifth convolution pooling layer.
The upsampling layer includes a first upsampling layer, a second upsampling layer, and a third upsampling layer.
The superposition layer comprises a first superposition layer, a second superposition layer and a third superposition layer.
The input of the first convolution pooling layer is used for inputting images.
The output end of the first convolution pooling layer is connected with the input end of the second convolution pooling layer.
The output end of the second convolution pooling layer is connected with the input end of the third convolution pooling layer, and the output end of the second convolution pooling layer is also connected with the input end of the third superposition layer.
The output end of the third convolution pooling layer is connected with the input end of the fourth convolution pooling layer, and the output end of the third convolution pooling layer is also connected with the input end of the second superposition layer.
The output end of the fourth convolution pooling layer is connected with the input end of the fifth convolution pooling layer, and the output end of the fourth convolution pooling layer is also connected with the input end of the first superposition layer.
The output end of the fifth convolution pooling layer is connected with the input end of the first upsampling layer.
The output end of the first upsampling layer is connected with the input end of the first superposition layer.
The output end of the first superposition layer is connected with the input end of the second upsampling layer.
The output end of the second upsampling layer is connected with the input end of the second superposition layer.
The output end of the second superposition layer is connected with the input end of the third upsampling layer.
The output end of the third upsampling layer is connected with the input end of the third superposition layer.
The output end of the third superposition layer is connected with the input end of the fully connected layer.
The output end of the fully connected layer is connected with the input end of the logistic regression layer.
The first convolution pooling layer comprises 1 convolution layer and 1 pooling layer; the input end of the convolution layer is the input end of the first convolution pooling layer, the output end of the convolution layer is connected with the input end of the pooling layer, and the output end of the pooling layer is the output end of the first convolution pooling layer. The convolution layer may include 64 3 × 3 convolution kernels, and the pooling layer is a max pooling layer.
The second convolution pooling layer comprises 2 convolution layers and 1 pooling layer, the 2 convolution layers being convolution layer 1 and convolution layer 2; the input end of convolution layer 1 is the input end of the second convolution pooling layer, the output end of convolution layer 1 is connected with the input end of convolution layer 2, the output end of convolution layer 2 is connected with the input end of the pooling layer, and the output end of the pooling layer is the output end of the second convolution pooling layer. Convolution layers 1 and 2 may each include 128 3 × 3 convolution kernels, and the pooling layer is a max pooling layer.
The third convolution pooling layer comprises 3 convolution layers and 1 pooling layer, the 3 convolution layers being convolution layer 1, convolution layer 2 and convolution layer 3; the input end of convolution layer 1 is the input end of the third convolution pooling layer, the output end of convolution layer 1 is connected with the input end of convolution layer 2, the output end of convolution layer 2 is connected with the input end of convolution layer 3, the output end of convolution layer 3 is connected with the input end of the pooling layer, and the output end of the pooling layer is the output end of the third convolution pooling layer. Convolution layers 1 and 2 may each include 256 3 × 3 convolution kernels, convolution layer 3 may include 256 1 × 1 convolution kernels, and the pooling layer is a max pooling layer.
The fourth convolution pooling layer likewise comprises 3 convolution layers and 1 pooling layer, connected in the same way as in the third convolution pooling layer, with the input end of convolution layer 1 being the input end of the fourth convolution pooling layer and the output end of the pooling layer being the output end of the fourth convolution pooling layer. Convolution layers 1 and 2 may each include 512 3 × 3 convolution kernels, convolution layer 3 may include 512 1 × 1 convolution kernels, and the pooling layer is a max pooling layer.
The fifth convolution pooling layer also comprises 3 convolution layers and 1 pooling layer, connected in the same way as in the third convolution pooling layer, with the input end of convolution layer 1 being the input end of the fifth convolution pooling layer and the output end of the pooling layer being the output end of the fifth convolution pooling layer. Convolution layers 1 and 2 may each include 512 3 × 3 convolution kernels, convolution layer 3 may include 512 1 × 1 convolution kernels, and the pooling layer is a max pooling layer.
In one example, an image is input into the first convolution pooling layer and processed by it. The first convolution pooling layer outputs a 512 × 512 × 64 feature matrix, the second convolution pooling layer outputs a 256 × 256 × 128 feature matrix, the third convolution pooling layer outputs a 128 × 128 × 256 feature matrix, the fourth convolution pooling layer outputs a 64 × 64 × 512 feature matrix, and the fifth convolution pooling layer outputs a 32 × 32 × 512 feature matrix.
the 32 × 512 feature matrix output by the fifth convolution pooling layer is processed by the first upsampling layer to obtain a 64 × 512 feature matrix, specifically, the first upsampling layer may insert a numerical value into the 32 × 512 feature matrix by means of mean interpolation to obtain a 64 × 512 feature matrix, which is not described in detail, and the second upsampling layer and the third upsampling layer process feature data in the same way, which is not described in detail later. The first upsampling layer then inputs the resulting feature matrix of 64 x 512 into the first superposition layer.
The first superposition layer superposes the 64 × 64 × 512 feature matrix output by the first upsampling layer with the 64 × 64 × 512 feature matrix output by the fourth convolution pooling layer, for example by adding the values at the same positions, to obtain a 64 × 64 × 512 feature matrix, which is input into the second upsampling layer.
The second upsampling layer processes the 64 × 64 × 512 feature matrix input by the first superposition layer to obtain a 128 × 128 × 256 feature matrix, which is then input into the second superposition layer.
The second superposition layer superposes the 128 × 128 × 256 feature matrix output by the second upsampling layer with the 128 × 128 × 256 feature matrix output by the third convolution pooling layer, for example by adding the values at the same positions, to obtain a 128 × 128 × 256 feature matrix, which is input into the third upsampling layer.
The third upsampling layer processes the 128 × 128 × 256 feature matrix input by the second superposition layer to obtain a 256 × 256 × 128 feature matrix, which is then input into the third superposition layer.
The third superposition layer superposes the 256 × 256 × 128 feature matrix output by the third upsampling layer with the 256 × 256 × 128 feature matrix output by the second convolution pooling layer, for example by adding the values at the same positions, to obtain a 256 × 256 × 128 feature matrix, which is input into the fully connected layer.
The fully connected layer converts the 256 × 256 × 128 feature matrix into one feature value, for example by adding all the values in the feature matrix, and inputs the resulting feature value into the logistic regression layer.
The logistic regression layer includes a sigmoid function; based on the sigmoid function, it converts the feature value output by the fully connected layer into a value in the interval (0, 1), which serves as the quality of the image.
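Assembling the data flow of fig. 4 and the dimensions above into a PyTorch sketch: the kernel counts follow the example layers described earlier, while the ReLU activations, the omission of downsampling in the first block (needed to reproduce the listed 512 × 512 × 64 output), the 1 × 1 channel-reducing projections inside the upsampling layers, the bilinear interpolation, and the sum-based fully connected reduction are all assumptions made to obtain a runnable model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch, n_convs, last_1x1=False, pool=True):
    # A "convolution pooling layer": n_convs convolutions, then 2x2 max pooling.
    layers, ch = [], in_ch
    for i in range(n_convs):
        k = 1 if (last_1x1 and i == n_convs - 1) else 3
        layers += [nn.Conv2d(ch, out_ch, k, padding=k // 2), nn.ReLU(inplace=True)]
        ch = out_ch
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


class UpBlock(nn.Module):
    # An "upsampling layer": doubles H and W by interpolation; a 1x1 conv
    # adjusts the channel count so the result can be added to the matching
    # convolution pooling layer output (the projection is an assumption).
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.proj(x)


class QualityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = conv_block(3, 64, 1, pool=False)        # -> 512 x 512 x 64
        self.block2 = conv_block(64, 128, 2)                  # -> 256 x 256 x 128
        self.block3 = conv_block(128, 256, 3, last_1x1=True)  # -> 128 x 128 x 256
        self.block4 = conv_block(256, 512, 3, last_1x1=True)  # -> 64 x 64 x 512
        self.block5 = conv_block(512, 512, 3, last_1x1=True)  # -> 32 x 32 x 512
        self.up1 = UpBlock(512, 512)  # 32 -> 64
        self.up2 = UpBlock(512, 256)  # 64 -> 128
        self.up3 = UpBlock(256, 128)  # 128 -> 256

    def forward(self, x):
        c2 = self.block2(self.block1(x))
        c3 = self.block3(c2)
        c4 = self.block4(c3)
        c5 = self.block5(c4)
        s1 = self.up1(c5) + c4         # first superposition layer
        s2 = self.up2(s1) + c3         # second superposition layer
        s3 = self.up3(s2) + c2         # third superposition layer
        value = s3.sum(dim=(1, 2, 3))  # "fully connected" reduction, as described
        return torch.sigmoid(value)    # logistic regression layer -> quality in (0, 1)


# quality = QualityNet()(torch.randn(1, 3, 512, 512))  # one quality score per image
```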
It is noted that, for simplicity of explanation, the method embodiments are described as a series or combination of actions, but those skilled in the art will appreciate that the present application is not limited by the order of actions described, since some steps may, in accordance with the present application, occur in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are exemplary, and the actions involved are not necessarily all required by the present application.
Referring to fig. 5, a block diagram of an image processing apparatus according to the present application is shown, and the apparatus may specifically include the following modules:
the receiving module 11 is used for receiving an image uploaded by a user;
a first obtaining module 12, configured to obtain quality of the image based on an image quality evaluation model; the image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy of OCR (optical character recognition) on the sample image and the brightness of the sample image;
the recognition module 13 is configured to perform OCR recognition on the image when the quality is greater than or equal to a preset threshold;
and the output module 14 is configured to output prompt information when the quality is smaller than a preset threshold, where the prompt information is used to prompt the user to re-input an image.
In an optional implementation, the apparatus further comprises:
the first acquisition module is used for acquiring a sample image; the second acquisition module is used for acquiring the labeling quality of the sample image;
the building module is used for building a network structure of the initialization model;
and the training module is used for training the initialization model based on the sample image and the labeling quality until network parameters in the initialization model are converged to obtain the image quality evaluation model.
In an optional implementation manner, the second obtaining module includes:
the first acquisition unit is used for acquiring the accuracy of OCR recognition on the sample image;
a second acquisition unit configured to acquire brightness of the sample image;
and the third acquisition unit is used for acquiring the labeling quality of the sample image based on the brightness and the accuracy.
In an optional implementation manner, the first obtaining unit includes:
the first obtaining subunit is used for obtaining the annotation text in the sample image;
the identification subunit is used for performing OCR (optical character recognition) on the sample image to obtain an identification text in the sample image;
and the second acquiring subunit is used for acquiring the accuracy rate based on the labeling text and the identification text.
In an optional implementation manner, the second obtaining subunit is specifically configured to: determine, based on the annotation text, the accurately recognized texts among the recognition texts obtained by OCR; and calculate the ratio of the number of accurately recognized texts to the number of annotation texts to obtain the accuracy.
In an optional implementation manner, the accuracy includes an accuracy of performing OCR recognition on a plurality of randomly selected fields in the sample image, and the brightness includes brightness of each field in the plurality of fields in the sample image;
the third acquisition unit includes:
a third obtaining subunit, configured to, for any one of the fields, obtain, according to the accuracy of OCR recognition on the field and the brightness of the field, a labeling quality of the field;
and the computing subunit is used for computing the average value of the labeling quality of each field to obtain the labeling quality of the sample image.
In an alternative implementation, the network structure includes at least: the system comprises a plurality of convolution pooling layers, a plurality of upper sampling layers, a plurality of superposition layers, a full connection layer and a logistic regression layer;
the convolutional pooling layers comprise at least one convolutional layer and at least one pooling layer, the pooling layers comprise a max pooling layer; the convolution layer is used for acquiring a feature matrix of the image, and the pooling layer is used for reducing the dimension of the feature matrix;
the up-sampling layer is used for increasing the dimension of the characteristic matrix;
the superposition layer is used for superposing the characteristic matrix output by the convolution pooling layer and the characteristic matrix output by the upper sampling layer into a characteristic matrix;
the full connection layer is used for converting the characteristic matrix output by the superposition layer into a characteristic numerical value;
the logistic regression layer is used for converting the characteristic numerical value into the quality of an image.
In an alternative implementation, the convolution pooling layers include a first convolution pooling layer, a second convolution pooling layer, a third convolution pooling layer, a fourth convolution pooling layer, and a fifth convolution pooling layer;
the up-sampling layer comprises a first up-sampling layer, a second up-sampling layer and a third up-sampling layer;
the superposition layers comprise a first superposition layer, a second superposition layer and a third superposition layer;
the input end of the first convolution pooling layer is used for inputting an image;
the output end of the first convolution pooling layer is connected with the input end of the second convolution pooling layer;
the output end of the second convolution pooling layer is connected with the input end of the third convolution pooling layer, and the output end of the second convolution pooling layer is also connected with the input end of the third superposition layer;
the output end of the third convolution pooling layer is connected with the input end of the fourth convolution pooling layer, and the output end of the third convolution pooling layer is also connected with the input end of the second superposition layer;
the output end of the fourth convolution pooling layer is connected with the input end of the fifth convolution pooling layer, and the output end of the fourth convolution pooling layer is also connected with the input end of the first superposition layer;
the output end of the fifth convolution pooling layer is connected with the input end of the first up-sampling layer;
the output end of the first upper sampling layer is connected with the input end of the first superposition layer;
the output end of the first superposition layer is connected with the input end of the second upper sampling layer;
the output end of the second upper sampling layer is connected with the input end of the second superposition layer;
the output end of the second superposition layer is connected with the input end of the third upper sampling layer;
the output end of the third up-sampling layer is connected with the input end of the third superposition layer;
the output end of the third superposition layer is connected with the input end of the full connection layer;
and the output end of the full connection layer is connected with the input end of the logistic regression layer.
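For concreteness, the sketch below wires a PyTorch model exactly as described above. It is not the authors' exact model: channel counts, kernel sizes, the input resolution, and the use of channel-wise concatenation for "superposition" are all assumptions, since the application does not specify them.

```python
# A minimal PyTorch sketch of the network wiring described above. Channel
# counts, kernel sizes, the input size, and concatenation-as-superposition
# are assumptions; the layer connections follow the description exactly.

import torch
import torch.nn as nn


def conv_pool(in_ch: int, out_ch: int) -> nn.Sequential:
    """One convolution pooling layer: the convolution extracts a feature
    matrix and the max pooling layer reduces its dimension."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class QualityNet(nn.Module):
    def __init__(self, input_size: int = 64):  # input side must be a multiple of 32
        super().__init__()
        self.cp1, self.cp2 = conv_pool(3, 16), conv_pool(16, 32)
        self.cp3, self.cp4 = conv_pool(32, 64), conv_pool(64, 128)
        self.cp5 = conv_pool(128, 256)
        self.up1 = nn.Upsample(scale_factor=2)  # first up-sampling layer
        self.up2 = nn.Upsample(scale_factor=2)  # second up-sampling layer
        self.up3 = nn.Upsample(scale_factor=2)  # third up-sampling layer
        side = input_size // 4                  # spatial size at the third superposition
        self.fc = nn.Linear((32 + 448) * side * side, 1)  # full connection layer
        self.sigmoid = nn.Sigmoid()             # logistic regression layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c1 = self.cp1(x)                           # 16  channels @ H/2
        c2 = self.cp2(c1)                          # 32  channels @ H/4
        c3 = self.cp3(c2)                          # 64  channels @ H/8
        c4 = self.cp4(c3)                          # 128 channels @ H/16
        c5 = self.cp5(c4)                          # 256 channels @ H/32
        s1 = torch.cat([c4, self.up1(c5)], dim=1)  # first superposition: 384 @ H/16
        s2 = torch.cat([c3, self.up2(s1)], dim=1)  # second superposition: 448 @ H/8
        s3 = torch.cat([c2, self.up3(s2)], dim=1)  # third superposition: 480 @ H/4
        value = self.fc(s3.flatten(1))             # feature matrix -> feature value
        return self.sigmoid(value)                 # feature value -> quality in (0, 1)
```

Under these assumptions, `QualityNet()(torch.rand(1, 3, 64, 64))` returns a single quality score in (0, 1).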
The inventors have found that the accuracy of OCR recognition on an image is proportional to the quality of the image. Thus, if the image is distorted, the content recognized from the image by OCR may be inaccurate.
In order to avoid this, in one possible implementation, after the server receives the image uploaded by a user, a staff member manually reviews the quality of the image; if the staff member subjectively considers that the quality meets the requirement, he or she indicates as much to the server, and the server then performs OCR recognition on the image.
However, the inventors found that this implementation requires image quality to be reviewed manually by staff, so the labor cost is high.
Moreover, in many cases the number of images to be subjected to OCR recognition is large, which creates a heavy review workload, so many staff members often have to be assigned to the manual review.
Furthermore, the above method has no objective quality evaluation standard, and the evaluation of image quality is strongly influenced by the subjective judgment of the staff.
For example, one reviewer may pass an image as meeting the quality requirement even though OCR recognition on it turns out to be inaccurate, while another may reject an image as failing the requirement even though OCR recognition on it would be accurate; manual review is therefore prone to error.
In the present application, an image uploaded by a user is received; the quality of the image is acquired based on an image quality evaluation model; whether the quality of the image is greater than or equal to a preset threshold is judged; when the quality is greater than or equal to the preset threshold, OCR recognition is performed on the image; and when the quality is less than the preset threshold, prompt information is output to prompt the user to upload the image again.
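The sketch below illustrates this server-side decision flow. The callables, the return shape, and the 0.5 threshold are assumptions for illustration; the application does not fix a specific threshold value.

```python
# A minimal sketch of the decision flow summarized above; the helper
# callables and the threshold value are assumptions, not fixed here.
from typing import Callable

QUALITY_THRESHOLD = 0.5  # preset threshold; the application leaves the value open


def handle_uploaded_image(image,
                          evaluate_quality: Callable[[object], float],
                          run_ocr: Callable[[object], str]) -> dict:
    quality = evaluate_quality(image)  # image quality evaluation model, in [0, 1]
    if quality >= QUALITY_THRESHOLD:
        return {"status": "ok", "text": run_ocr(image)}
    # quality below threshold: prompt the user to upload the image again
    return {"status": "retry",
            "message": "Image quality is too low; please upload the image again."}
```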
The image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image; the labeling quality of the sample image is obtained by the electronic device based on the accuracy of OCR recognition on the sample image and the brightness of the sample image.
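A training loop for this step might look as follows. The MSE loss, the Adam optimizer, and the fixed epoch count stand in for choices the application does not specify (it only requires training until the network parameters converge); `QualityNet` refers to the sketch given earlier.

```python
# A minimal training sketch for the initialization model; the loss,
# optimizer, and epoch count are assumptions not fixed by the application.
import torch
from torch.utils.data import DataLoader, TensorDataset


def train_quality_model(model: torch.nn.Module,
                        images: torch.Tensor,            # N x 3 x H x W sample images
                        labeling_quality: torch.Tensor,  # N labeling-quality labels
                        epochs: int = 10, lr: float = 1e-3) -> torch.nn.Module:
    loader = DataLoader(TensorDataset(images, labeling_quality),
                        batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):  # in practice, iterate until the parameters converge
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x).squeeze(1), y)
            loss.backward()
            optimizer.step()
    return model
```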
With the method and device of the present application, the evaluation standard of image quality can be unified according to actual conditions. Because this evaluation standard is derived by technicians from analysis of actual conditions, the quality of an image can be determined accurately according to it.
Therefore, in the present application, when the electronic device acquires the quality of an image according to the image quality evaluation model, this evaluation standard is applied, so that the determined quality is accurate and uniform and errors are reduced; and since the process requires no manual participation, labor costs can be reduced.
Since the device embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding description of the method embodiment.
Fig. 6 is a block diagram of an electronic device 800 shown in the present application. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, images, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast operation information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 7 is a block diagram of an electronic device 1900 shown in the present application. For example, the electronic device 1900 may be provided as a server.
Referring to fig. 7, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each of which corresponds to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The foregoing describes in detail an image processing method and apparatus provided by the present application. Specific examples are used herein to explain the principles and embodiments of the present application, and the description of these examples is only intended to help understand the method and its core idea. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. An image processing method, characterized in that the method comprises:
receiving an image uploaded by a user;
acquiring the quality of the image based on an image quality evaluation model; the image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy of OCR (optical character recognition) on the sample image and the brightness of the sample image;
when the quality is greater than or equal to a preset threshold value, performing OCR recognition on the image;
and when the quality is smaller than a preset threshold value, outputting prompt information, wherein the prompt information is used for prompting the user to input the image again.
2. The method of claim 1, wherein the image quality assessment model is trained by:
acquiring a sample image and acquiring the labeling quality of the sample image;
constructing a network structure of an initialization model;
training the initialization model based on the sample image and the labeling quality until network parameters in the initialization model are converged to obtain an image quality evaluation model;
the acquiring of the labeling quality of the sample image includes:
obtaining the accuracy rate of OCR recognition on the sample image;
acquiring the brightness of the sample image;
and acquiring the labeling quality of the sample image based on the brightness and the accuracy.
3. The method of claim 2, wherein obtaining an accuracy rate for OCR recognition of the sample image comprises:
acquiring an annotation text in the sample image;
performing OCR recognition on the sample image to obtain a recognition text in the sample image;
and acquiring the accuracy rate based on the labeling text and the identification text.
4. The method of claim 3, wherein the obtaining the accuracy based on the annotation text and the recognition text comprises:
determining an accurate text recognized based on OCR in the recognized text based on the marked text;
and calculating the ratio of the number of the identified accurate texts to the number of the labeled texts to obtain the accuracy.
5. The method of claim 2, wherein the accuracy comprises the accuracy of OCR recognition performed respectively on a plurality of randomly selected fields in the sample image, and the brightness comprises the brightness of each of the plurality of fields in the sample image;
the obtaining of the labeling quality of the sample image based on the brightness and the accuracy includes:
for any field in the fields, acquiring the labeling quality of the field according to the accuracy of OCR recognition on the field and the brightness of the field;
and calculating the average value of the labeling quality of each field to obtain the labeling quality of the sample image.
6. The method according to claim 2, characterized in that said network structure comprises at least: a plurality of convolution pooling layers, a plurality of up-sampling layers, a plurality of superposition layers, a full connection layer and a logistic regression layer;
each convolution pooling layer comprises at least one convolution layer and at least one pooling layer, and the pooling layer comprises a max pooling layer; the convolution layer is used for acquiring a feature matrix of the image, and the pooling layer is used for reducing the dimension of the feature matrix;
the up-sampling layer is used for increasing the dimension of the feature matrix;
the superposition layer is used for superposing the feature matrix output by the convolution pooling layer and the feature matrix output by the up-sampling layer into one feature matrix;
the full connection layer is used for converting the feature matrix output by the superposition layer into a feature value;
the logistic regression layer is used for converting the feature value into the quality of the image.
7. The method of claim 6,
the convolution pooling layers comprise a first convolution pooling layer, a second convolution pooling layer, a third convolution pooling layer, a fourth convolution pooling layer and a fifth convolution pooling layer;
the up-sampling layer comprises a first up-sampling layer, a second up-sampling layer and a third up-sampling layer;
the superposition layers comprise a first superposition layer, a second superposition layer and a third superposition layer;
the input end of the first convolution pooling layer is used for inputting an image;
the output end of the first convolution pooling layer is connected with the input end of the second convolution pooling layer;
the output end of the second convolution pooling layer is connected with the input end of the third convolution pooling layer, and the output end of the second convolution pooling layer is also connected with the input end of the third superposition layer;
the output end of the third convolution pooling layer is connected with the input end of the fourth convolution pooling layer, and the output end of the third convolution pooling layer is also connected with the input end of the second superposition layer;
the output end of the fourth convolution pooling layer is connected with the input end of the fifth convolution pooling layer, and the output end of the fourth convolution pooling layer is also connected with the input end of the first superposition layer;
the output end of the fifth convolution pooling layer is connected with the input end of the first up-sampling layer;
the output end of the first up-sampling layer is connected with the input end of the first superposition layer;
the output end of the first superposition layer is connected with the input end of the second up-sampling layer;
the output end of the second up-sampling layer is connected with the input end of the second superposition layer;
the output end of the second superposition layer is connected with the input end of the third up-sampling layer;
the output end of the third up-sampling layer is connected with the input end of the third superposition layer;
the output end of the third superposition layer is connected with the input end of the full connection layer;
and the output end of the full connection layer is connected with the input end of the logistic regression layer.
8. An image processing apparatus, characterized in that the apparatus comprises:
the receiving module is used for receiving the image uploaded by the user;
the first acquisition module is used for acquiring the quality of the image based on an image quality evaluation model; the image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy of OCR (optical character recognition) on the sample image and the brightness of the sample image;
the recognition module is used for performing OCR recognition on the image when the quality is greater than or equal to a preset threshold value;
and the output module is used for outputting prompt information when the quality is smaller than a preset threshold value, wherein the prompt information is used for prompting the user to input the image again.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the image processing method according to any of claims 1 to 7 are implemented by the processor when executing the program.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the image processing method according to any one of claims 1 to 7.
CN202010664624.9A 2020-07-10 2020-07-10 Image processing method and device Active CN111932500B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010664624.9A CN111932500B (en) 2020-07-10 2020-07-10 Image processing method and device

Publications (2)

Publication Number Publication Date
CN111932500A true CN111932500A (en) 2020-11-13
CN111932500B CN111932500B (en) 2023-10-13

Family

ID=73312303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010664624.9A Active CN111932500B (en) 2020-07-10 2020-07-10 Image processing method and device

Country Status (1)

Country Link
CN (1) CN111932500B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant