CN111932500A - Image processing method and device

Info

Publication number: CN111932500A
Authority: CN (China)
Prior art keywords: layer, image, quality, sample image, convolution
Legal status: Granted; currently Active
Application number: CN202010664624.9A
Other languages: Chinese (zh)
Other versions: CN111932500B (en)
Inventors: 张秋晖, 喻庐军, 刘岩
Assignee (current and original): Taikang Insurance Group Co Ltd
Events: application filed by Taikang Insurance Group Co Ltd; priority to CN202010664624.9A; publication of CN111932500A; application granted; publication of CN111932500B

Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06N 3/045: Neural networks; architectures; combinations of networks
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide detection or recognition
    • G06T 2207/10004: Image acquisition modality; still image; photographic image
    • G06T 2207/20081: Special algorithmic details; training; learning
    • G06T 2207/20084: Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/30168: Subject of image; image quality inspection
    • G06V 30/10: Character recognition

Abstract

The application provides an image processing method and device. An image quality evaluation model is obtained by training an initialization model on sample images and the labeling quality of those sample images, where the labeling quality of a sample image is obtained by the electronic device based on the accuracy of OCR recognition on the sample image and the brightness of the sample image. In this way, the evaluation standard of image quality can be unified according to the actual situation; because the standard is derived by technicians from analysis of the actual situation, image quality can be determined accurately against it. Therefore, when the electronic device obtains the quality of an image with the image quality evaluation model, this evaluation standard is applied, so the determined quality is accurate and consistent, errors are reduced, and no manual participation is needed in obtaining image quality, which reduces labor costs.

Description

Image processing method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus.
Background
Currently, techniques for recognizing text in images, such as OCR (Optical Character Recognition), are widely used in various fields, for example, recognition of certificates such as identity cards, and recognition of financial instruments and insurance documents.
For example, in one application, in an insurance intelligent claims settlement service, a user may take a picture of an insurance document or a medical bill and upload the picture to a server of an insurance company; the server performs OCR recognition on the picture to obtain its content and then calculates and pays the claim amount according to that content.
However, the picture may sometimes be distorted during shooting, during transmission to the server, or during compression before transmission to the server.
Disclosure of Invention
The application discloses an image processing method and device.
In a first aspect, the present application provides an image processing method, the method comprising:
receiving an image uploaded by a user;
acquiring the quality of the image based on an image quality evaluation model; the image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy of OCR (optical character recognition) on the sample image and the brightness of the sample image;
when the quality is greater than or equal to a preset threshold value, performing OCR recognition on the image;
and when the quality is less than the preset threshold value, outputting prompt information, wherein the prompt information is used for prompting the user to input the image again.
In an optional implementation manner, the training manner of the image quality evaluation model includes:
acquiring a sample image and acquiring the labeling quality of the sample image;
constructing a network structure of an initialization model;
and training the initialization model based on the sample image and the labeling quality until network parameters in the initialization model are converged to obtain the image quality evaluation model.
In an optional implementation manner, the obtaining of the annotation quality of the sample image includes:
obtaining the accuracy rate of OCR recognition on the sample image;
acquiring the brightness of the sample image;
and acquiring the labeling quality of the sample image based on the brightness and the accuracy.
In an optional implementation manner, the obtaining an accuracy of OCR recognition on the sample image includes:
acquiring an annotation text in the sample image;
performing OCR recognition on the sample image to obtain a recognition text in the sample image;
and acquiring the accuracy rate based on the labeling text and the identification text.
In an optional implementation manner, the obtaining the accuracy based on the annotation text and the recognition text includes:
determining, based on the annotation text, the accurately recognized texts among the recognition texts obtained by OCR;
and calculating the ratio of the number of accurately recognized texts to the number of annotation texts to obtain the accuracy.
In an optional implementation manner, the accuracy includes an accuracy of performing OCR recognition on a plurality of randomly selected fields in the sample image, and the brightness includes brightness of each field in the plurality of fields in the sample image;
the obtaining of the labeling quality of the sample image based on the brightness and the accuracy includes:
for any field in the fields, acquiring the labeling quality of the field according to the accuracy of OCR recognition on the field and the brightness of the field;
and calculating the average value of the labeling quality of each field to obtain the labeling quality of the sample image.
In an alternative implementation, the network structure includes at least: a plurality of convolution pooling layers, a plurality of upsampling layers, a plurality of superposition layers, a fully connected layer and a logistic regression layer;
the convolution pooling layers comprise at least one convolution layer and at least one pooling layer, the pooling layer comprising a max pooling layer; the convolution layer is used for acquiring a feature matrix of the image, and the pooling layer is used for reducing the dimension of the feature matrix;
the upsampling layer is used for increasing the dimension of the feature matrix;
the superposition layer is used for superposing the feature matrix output by a convolution pooling layer and the feature matrix output by an upsampling layer into one feature matrix;
the fully connected layer is used for converting the feature matrix output by the superposition layer into a feature value;
the logistic regression layer is used for converting the feature value into the quality of an image.
In an alternative implementation, the convolution pooling layers include a first convolution pooling layer, a second convolution pooling layer, a third convolution pooling layer, a fourth convolution pooling layer and a fifth convolution pooling layer;
the upsampling layers comprise a first upsampling layer, a second upsampling layer and a third upsampling layer;
the superposition layers comprise a first superposition layer, a second superposition layer and a third superposition layer;
the input end of the first convolution pooling layer is used for inputting an image;
the output end of the first convolution pooling layer is connected with the input end of the second convolution pooling layer;
the output end of the second convolution pooling layer is connected with the input end of the third convolution pooling layer, and also with the input end of the third superposition layer;
the output end of the third convolution pooling layer is connected with the input end of the fourth convolution pooling layer, and also with the input end of the second superposition layer;
the output end of the fourth convolution pooling layer is connected with the input end of the fifth convolution pooling layer, and also with the input end of the first superposition layer;
the output end of the fifth convolution pooling layer is connected with the input end of the first upsampling layer;
the output end of the first upsampling layer is connected with the input end of the first superposition layer;
the output end of the first superposition layer is connected with the input end of the second upsampling layer;
the output end of the second upsampling layer is connected with the input end of the second superposition layer;
the output end of the second superposition layer is connected with the input end of the third upsampling layer;
the output end of the third upsampling layer is connected with the input end of the third superposition layer;
the output end of the third superposition layer is connected with the input end of the fully connected layer;
and the output end of the fully connected layer is connected with the input end of the logistic regression layer.
In a second aspect, the present application provides an image processing apparatus, the apparatus comprising:
the receiving module is used for receiving the image uploaded by the user;
the first acquisition module is used for acquiring the quality of the image based on an image quality evaluation model; the image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy of OCR (optical character recognition) on the sample image and the brightness of the sample image;
the recognition module is used for performing OCR recognition on the image when the quality is greater than or equal to a preset threshold value;
and the output module is used for outputting prompt information when the quality is smaller than a preset threshold value, wherein the prompt information is used for prompting the user to input the image again.
In an optional implementation, the apparatus further comprises:
the first acquisition module is used for acquiring a sample image; the second acquisition module is used for acquiring the labeling quality of the sample image;
the building module is used for building a network structure of the initialization model;
and the training module is used for training the initialization model based on the sample image and the labeling quality until network parameters in the initialization model are converged to obtain the image quality evaluation model.
In an optional implementation manner, the second obtaining module includes:
the first acquisition unit is used for acquiring the accuracy of OCR recognition on the sample image;
a second acquisition unit configured to acquire brightness of the sample image;
and the third acquisition unit is used for acquiring the labeling quality of the sample image based on the brightness and the accuracy.
In an optional implementation manner, the first obtaining unit includes:
the first obtaining subunit is used for obtaining the annotation text in the sample image;
the identification subunit is used for performing OCR (optical character recognition) on the sample image to obtain an identification text in the sample image;
and the second acquiring subunit is used for acquiring the accuracy rate based on the labeling text and the identification text.
In an optional implementation manner, the second obtaining subunit is specifically configured to: determine, based on the annotation text, the accurately recognized texts among the recognition texts obtained by OCR; and calculate the ratio of the number of accurately recognized texts to the number of annotation texts to obtain the accuracy.
In an optional implementation manner, the accuracy includes an accuracy of performing OCR recognition on a plurality of randomly selected fields in the sample image, and the brightness includes brightness of each field in the plurality of fields in the sample image;
the third acquisition unit includes:
a third obtaining subunit, configured to, for any one of the fields, obtain, according to the accuracy of OCR recognition on the field and the brightness of the field, a labeling quality of the field;
and the computing subunit is used for computing the average value of the labeling quality of each field to obtain the labeling quality of the sample image.
In an alternative implementation, the network structure includes at least: a plurality of convolution pooling layers, a plurality of upsampling layers, a plurality of superposition layers, a fully connected layer and a logistic regression layer;
the convolution pooling layers comprise at least one convolution layer and at least one pooling layer, the pooling layer comprising a max pooling layer; the convolution layer is used for acquiring a feature matrix of the image, and the pooling layer is used for reducing the dimension of the feature matrix;
the upsampling layer is used for increasing the dimension of the feature matrix;
the superposition layer is used for superposing the feature matrix output by a convolution pooling layer and the feature matrix output by an upsampling layer into one feature matrix;
the fully connected layer is used for converting the feature matrix output by the superposition layer into a feature value;
the logistic regression layer is used for converting the feature value into the quality of an image.
In an alternative implementation, the convolution pooling layers include a first convolution pooling layer, a second convolution pooling layer, a third convolution pooling layer, a fourth convolution pooling layer and a fifth convolution pooling layer;
the upsampling layers comprise a first upsampling layer, a second upsampling layer and a third upsampling layer;
the superposition layers comprise a first superposition layer, a second superposition layer and a third superposition layer;
the input end of the first convolution pooling layer is used for inputting an image;
the output end of the first convolution pooling layer is connected with the input end of the second convolution pooling layer;
the output end of the second convolution pooling layer is connected with the input end of the third convolution pooling layer, and also with the input end of the third superposition layer;
the output end of the third convolution pooling layer is connected with the input end of the fourth convolution pooling layer, and also with the input end of the second superposition layer;
the output end of the fourth convolution pooling layer is connected with the input end of the fifth convolution pooling layer, and also with the input end of the first superposition layer;
the output end of the fifth convolution pooling layer is connected with the input end of the first upsampling layer;
the output end of the first upsampling layer is connected with the input end of the first superposition layer;
the output end of the first superposition layer is connected with the input end of the second upsampling layer;
the output end of the second upsampling layer is connected with the input end of the second superposition layer;
the output end of the second superposition layer is connected with the input end of the third upsampling layer;
the output end of the third upsampling layer is connected with the input end of the third superposition layer;
the output end of the third superposition layer is connected with the input end of the fully connected layer;
and the output end of the fully connected layer is connected with the input end of the logistic regression layer.
In a third aspect, the present application provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the image processing method according to the first aspect when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method according to the first aspect.
Compared with the prior art, the method has the following advantages:
the inventors have found that the accuracy of OCR on an image is proportional to the quality of the image. Thus, if the image is distorted, it may cause inaccuracy in the contents of the image recognized based on the OCR.
In order to avoid this, in one possible implementation, after the server receives the image uploaded by the user, the staff may manually check the quality of the image, and if the staff subjectively thinks that the quality of the image meets the requirement, the staff may indicate that the quality of the image meets the requirement to the server, so that the server may perform OCR recognition on the image later.
However, the inventors found that: in the implementation mode, the quality of the image needs to be checked manually by workers, so that the labor cost is high.
Secondly, in many cases, the number of images to be subjected to OCR recognition is large, which results in a large amount of review work for the quality of the images, and therefore, many workers are often arranged to perform manual review.
However, in the above method, there is no objective quality assessment standard, and the assessment of the image quality is greatly influenced by the subjective consciousness of the staff.
For example, some auditors consider that the accuracy of OCR recognition performed on an image with a quality meeting the requirement is low, while other auditors consider that the accuracy of OCR recognition performed on an image with a quality not meeting the requirement is high, which makes the result of manual audit be wrong easily.
In the application, an image uploaded by a user is received; acquiring the quality of the image based on the image quality evaluation model; judging whether the quality of the image is greater than or equal to a preset threshold value; when the quality of the image is greater than or equal to a preset threshold value, performing OCR recognition on the image; and when the quality of the image is less than the preset threshold value, outputting prompt information, wherein the prompt information is used for prompting the user to input the image again.
The image quality evaluation model is obtained by training the initialization model based on the sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained by the electronic equipment based on the accuracy rate of OCR recognition on the sample image and the brightness of the sample image.
According to the method and the device, the evaluation standard of the image quality can be unified according to the actual situation. And the evaluation standard of the image quality is obtained by analyzing according to the actual situation by technicians, and the image quality can be accurately determined according to the evaluation standard of the image quality.
Therefore, in the application, in the process of acquiring the quality of the image by the electronic equipment according to the image quality evaluation model, the evaluation standard of the quality of the image is used, so that the quality of the image is determined to be accurate and uniform, the error condition is reduced, and the process of acquiring the quality of the image does not need manual participation, so that the labor cost can be reduced.
Drawings
Fig. 1 is a flowchart of the steps of an image processing method of the present application.
FIG. 2 is a flow chart of the steps of a method of training an image quality assessment model of the present application.
FIG. 3 is a flowchart illustrating steps of a method for obtaining annotation quality of a sample image according to the present application.
Fig. 4 is a schematic diagram of a network structure of an image quality evaluation model of the present application.
Fig. 5 is a block diagram of an image processing apparatus according to the present application.
Fig. 6 is a block diagram of an electronic device shown in the present application.
Fig. 7 is a block diagram of an electronic device shown in the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1, a flowchart illustrating steps of an image processing method according to the present application is shown, where the method is applied to an electronic device, where the electronic device includes a server or a terminal, and the method may specifically include the following steps:
in step S101, an image uploaded by a user is received;
in the application, the image uploaded by the user can be an image obtained by photographing a certificate, a document, a bill or the like.
For example, in an insurance business scenario, in the insurance claims settlement phase, a user may upload insurance documents, medical bills, and the like to a server of an insurance company. Specifically, the user may take a picture of an insurance document or a medical bill to obtain an image containing it and upload that image to the server of the insurance company, so that the insurance company's server can perform OCR recognition on the image to obtain the content of the insurance document, the medical bill, and so on.
However, some uploaded images are of low quality, so the electronic device cannot accurately recognize the content in them; therefore, step S102 may be performed before OCR recognition is performed on the image.
In step S102, the quality of the image is acquired based on the image quality evaluation model; the image quality evaluation model is obtained by training the initialization model based on the sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy rate of OCR recognition on the sample image and the brightness of the sample image;
in the present application, the image may be input to a trained image quality evaluation model to obtain the quality of the image output by the image quality evaluation model.
The training mode of the image quality evaluation model may refer to an embodiment shown in fig. 2, which will not be described in detail here.
In step S103, it is determined whether the quality of the image is greater than or equal to a preset threshold;
when the quality of the image is greater than or equal to the preset threshold, performing OCR recognition on the image in step S104;
When the quality of the image is less than the preset threshold, in step S105, prompt information is output to prompt the user to re-input the image, until the quality of the image uploaded by the user is greater than or equal to the preset threshold.
Alternatively, when the quality of the image is less than the preset threshold, the image uploaded by the user may instead be discarded.
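To make the flow of steps S101 to S105 concrete, the following is a minimal Python sketch; the threshold value, the score_quality model wrapper and the run_ocr function are illustrative placeholders assumed for this example, not names defined by the application.

```python
QUALITY_THRESHOLD = 0.8  # hypothetical preset threshold; the application does not fix a value


def handle_upload(image, score_quality, run_ocr):
    quality = score_quality(image)    # step S102: image quality evaluation model
    if quality >= QUALITY_THRESHOLD:  # steps S103/S104: quality acceptable, recognize
        return {"status": "ok", "text": run_ocr(image)}
    # step S105: prompt the user to re-input the image (or, alternatively, discard it)
    return {"status": "retry", "message": "Image quality too low, please upload it again."}
```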
The inventors have found that the accuracy of OCR recognition on an image is positively correlated with the quality of the image. Thus, if an image is distorted, the content recognized in it by OCR may be inaccurate.
In order to avoid this, in one possible implementation, after the server receives the image uploaded by the user, a staff member may manually review the quality of the image; if the staff member subjectively judges that the quality meets the requirement, they indicate this to the server, and the server may then perform OCR recognition on the image.
However, the inventors found that in this implementation the quality of every image must be reviewed manually by staff, so the labor cost is high.
Secondly, in many cases the number of images to be recognized by OCR is large, which creates a heavy quality-review workload, so many staff members often have to be assigned to manual review.
Moreover, this approach has no objective quality evaluation standard, and the assessment of image quality is strongly influenced by the subjective judgment of the staff.
For example, one reviewer may judge that an image meets the quality requirement even though the accuracy of OCR recognition on it is low, while another may judge that an image fails the requirement even though the accuracy of OCR recognition on it is high; as a result, manual review is prone to error.
In the present application, an image uploaded by a user is received; the quality of the image is obtained based on the image quality evaluation model; whether the quality is greater than or equal to a preset threshold is determined; when it is, OCR recognition is performed on the image; and when it is less than the preset threshold, prompt information is output to prompt the user to input the image again.
The image quality evaluation model is obtained by training an initialization model on sample images and their labeling quality, and the labeling quality of a sample image is obtained by the electronic device based on the accuracy of OCR recognition on the sample image and the brightness of the sample image.
In this way, the evaluation standard of image quality can be unified according to the actual situation. Because the standard is derived by technicians from analysis of the actual situation, image quality can be determined accurately against it.
Therefore, when the electronic device obtains the quality of an image with the image quality evaluation model, this evaluation standard is applied, so the determined quality is accurate and consistent, errors are reduced, and no manual participation is needed in obtaining image quality, which reduces labor costs.
In an alternative implementation, referring to fig. 2, the training mode of the image quality evaluation model includes:
in step S201, a sample image is obtained and the labeling quality of the sample image is obtained;
in the present application, the sample image may be manually collected by a technician or automatically collected by an electronic device on a network.
In one embodiment, in order to improve the performance of the trained image quality evaluation model, and to keep the model simple while still ensuring the accuracy of the quality it predicts, the sample images, when there are several of them, may be unified to a specific size. For example, the sample images are unified to 512 × 512, that is, each unified sample image includes 512 × 512 pixels.
In this way, after the image quality evaluation model goes online, in the embodiment shown in fig. 1, after the image uploaded by the user is received in step S101 and before step S102, it may be determined whether the size of the uploaded image is the specific size; if not, the uploaded image may be converted into an image of the specific size, and step S102 may then be performed on the converted image, for example as sketched below.
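A minimal sketch of this size check and conversion, assuming Pillow as the image library and bilinear interpolation for the conversion; neither choice is specified by the application.

```python
from PIL import Image

SPECIFIC_SIZE = (512, 512)  # the example "specific size" used for the sample images


def to_specific_size(img: Image.Image) -> Image.Image:
    # Convert an uploaded image to the specific size expected by the model.
    if img.size != SPECIFIC_SIZE:
        img = img.resize(SPECIFIC_SIZE, Image.BILINEAR)  # interpolation mode is an assumption
    return img
```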
The labeling quality of the sample image may be directly and manually labeled by a technician according to the standard definition of the image quality, or may be obtained in another way, which is specifically referred to as the embodiment shown in fig. 3 and will not be described in detail herein.
In step S202, a network structure of the initialization model is constructed;
the network structure comprises at least: specific structural examples of the convolution pooling layer, the sampling layer, the stacking layer, the full connection layer, the logistic regression layer, and the like can be referred to the embodiment shown in fig. 4 later, and are not described in detail here.
In step S203, the initialization model is trained based on the sample image and the annotation quality until the network parameters in the initialization model converge, so as to obtain an image quality evaluation model.
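A condensed sketch of this training step, assuming a PyTorch regression setup; the loss function, optimizer and convergence test are not specified by the application, so MSE, Adam and a loss-change tolerance are assumptions.

```python
import torch


def train(model, images, labeled_quality, lr=1e-4, tol=1e-6, max_epochs=100):
    # Train the initialization model on (sample image, labeling quality) pairs
    # until the loss stabilizes, a crude stand-in for parameter convergence.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    prev = float("inf")
    for _ in range(max_epochs):
        opt.zero_grad()
        pred = model(images)                   # predicted quality in (0, 1)
        loss = loss_fn(pred, labeled_quality)  # compare with the labeling quality
        loss.backward()
        opt.step()
        if abs(prev - loss.item()) < tol:      # treat convergence as loss stabilizing
            break
        prev = loss.item()
    return model
```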
In another embodiment of the present application, referring to fig. 3, step S201 includes the following steps:
in step S301, obtaining an accuracy rate of performing OCR recognition on the sample image;
in an embodiment of the present application, the step may be implemented by the following process, including:
3011. acquiring an annotation text in a sample image;
in the application, a technician can manually identify the text in the sample text to obtain the text in the sample text, and the text is uploaded to the electronic device as the labeled text in the sample image, and the electronic device receives the labeled text uploaded by the technician in the sample image.
3012. Performing OCR recognition on the sample image to obtain a recognition text in the sample image;
3013. and acquiring the accuracy of OCR recognition on the sample image based on the labeling text and the recognition text.
In the present application, the recognition texts obtained by performing OCR recognition on a sample image are not necessarily all correct, since recognition errors may occur. Therefore, the accurately recognized texts among the recognition texts can be determined based on the annotation texts; for example, the annotation texts are compared with the recognition texts to determine which recognition texts are accurate. The ratio of the number of accurately recognized texts to the number of annotation texts can then be calculated, giving the accuracy of OCR recognition on the sample image, as in the sketch below.
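The computation can be sketched as follows; pairing annotation texts with recognition texts by position is an assumption about how the accurate texts are determined.

```python
def ocr_accuracy(annotation_texts, recognition_texts):
    # Ratio of correctly recognized texts to the number of annotation texts.
    correct = sum(1 for a, r in zip(annotation_texts, recognition_texts) if a == r)
    return correct / len(annotation_texts)


# For example, if one of two (hypothetical) annotated fields is recognized exactly:
# ocr_accuracy(["invoice", "2020-07-10"], ["invoice", "2020-07-1O"]) returns 0.5
```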
In step S302, the brightness of the sample image is acquired;
in step S303, the labeling quality of the sample image is obtained based on the brightness of the sample image and the accuracy of OCR recognition on the sample image.
In one embodiment of the present application, a first product between an accuracy rate of OCR recognition on a sample image and a first preset weight coefficient may be calculated, then a second product between a brightness of the sample image and a second preset weight coefficient may be calculated, and then a sum of the first product and the second product may be calculated, so as to obtain an annotation quality of the sample image.
In one example, the sum of the first preset weight coefficient and the second preset weight coefficient may be a specific value, for example, a value of 1, the first preset weight coefficient may be 0.6, etc., and the second preset weight coefficient may be 0.4, etc.
The possible brightness of an image lies in an interval, for example (0, X) with X greater than 0, and thus the brightness obtained for the sample image lies in the interval (0, X).
The accuracy of OCR recognition on the sample image obtained in step S301 lies in the interval (0, 1), and obtaining the labeling quality of the sample image in the manner described above requires combining that accuracy with the brightness of the sample image.
Therefore, if the brightness of the sample image is large, the computed labeling quality would be dominated by the brightness, and the effect of the accuracy of OCR recognition on the sample image would be diluted, resulting in low accuracy of the obtained labeling quality.
Therefore, in order to improve the accuracy of the labeling quality of the acquired sample image, in another embodiment of the present application, the brightness of the acquired sample image needs to be normalized to be within the interval (0, 1).
Specifically, the brightness of the sample image may be divided by the right endpoint X of the interval (0, X), normalizing it into the interval (0, 1). The labeling quality of the sample image is then obtained based on the accuracy of OCR recognition on the sample image and the normalized brightness, as in the sketch below.
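Combining the weighted sum and the normalization, the labeling-quality computation can be sketched as follows; the weights 0.6 and 0.4 are the example values given above, and x_max stands for the right endpoint X of the brightness interval (0, X).

```python
def labeling_quality(accuracy, brightness, x_max, w_acc=0.6, w_bri=0.4):
    # Weighted sum of OCR accuracy and normalized brightness; the two
    # weights sum to 1, as in the example above.
    norm_brightness = brightness / x_max  # map brightness from (0, X) into (0, 1)
    return w_acc * accuracy + w_bri * norm_brightness
```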
In another embodiment of the present application, considering that the quality of different regions of an image may be uneven, a plurality of fields may be randomly selected in the sample image, where a field includes, but is not limited to, the region where a piece of text is located, the region where a line of text is located, the region where a paragraph of text is located, or a region of another form.
As such, the accuracy of OCR recognition of the sample image may include the accuracy of OCR recognition of randomly selected fields in the sample image, and the brightness includes the brightness of each of the fields in the sample image.
In this way, when the labeling quality of the sample image is obtained based on the brightness and the accuracy, for any one of the fields, the labeling quality of the field can be obtained according to the accuracy of OCR recognition on the field and the brightness of the field; in one embodiment of the present application, a first product between an accuracy rate of OCR recognition of the field and a first preset weight coefficient may be calculated, then a second product between a brightness of the field and a second preset weight coefficient may be calculated, and then a sum of the first product and the second product may be calculated, so as to obtain an annotation quality of the field. The same is true for each of the others of the plurality of fields.
In one example, the sum of the first preset weight coefficient and the second preset weight coefficient may be a specific value, for example, a value of 1, the first preset weight coefficient may be 0.6, etc., and the second preset weight coefficient may be 0.4, etc.
Then, the average of the labeling qualities of the fields may be calculated to obtain the labeling quality of the sample image, as in the sketch below.
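A sketch of this per-field variant, reusing the labeling_quality helper from the previous sketch; representing each randomly selected field as an (accuracy, brightness) pair is an assumption made for illustration.

```python
def sample_labeling_quality(fields, x_max):
    # Average the labeling quality over the randomly selected fields;
    # `fields` is a list of (accuracy, brightness) pairs, one per field.
    per_field = [labeling_quality(acc, bri, x_max) for acc, bri in fields]
    return sum(per_field) / len(per_field)
```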
In one embodiment of the present application, the network structure of the initialization model includes at least: a plurality of convolution pooling layers, a plurality of upsampling layers, a plurality of superposition layers, a fully connected layer and a logistic regression layer.
The convolution pooling layers include at least one convolution layer and at least one pooling layer, the pooling layer including a max pooling layer. The convolution layer is used for acquiring a feature matrix of the image, and the pooling layer is used for reducing the dimension of the feature matrix.
The upsampling layer is used for increasing the dimension of the feature matrix.
The superposition layer is used for superposing the feature matrix output by a convolution pooling layer and the feature matrix output by an upsampling layer into one feature matrix.
The fully connected layer is used for converting the feature matrix output by the superposition layer into a feature value.
The logistic regression layer is used for converting the feature value into the quality of the image.
Referring to fig. 4, in one example, the convolution pooling layers include a first convolution pooling layer, a second convolution pooling layer, a third convolution pooling layer, a fourth convolution pooling layer, and a fifth convolution pooling layer.
The upsampling layer includes a first upsampling layer, a second upsampling layer, and a third upsampling layer.
The superposition layer comprises a first superposition layer, a second superposition layer and a third superposition layer.
The input of the first convolution pooling layer is used for inputting images.
The output end of the first convolution pooling layer is connected with the input end of the second convolution pooling layer.
The output end of the second convolution pooling layer is connected with the input end of the third convolution pooling layer, and the output end of the second convolution pooling layer is also connected with the input end of the third superposition layer.
The output end of the third convolution pooling layer is connected with the input end of the fourth convolution pooling layer, and the output end of the third convolution pooling layer is also connected with the input end of the second superposition layer.
The output end of the fourth convolution pooling layer is connected with the input end of the fifth convolution pooling layer, and the output end of the fourth convolution pooling layer is also connected with the input end of the first superposition layer.
The output end of the fifth convolution pooling layer is connected with the input end of the first upsampling layer.
The output end of the first upsampling layer is connected with the input end of the first superposition layer.
The output end of the first superposition layer is connected with the input end of the second upsampling layer.
The output end of the second upsampling layer is connected with the input end of the second superposition layer.
The output end of the second superposition layer is connected with the input end of the third upsampling layer.
The output end of the third upsampling layer is connected with the input end of the third superposition layer.
The output end of the third superposition layer is connected with the input end of the fully connected layer.
The output end of the fully connected layer is connected with the input end of the logistic regression layer.
The first convolution pooling layer comprises 1 convolution layer and 1 pooling layer; the input end of the convolution layer is the input end of the first convolution pooling layer, the output end of the convolution layer is connected with the input end of the pooling layer, and the output end of the pooling layer is the output end of the first convolution pooling layer. The convolution layer may include 64 3 × 3 convolution kernels, and the pooling layer is a max pooling layer.
The second convolution pooling layer comprises 2 convolution layers and 1 pooling layer, the 2 convolution layers being convolution layer 1 and convolution layer 2; the input end of convolution layer 1 is the input end of the second convolution pooling layer, the output end of convolution layer 1 is connected with the input end of convolution layer 2, the output end of convolution layer 2 is connected with the input end of the pooling layer, and the output end of the pooling layer is the output end of the second convolution pooling layer. Convolution layers 1 and 2 may each include 128 3 × 3 convolution kernels, and the pooling layer is a max pooling layer.
The third convolution pooling layer comprises 3 convolution layers and 1 pooling layer, the 3 convolution layers being convolution layer 1, convolution layer 2 and convolution layer 3; the input end of convolution layer 1 is the input end of the third convolution pooling layer, the output end of convolution layer 1 is connected with the input end of convolution layer 2, the output end of convolution layer 2 is connected with the input end of convolution layer 3, the output end of convolution layer 3 is connected with the input end of the pooling layer, and the output end of the pooling layer is the output end of the third convolution pooling layer. Convolution layers 1 and 2 may each include 256 3 × 3 convolution kernels, convolution layer 3 may include 256 1 × 1 convolution kernels, and the pooling layer is a max pooling layer.
The fourth convolution pooling layer likewise comprises 3 convolution layers and 1 pooling layer, connected in the same way as in the third convolution pooling layer, with the input end of convolution layer 1 being the input end of the fourth convolution pooling layer and the output end of the pooling layer being the output end of the fourth convolution pooling layer. Convolution layers 1 and 2 may each include 512 3 × 3 convolution kernels, convolution layer 3 may include 512 1 × 1 convolution kernels, and the pooling layer is a max pooling layer.
The fifth convolution pooling layer also comprises 3 convolution layers and 1 pooling layer, connected in the same way as in the third convolution pooling layer, with the input end of convolution layer 1 being the input end of the fifth convolution pooling layer and the output end of the pooling layer being the output end of the fifth convolution pooling layer. Convolution layers 1 and 2 may each include 512 3 × 3 convolution kernels, convolution layer 3 may include 512 1 × 1 convolution kernels, and the pooling layer is a max pooling layer.
In one example, an image is input into the first convolution pooling layer and processed by it. The first convolution pooling layer outputs a 512 × 512 × 64 feature matrix, the second convolution pooling layer outputs a 256 × 256 × 128 feature matrix, the third convolution pooling layer outputs a 128 × 128 × 256 feature matrix, the fourth convolution pooling layer outputs a 64 × 64 × 512 feature matrix, and the fifth convolution pooling layer outputs a 32 × 32 × 512 feature matrix.
the 32 × 512 feature matrix output by the fifth convolution pooling layer is processed by the first upsampling layer to obtain a 64 × 512 feature matrix, specifically, the first upsampling layer may insert a numerical value into the 32 × 512 feature matrix by means of mean interpolation to obtain a 64 × 512 feature matrix, which is not described in detail, and the second upsampling layer and the third upsampling layer process feature data in the same way, which is not described in detail later. The first upsampling layer then inputs the resulting feature matrix of 64 x 512 into the first superposition layer.
The first superposition layer superposes the 64 × 64 × 512 feature matrix output by the first upsampling layer with the 64 × 64 × 512 feature matrix output by the fourth convolution pooling layer, for example by adding the values at the same positions, to obtain a 64 × 64 × 512 feature matrix, which is input into the second upsampling layer.
The second upsampling layer processes the 64 × 64 × 512 feature matrix input by the first superposition layer to obtain a 128 × 128 × 256 feature matrix, which is then input into the second superposition layer.
The second superposition layer superposes the 128 × 128 × 256 feature matrix output by the second upsampling layer with the 128 × 128 × 256 feature matrix output by the third convolution pooling layer, for example by adding the values at the same positions, to obtain a 128 × 128 × 256 feature matrix, which is input into the third upsampling layer.
The third upsampling layer processes the 128 × 128 × 256 feature matrix input by the second superposition layer to obtain a 256 × 256 × 128 feature matrix, which is then input into the third superposition layer.
The third superposition layer superposes the 256 × 256 × 128 feature matrix output by the third upsampling layer with the 256 × 256 × 128 feature matrix output by the second convolution pooling layer, for example by adding the values at the same positions, to obtain a 256 × 256 × 128 feature matrix, which is input into the fully connected layer.
The fully connected layer converts the 256 × 256 × 128 feature matrix into one feature value, for example by adding all the values in the feature matrix, and inputs the resulting feature value into the logistic regression layer.
The logistic regression layer includes a sigmoid function; based on the sigmoid function, it converts the feature value output by the fully connected layer into a value in the interval (0, 1), which serves as the quality of the image.
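Assembling the data flow of fig. 4 and the dimensions above into a PyTorch sketch: the kernel counts follow the example layers described earlier, while the ReLU activations, the omission of downsampling in the first block (needed to reproduce the listed 512 × 512 × 64 output), the 1 × 1 channel-reducing projections inside the upsampling layers, the bilinear interpolation, and the sum-based fully connected reduction are all assumptions made to obtain a runnable model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch, n_convs, last_1x1=False, pool=True):
    # A "convolution pooling layer": n_convs convolutions, then 2x2 max pooling.
    layers, ch = [], in_ch
    for i in range(n_convs):
        k = 1 if (last_1x1 and i == n_convs - 1) else 3
        layers += [nn.Conv2d(ch, out_ch, k, padding=k // 2), nn.ReLU(inplace=True)]
        ch = out_ch
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


class UpBlock(nn.Module):
    # An "upsampling layer": doubles H and W by interpolation; a 1x1 conv
    # adjusts the channel count so the result can be added to the matching
    # convolution pooling layer output (the projection is an assumption).
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.proj(x)


class QualityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = conv_block(3, 64, 1, pool=False)        # -> 512 x 512 x 64
        self.block2 = conv_block(64, 128, 2)                  # -> 256 x 256 x 128
        self.block3 = conv_block(128, 256, 3, last_1x1=True)  # -> 128 x 128 x 256
        self.block4 = conv_block(256, 512, 3, last_1x1=True)  # -> 64 x 64 x 512
        self.block5 = conv_block(512, 512, 3, last_1x1=True)  # -> 32 x 32 x 512
        self.up1 = UpBlock(512, 512)  # 32 -> 64
        self.up2 = UpBlock(512, 256)  # 64 -> 128
        self.up3 = UpBlock(256, 128)  # 128 -> 256

    def forward(self, x):
        c2 = self.block2(self.block1(x))
        c3 = self.block3(c2)
        c4 = self.block4(c3)
        c5 = self.block5(c4)
        s1 = self.up1(c5) + c4         # first superposition layer
        s2 = self.up2(s1) + c3         # second superposition layer
        s3 = self.up3(s2) + c2         # third superposition layer
        value = s3.sum(dim=(1, 2, 3))  # "fully connected" reduction, as described
        return torch.sigmoid(value)    # logistic regression layer -> quality in (0, 1)


# quality = QualityNet()(torch.randn(1, 3, 512, 512))  # one quality score per image
```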
It is noted that, for simplicity of explanation, the method embodiments are described as a series or combination of actions, but those skilled in the art will appreciate that the present application is not limited by the order of actions described, since some steps may, in accordance with the present application, occur in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are exemplary, and the actions involved are not necessarily all required by the present application.
Referring to fig. 5, a block diagram of an image processing apparatus according to the present application is shown, and the apparatus may specifically include the following modules:
the receiving module 11 is used for receiving an image uploaded by a user;
a first obtaining module 12, configured to obtain quality of the image based on an image quality evaluation model; the image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy of OCR (optical character recognition) on the sample image and the brightness of the sample image;
the recognition module 13 is configured to perform OCR recognition on the image when the quality is greater than or equal to a preset threshold;
and the output module 14 is configured to output prompt information when the quality is smaller than a preset threshold, where the prompt information is used to prompt the user to re-input an image.
In an optional implementation, the apparatus further comprises:
the first acquisition module is used for acquiring a sample image; the second acquisition module is used for acquiring the labeling quality of the sample image;
the building module is used for building a network structure of the initialization model;
and the training module is used for training the initialization model based on the sample image and the labeling quality until network parameters in the initialization model are converged to obtain the image quality evaluation model.
In an optional implementation manner, the second obtaining module includes:
the first acquisition unit is used for acquiring the accuracy of OCR recognition on the sample image;
a second acquisition unit configured to acquire brightness of the sample image;
and the third acquisition unit is used for acquiring the labeling quality of the sample image based on the brightness and the accuracy.
In an optional implementation manner, the first obtaining unit includes:
the first obtaining subunit is used for obtaining the annotation text in the sample image;
the identification subunit is used for performing OCR (optical character recognition) on the sample image to obtain an identification text in the sample image;
and the second acquiring subunit is used for acquiring the accuracy rate based on the labeling text and the identification text.
In an optional implementation manner, the second obtaining subunit is specifically configured to: determine, based on the annotation text, the accurately recognized texts among the recognition texts obtained by OCR; and calculate the ratio of the number of accurately recognized texts to the number of annotation texts to obtain the accuracy.
In an optional implementation manner, the accuracy includes an accuracy of performing OCR recognition on a plurality of randomly selected fields in the sample image, and the brightness includes brightness of each field in the plurality of fields in the sample image;
the third acquisition unit includes:
a third obtaining subunit, configured to, for any one of the fields, obtain, according to the accuracy of OCR recognition on the field and the brightness of the field, a labeling quality of the field;
and the computing subunit is used for computing the average value of the labeling quality of each field to obtain the labeling quality of the sample image.
In an alternative implementation, the network structure includes at least: the system comprises a plurality of convolution pooling layers, a plurality of upper sampling layers, a plurality of superposition layers, a full connection layer and a logistic regression layer;
the convolutional pooling layers comprise at least one convolutional layer and at least one pooling layer, the pooling layers comprise a max pooling layer; the convolution layer is used for acquiring a feature matrix of the image, and the pooling layer is used for reducing the dimension of the feature matrix;
the up-sampling layer is used for increasing the dimension of the characteristic matrix;
the superposition layer is used for superposing the characteristic matrix output by the convolution pooling layer and the characteristic matrix output by the upper sampling layer into a characteristic matrix;
the full connection layer is used for converting the characteristic matrix output by the superposition layer into a characteristic numerical value;
the logistic regression layer is used for converting the characteristic numerical value into the quality of an image.
In an alternative implementation, the convolution pooling layers include a first convolution pooling layer, a second convolution pooling layer, a third convolution pooling layer, a fourth convolution pooling layer, and a fifth convolution pooling layer;
the up-sampling layer comprises a first up-sampling layer, a second up-sampling layer and a third up-sampling layer;
the superposition layers comprise a first superposition layer, a second superposition layer and a third superposition layer;
the input end of the first convolution pooling layer is used for inputting an image;
the output end of the first convolution pooling layer is connected with the input end of the second convolution pooling layer;
the output end of the second convolution pooling layer is connected with the input end of the third convolution pooling layer, and the output end of the second convolution pooling layer is also connected with the input end of the third superposition layer;
the output end of the third convolution pooling layer is connected with the input end of the fourth convolution pooling layer, and the output end of the third convolution pooling layer is also connected with the input end of the second superposition layer;
the output end of the fourth convolution pooling layer is connected with the input end of the fifth convolution pooling layer, and the output end of the fourth convolution pooling layer is also connected with the input end of the first superposition layer;
the output end of the fifth convolution pooling layer is connected with the input end of the first up-sampling layer;
the output end of the first upper sampling layer is connected with the input end of the first superposition layer;
the output end of the first superposition layer is connected with the input end of the second upper sampling layer;
the output end of the second upper sampling layer is connected with the input end of the second superposition layer;
the output end of the second superposition layer is connected with the input end of the third upper sampling layer;
the output end of the third up-sampling layer is connected with the input end of the third superposition layer;
the output end of the third superposition layer is connected with the input end of the full connection layer;
and the output end of the full connection layer is connected with the input end of the logistic regression layer.
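For concreteness, the sketch below wires a PyTorch model exactly as described above. It is not the authors' exact model: channel counts, kernel sizes, the input resolution, and the use of channel-wise concatenation for "superposition" are all assumptions, since the application does not specify them.

```python
# A minimal PyTorch sketch of the network wiring described above. Channel
# counts, kernel sizes, the input size, and concatenation-as-superposition
# are assumptions; the layer connections follow the description exactly.

import torch
import torch.nn as nn


def conv_pool(in_ch: int, out_ch: int) -> nn.Sequential:
    """One convolution pooling layer: the convolution extracts a feature
    matrix and the max pooling layer reduces its dimension."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class QualityNet(nn.Module):
    def __init__(self, input_size: int = 64):  # input side must be a multiple of 32
        super().__init__()
        self.cp1, self.cp2 = conv_pool(3, 16), conv_pool(16, 32)
        self.cp3, self.cp4 = conv_pool(32, 64), conv_pool(64, 128)
        self.cp5 = conv_pool(128, 256)
        self.up1 = nn.Upsample(scale_factor=2)  # first up-sampling layer
        self.up2 = nn.Upsample(scale_factor=2)  # second up-sampling layer
        self.up3 = nn.Upsample(scale_factor=2)  # third up-sampling layer
        side = input_size // 4                  # spatial size at the third superposition
        self.fc = nn.Linear((32 + 448) * side * side, 1)  # full connection layer
        self.sigmoid = nn.Sigmoid()             # logistic regression layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c1 = self.cp1(x)                           # 16  channels @ H/2
        c2 = self.cp2(c1)                          # 32  channels @ H/4
        c3 = self.cp3(c2)                          # 64  channels @ H/8
        c4 = self.cp4(c3)                          # 128 channels @ H/16
        c5 = self.cp5(c4)                          # 256 channels @ H/32
        s1 = torch.cat([c4, self.up1(c5)], dim=1)  # first superposition: 384 @ H/16
        s2 = torch.cat([c3, self.up2(s1)], dim=1)  # second superposition: 448 @ H/8
        s3 = torch.cat([c2, self.up3(s2)], dim=1)  # third superposition: 480 @ H/4
        value = self.fc(s3.flatten(1))             # feature matrix -> feature value
        return self.sigmoid(value)                 # feature value -> quality in (0, 1)
```

Under these assumptions, `QualityNet()(torch.rand(1, 3, 64, 64))` returns a single quality score in (0, 1).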
The inventors have found that the accuracy of OCR recognition on an image is proportional to the quality of the image. Thus, if the image is distorted, the content recognized from the image by OCR may be inaccurate.
In order to avoid this, in one possible implementation, after the server receives the image uploaded by a user, a staff member manually reviews the quality of the image; if the staff member subjectively considers that the quality meets the requirement, he or she indicates as much to the server, and the server then performs OCR recognition on the image.
However, the inventors found that this implementation requires image quality to be reviewed manually by staff, so the labor cost is high.
Moreover, in many cases the number of images to be subjected to OCR recognition is large, which creates a heavy review workload, so many staff members often have to be assigned to the manual review.
Furthermore, the above method has no objective quality evaluation standard, and the evaluation of image quality is strongly influenced by the subjective judgment of the staff.
For example, one reviewer may pass an image as meeting the quality requirement even though OCR recognition on it turns out to be inaccurate, while another may reject an image as failing the requirement even though OCR recognition on it would be accurate; manual review is therefore prone to error.
In the present application, an image uploaded by a user is received; the quality of the image is acquired based on an image quality evaluation model; whether the quality of the image is greater than or equal to a preset threshold is judged; when the quality is greater than or equal to the preset threshold, OCR recognition is performed on the image; and when the quality is less than the preset threshold, prompt information is output to prompt the user to upload the image again.
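The sketch below illustrates this server-side decision flow. The callables, the return shape, and the 0.5 threshold are assumptions for illustration; the application does not fix a specific threshold value.

```python
# A minimal sketch of the decision flow summarized above; the helper
# callables and the threshold value are assumptions, not fixed here.
from typing import Callable

QUALITY_THRESHOLD = 0.5  # preset threshold; the application leaves the value open


def handle_uploaded_image(image,
                          evaluate_quality: Callable[[object], float],
                          run_ocr: Callable[[object], str]) -> dict:
    quality = evaluate_quality(image)  # image quality evaluation model, in [0, 1]
    if quality >= QUALITY_THRESHOLD:
        return {"status": "ok", "text": run_ocr(image)}
    # quality below threshold: prompt the user to upload the image again
    return {"status": "retry",
            "message": "Image quality is too low; please upload the image again."}
```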
The image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image; the labeling quality of the sample image is obtained by the electronic device based on the accuracy of OCR recognition on the sample image and the brightness of the sample image.
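A training loop for this step might look as follows. The MSE loss, the Adam optimizer, and the fixed epoch count stand in for choices the application does not specify (it only requires training until the network parameters converge); `QualityNet` refers to the sketch given earlier.

```python
# A minimal training sketch for the initialization model; the loss,
# optimizer, and epoch count are assumptions not fixed by the application.
import torch
from torch.utils.data import DataLoader, TensorDataset


def train_quality_model(model: torch.nn.Module,
                        images: torch.Tensor,            # N x 3 x H x W sample images
                        labeling_quality: torch.Tensor,  # N labeling-quality labels
                        epochs: int = 10, lr: float = 1e-3) -> torch.nn.Module:
    loader = DataLoader(TensorDataset(images, labeling_quality),
                        batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):  # in practice, iterate until the parameters converge
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x).squeeze(1), y)
            loss.backward()
            optimizer.step()
    return model
```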
With the method and device of the present application, the evaluation standard of image quality can be unified according to actual conditions. Because this evaluation standard is derived by technicians from analysis of actual conditions, the quality of an image can be determined accurately according to it.
Therefore, in the present application, when the electronic device acquires the quality of an image according to the image quality evaluation model, this evaluation standard is applied, so that the determined quality is accurate and uniform and errors are reduced; and since the process requires no manual participation, labor costs can be reduced.
Since the device embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding description of the method embodiment.
Fig. 6 is a block diagram of an electronic device 800 shown in the present application. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, images, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast operation information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 7 is a block diagram of an electronic device 1900 shown in the present application. For example, the electronic device 1900 may be provided as a server.
Referring to fig. 7, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each of which corresponds to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The foregoing describes in detail an image processing method and apparatus provided by the present application. Specific examples are used herein to explain the principles and embodiments of the present application, and the description of these examples is only intended to help understand the method and its core idea. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. An image processing method, characterized in that the method comprises:
receiving an image uploaded by a user;
acquiring the quality of the image based on an image quality evaluation model; the image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy of OCR (optical character recognition) on the sample image and the brightness of the sample image;
when the quality is greater than or equal to a preset threshold value, performing OCR recognition on the image;
and when the quality is smaller than a preset threshold value, outputting prompt information, wherein the prompt information is used for prompting the user to input the image again.
2. The method of claim 1, wherein the image quality assessment model is trained by:
acquiring a sample image and acquiring the labeling quality of the sample image;
constructing a network structure of an initialization model;
training the initialization model based on the sample image and the labeling quality until network parameters in the initialization model are converged to obtain an image quality evaluation model;
the acquiring of the labeling quality of the sample image includes:
obtaining the accuracy rate of OCR recognition on the sample image;
acquiring the brightness of the sample image;
and acquiring the labeling quality of the sample image based on the brightness and the accuracy.
3. The method of claim 2, wherein obtaining an accuracy rate for OCR recognition of the sample image comprises:
acquiring an annotation text in the sample image;
performing OCR recognition on the sample image to obtain a recognition text in the sample image;
and acquiring the accuracy rate based on the labeling text and the identification text.
4. The method of claim 3, wherein the obtaining the accuracy based on the annotation text and the recognition text comprises:
determining an accurate text recognized based on OCR in the recognized text based on the marked text;
and calculating the ratio of the number of the identified accurate texts to the number of the labeled texts to obtain the accuracy.
5. The method of claim 2, wherein the accuracy comprises the accuracy of OCR recognition performed respectively on a plurality of randomly selected fields in the sample image, and the brightness comprises the brightness of each of the plurality of fields in the sample image;
the obtaining of the labeling quality of the sample image based on the brightness and the accuracy includes:
for any field in the fields, acquiring the labeling quality of the field according to the accuracy of OCR recognition on the field and the brightness of the field;
and calculating the average value of the labeling quality of each field to obtain the labeling quality of the sample image.
6. The method according to claim 2, characterized in that said network structure comprises at least: a plurality of convolution pooling layers, a plurality of up-sampling layers, a plurality of superposition layers, a full connection layer and a logistic regression layer;
each convolution pooling layer comprises at least one convolution layer and at least one pooling layer, and the pooling layer comprises a max pooling layer; the convolution layer is used for acquiring a feature matrix of the image, and the pooling layer is used for reducing the dimension of the feature matrix;
the up-sampling layer is used for increasing the dimension of the feature matrix;
the superposition layer is used for superposing the feature matrix output by the convolution pooling layer and the feature matrix output by the up-sampling layer into one feature matrix;
the full connection layer is used for converting the feature matrix output by the superposition layer into a feature value;
the logistic regression layer is used for converting the feature value into the quality of the image.
7. The method of claim 6,
the convolution pooling layers comprise a first convolution pooling layer, a second convolution pooling layer, a third convolution pooling layer, a fourth convolution pooling layer and a fifth convolution pooling layer;
the up-sampling layer comprises a first up-sampling layer, a second up-sampling layer and a third up-sampling layer;
the superposition layers comprise a first superposition layer, a second superposition layer and a third superposition layer;
the input end of the first convolution pooling layer is used for inputting an image;
the output end of the first convolution pooling layer is connected with the input end of the second convolution pooling layer;
the output end of the second convolution pooling layer is connected with the input end of the third convolution pooling layer, and the output end of the second convolution pooling layer is also connected with the input end of the third superposition layer;
the output end of the third convolution pooling layer is connected with the input end of the fourth convolution pooling layer, and the output end of the third convolution pooling layer is also connected with the input end of the second superposition layer;
the output end of the fourth convolution pooling layer is connected with the input end of the fifth convolution pooling layer, and the output end of the fourth convolution pooling layer is also connected with the input end of the first superposition layer;
the output end of the fifth convolution pooling layer is connected with the input end of the first up-sampling layer;
the output end of the first up-sampling layer is connected with the input end of the first superposition layer;
the output end of the first superposition layer is connected with the input end of the second up-sampling layer;
the output end of the second up-sampling layer is connected with the input end of the second superposition layer;
the output end of the second superposition layer is connected with the input end of the third up-sampling layer;
the output end of the third up-sampling layer is connected with the input end of the third superposition layer;
the output end of the third superposition layer is connected with the input end of the full connection layer;
and the output end of the full connection layer is connected with the input end of the logistic regression layer.
8. An image processing apparatus, characterized in that the apparatus comprises:
the receiving module is used for receiving the image uploaded by the user;
the first acquisition module is used for acquiring the quality of the image based on an image quality evaluation model; the image quality evaluation model is obtained by training an initialization model based on a sample image and the labeling quality of the sample image, and the labeling quality of the sample image is obtained based on the accuracy of OCR (optical character recognition) on the sample image and the brightness of the sample image;
the recognition module is used for performing OCR recognition on the image when the quality is greater than or equal to a preset threshold value;
and the output module is used for outputting prompt information when the quality is smaller than a preset threshold value, wherein the prompt information is used for prompting the user to input the image again.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the image processing method according to any of claims 1 to 7 are implemented by the processor when executing the program.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the image processing method according to any one of claims 1 to 7.
CN202010664624.9A 2020-07-10 2020-07-10 Image processing method and device Active CN111932500B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010664624.9A CN111932500B (en) 2020-07-10 2020-07-10 Image processing method and device

Publications (2)

Publication Number Publication Date
CN111932500A true CN111932500A (en) 2020-11-13
CN111932500B CN111932500B (en) 2023-10-13

Family

ID=73312303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010664624.9A Active CN111932500B (en) 2020-07-10 2020-07-10 Image processing method and device

Country Status (1)

Country Link
CN (1) CN111932500B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant