CN111753873A - Image detection method and device

Image detection method and device

Info

Publication number
CN111753873A
CN111753873A (application CN202010398974.5A)
Authority
CN
China
Prior art keywords
image
score
model
target
image sample
Prior art date
Legal status
Pending
Application number
CN202010398974.5A
Other languages
Chinese (zh)
Inventor
高大帅
李健
武卫东
Current Assignee
Beijing Sinovoice Technology Co Ltd
Original Assignee
Beijing Sinovoice Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sinovoice Technology Co Ltd filed Critical Beijing Sinovoice Technology Co Ltd
Priority to CN202010398974.5A priority Critical patent/CN111753873A/en
Publication of CN111753873A publication Critical patent/CN111753873A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image detection method and device in the field of image recognition. A target image is input into a first model, which outputs a first score for the image, and into a second model, which outputs a second score; the two scores are then weighted and fused to obtain a sharpness score for the target image. The first model is trained on image samples and their corresponding annotation information using a no-reference image spatial quality assessment algorithm, and the second model is trained on the same samples and annotations using a lightweight target classification algorithm.

Description

Image detection method and device
Technical Field
The invention relates to the technical field of computers, in particular to an image detection method and device.
Background
A text image is an image that contains characters, and its sharpness has a crucial influence on the OCR (Optical Character Recognition) recognition rate; the sharpness of a text image therefore needs to be detected before OCR is performed on it.
At present, the following two schemes are mainly used to detect the sharpness of a text image:
One is to detect sharpness from the gradient information in the text image. However, this method cannot suppress the background noise of the text image, so the scores it produces differ greatly from manual subjective scores.
The other is to detect sharpness with a re-blur ("secondary blur") algorithm: the same Gaussian blur is added to the original image, the similarity between the blurred image and the original is calculated, and the sharpness of the original is determined from that similarity. (A sketch of this idea follows.)
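A minimal sketch of the re-blur idea using OpenCV and NumPy; the kernel size and the correlation-based similarity measure are illustrative assumptions, not values fixed by the prior art being described:

```python
import cv2
import numpy as np

def reblur_sharpness(gray: np.ndarray, ksize: int = 7) -> float:
    """Re-blur heuristic: a sharp image changes a lot when blurred,
    while an already-blurry image barely changes."""
    blurred = cv2.GaussianBlur(gray, (ksize, ksize), 0)
    g = gray.astype(np.float64) - gray.mean()
    b = blurred.astype(np.float64) - blurred.mean()
    # Normalized cross-correlation between original and re-blurred image
    similarity = (g * b).sum() / (np.sqrt((g ** 2).sum() * (b ** 2).sum()) + 1e-12)
    return 1.0 - similarity  # near 0 => original was already blurry
```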
Disclosure of Invention
In view of the above, the present invention has been made to provide an image detection method and apparatus that overcome or at least partially solve the above problems.
According to a first aspect of the present invention, there is provided an image detection method, the method comprising:
acquiring a target image to be detected;
inputting the target image into a first model, and outputting a first score of the target image through the first model, wherein the first model is obtained by training an image sample and annotation information corresponding to the image sample based on a no-reference image spatial quality assessment algorithm;
inputting the target image into a second model, and outputting a second score of the target image through the second model, wherein the second model is obtained by training the image sample and the annotation information corresponding to the image sample based on a lightweight target classification algorithm;
and performing weighted fusion on the first score and the second score to obtain a sharpness score of the target image.
According to a second aspect of the present invention, there is provided an image detection apparatus comprising:
the first acquisition module is used for acquiring a target image to be detected;
the first output module is used for inputting the target image into a first model and outputting a first score of the target image through the first model, wherein the first model is obtained by training an image sample and annotation information corresponding to the image sample based on a no-reference image spatial quality assessment algorithm;
the second output module is used for inputting the target image into a second model and outputting a second score of the target image through the second model, wherein the second model is obtained by training the image sample and the annotation information corresponding to the image sample based on a lightweight target classification algorithm;
and the weighted fusion module is used for performing weighted fusion on the first score and the second score to obtain a sharpness score of the target image.
According to the image detection method and device provided by the embodiments of the invention, the target image is input into the first model to output a first score, input into the second model to output a second score, and the two scores are then weighted and fused to obtain the sharpness score of the target image. The first model is trained on image samples and their corresponding annotation information using a no-reference image spatial quality assessment algorithm, and the second model is trained on the same samples and annotations using a lightweight target classification algorithm. The first score from the first model is more consistent with manual subjective scores, the second score from the second model represents the sharpness of the target image more accurately, and the weighted fusion of the two yields the sharpness score of the target image.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart illustrating steps of an image detection method according to an embodiment of the present invention;
FIG. 2 is a block diagram of an image detection apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Method embodiment
In the prior art, the scores obtained by image detection differ greatly from manual subjective scores, and the application range of existing image detection schemes is severely limited. To overcome these problems, the invention provides an image detection method.
Fig. 1 is a flowchart illustrating the steps of an image detection method according to an embodiment of the present invention. As shown in Fig. 1, the method may include:
Step 101, acquiring a target image to be detected;
Step 102, inputting the target image into a first model, and outputting a first score of the target image through the first model, wherein the first model is obtained by training an image sample and annotation information corresponding to the image sample based on a no-reference image spatial quality assessment algorithm;
Step 103, inputting the target image into a second model, and outputting a second score of the target image through the second model, wherein the second model is obtained by training the image sample and the annotation information corresponding to the image sample based on a lightweight target classification algorithm;
Step 104, performing weighted fusion on the first score and the second score to obtain a sharpness score of the target image.
Images in the embodiments of the present invention include, but are not limited to, person images, landscape images, text images, and the like. For ease of description, text images are used as the examples in the embodiments; it should be understood that the embodiments do not limit the category of the image. A text image is an image containing text information, such as an identity card, bank card, vehicle license, driver's license, passport, or business license.
An image sample is an image collected by a person skilled in the art for training the models, and the annotation information is a manual subjective evaluation of the sharpness of the image sample. Image samples and annotation information correspond one to one.
Image sharpness refers to how clearly each fine detail and its boundary are rendered in the image. Sharpness has a crucial influence on the subsequent processing of the image. For example, when performing OCR on text images, a text image that meets the sharpness requirement can be sent directly to the subsequent OCR recognition engine, while one that does not is rejected from the recognition flow; the sharpness of a text image therefore needs to be detected before OCR to improve the accuracy, reliability, and efficiency of recognition. (A small gating sketch follows.)
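As an illustration of this gating, a minimal sketch; the threshold value and the `score_image` callable are hypothetical:

```python
SHARPNESS_THRESHOLD = 3.0  # assumed cutoff on the 1-5 scale described below

def route_for_ocr(image, score_image) -> str:
    """Forward sufficiently sharp text images to OCR; reject the rest.
    `score_image` is a hypothetical callable returning a 1-5 sharpness score."""
    if score_image(image) >= SHARPNESS_THRESHOLD:
        return "ocr"       # send to the OCR recognition engine
    return "rejected"      # skip the recognition flow for blurry images
```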
Image detection here means evaluating image quality, specifically image sharpness, and the sharpness of an image is determined through image detection. Image quality is generally evaluated subjectively or objectively: subjective evaluation is performed manually with a person as the observer and can truly and accurately reflect the sharpness of an image; objective evaluation derives a result from mathematical calculation with a mathematical model so as to simulate subjective evaluation. By international convention, both subjective and objective evaluation rate an image on a five-grade scale, i.e., a sharpness score from 1 to 5.
Specifically, image spatial quality assessment algorithms are divided into full-reference and no-reference algorithms. A full-reference algorithm evaluates the quality of a target image against a reference image of very good quality using measures such as PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index), both of which quantify the similarity between two images. In practical applications, however, no reference image is usually available, so full-reference spatial quality assessment based on PSNR and SSIM cannot be carried out.
A no-reference image spatial quality assessment algorithm evaluates image quality without a reference image. The first model is trained on image samples and their corresponding annotation information using the BRISQUE (Blind/Referenceless Image Spatial QUality Evaluator) algorithm. The objective evaluation produced by a BRISQUE-trained model approaches manual subjective evaluation, so the first score output by the first model reflects the sharpness of the target image more truthfully and improves the accuracy of image detection.
The second model is trained on the image samples and their corresponding annotation information using a lightweight target classification algorithm. In one example, the lightweight target classification algorithm may include a MobileNets neural network algorithm such as MobileNetV3; the MobileNetV3 neural network combines a small model, high speed, and good accuracy, so adopting it improves the efficiency with which the second model performs image detection and outputs the second score. In one example, the lightweight target classification algorithm may include a loss function, which may be the EMD (Earth Mover's Distance) loss. The EMD loss captures the relationships between the image sample classes, so training the second model with it penalizes image samples with large classification errors: for instance, misclassifying an image whose manual subjective score is 5 as a 1 is punished far more heavily than misclassifying it as a 4. This gives the second model stronger robustness and improves the accuracy of its second score. Robustness here refers to the ability of a computer system or algorithm to handle errors during execution and to continue operating normally under abnormal input or operation.
The target image is input into the trained first model and the trained second model respectively, and the first score output by the first model and the second score output by the second model are weighted and fused according to preset weights, yielding a more accurate sharpness score for the target image. The sharpness score is thus a composite score obtained from the first and second scores, as in the sketch below.
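Steps 101 to 104 can then be summarized as follows; the `.predict` interface and the 0.4/0.6 weights (quoted later in the description) are assumptions for illustration:

```python
def detect_sharpness(image, first_model, second_model, w1=0.4, w2=0.6):
    """Steps 101-104: score the target image with both models and fuse.
    Both models are assumed to expose .predict(image) returning a scalar
    score on the same 1-5 scale."""
    s1 = first_model.predict(image)   # BRISQUE-based first score (step 102)
    s2 = second_model.predict(image)  # lightweight-classifier second score (step 103)
    return w1 * s1 + w2 * s2          # weighted fusion (step 104)
```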
In an optional embodiment of the present invention, before the target image to be detected is acquired in step 101, the method further includes:
step S11, acquiring an image sample;
and step S12, annotating the image sample according to its sharpness to obtain the annotation information corresponding to the image sample.
Specifically, to train the first and second models, a certain number of image samples must be collected in advance; for example, 50,000 certificate-class text images. Annotating the image samples means scoring them manually and subjectively: each of the 50,000 images is scored by ten people, the mode of the ten scores is taken as the final score for that sample, and the final score is used as the sample's annotation information. Note that image samples and annotation information correspond one to one. (A small sketch of the mode-based labeling follows.)
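A minimal sketch of the mode-based labeling, using only the standard library; the example scores are made up:

```python
from statistics import mode

def final_label(scores):
    """Collapse ten annotators' 1-5 scores into one label by taking the
    mode (on ties, statistics.mode in Python 3.8+ returns the first mode)."""
    return mode(scores)

print(final_label([4, 5, 4, 4, 3, 4, 5, 4, 4, 4]))  # -> 4
```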
Optionally, after step S12, the method further includes: normalizing the parameters of the image sample to preset parameters.
To make the parameters of the image samples uniform and thereby ease the training of the first and second models, embodiments of the present invention normalize the parameters of the image samples (including but not limited to their size, dimensions, format, and color type) to preset parameters before training, for example normalizing the size and color type of the image samples to a 320 × 320 × 3 color image. The values and kinds of the preset parameters are set by those skilled in the art as needed, and the invention is not limited in this respect. (A minimal resize sketch follows.)
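A minimal sketch of this normalization step with OpenCV; the 320 × 320 × 3 target size is an assumption:

```python
import cv2

TARGET_SIZE = (320, 320)  # assumed preset size; 3 colour channels

def normalize_sample(path: str):
    """Load a sample, force 3-channel colour, and resize to the preset size."""
    img = cv2.imread(path, cv2.IMREAD_COLOR)  # decodes to H x W x 3 (BGR)
    return cv2.resize(img, TARGET_SIZE)       # -> 320 x 320 x 3
```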
In an alternative embodiment of the invention, the first model is trained by:
step S21, extracting statistical features of the image sample;
and step S22, inputting the statistical features of the image samples and the annotation information corresponding to the image samples into a support vector machine to train the first model.
Specifically, the first model is trained on image samples and their corresponding annotation information using the BRISQUE algorithm, whose steps are: extract statistical features from the image samples, then input those features together with the corresponding annotation information into a support vector machine. The first model obtained by BRISQUE training can output objective scores that are highly consistent with manual subjective scores.
A statistical feature is a feature, or set of features, shared across the individuals of a population; in the embodiments of the invention it is a feature with relevance across all image samples. A Support Vector Machine (SVM) is a generalized linear classifier usually applied to supervised binary classification; here it is used in its regression form. The extracted statistical features and the annotation information corresponding to the image samples are input into the support vector machine for regression analysis, determining the quantitative relationship between the statistical features and the annotation information, and the first model is obtained by this training, as sketched below.
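A minimal sketch of this training step, assuming scikit-learn's SVR as the support-vector regressor; BRISQUE conventionally uses an epsilon-SVR with an RBF kernel, and the hyperparameters below are illustrative rather than the patent's:

```python
import numpy as np
from sklearn.svm import SVR

def train_first_model(features: np.ndarray, labels: np.ndarray) -> SVR:
    """Fit the BRISQUE statistical features (n_samples x n_features)
    against the 1-5 annotation scores by support-vector regression."""
    model = SVR(kernel="rbf", C=1.0, epsilon=0.1)  # assumed hyperparameters
    model.fit(features, labels)
    return model
```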
Optionally, the extracting of the statistical features of the image sample in step S21 includes:
step S211, extracting mean-subtracted contrast normalization (MSCN) coefficients from the image sample;
step S212, fitting the MSCN coefficients to an asymmetric generalized Gaussian distribution;
and step S213, extracting the statistical features of the image sample from the asymmetric generalized Gaussian distribution.
A mean-subtracted contrast-normalized (MSCN) coefficient is an image pixel value minus the local mean, divided by the local deviation. The MSCN coefficients are fitted to an Asymmetric Generalized Gaussian Distribution (AGGD), and the shape parameters of that distribution are extracted as the statistical features of the image sample; fitting here means matching the MSCN coefficients with a smooth parametric curve. (A sketch of the MSCN computation follows.)
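A minimal sketch of the MSCN computation with OpenCV; the 7 × 7 Gaussian window and the stabilizing constant C follow the common BRISQUE formulation and are assumptions, not values given here:

```python
import cv2
import numpy as np

def mscn_coefficients(gray: np.ndarray, c: float = 1.0) -> np.ndarray:
    """MSCN map: (pixel - local mean) / (local deviation + C)."""
    g = gray.astype(np.float64)
    mu = cv2.GaussianBlur(g, (7, 7), 7 / 6)  # local mean
    sigma = np.sqrt(np.abs(cv2.GaussianBlur(g * g, (7, 7), 7 / 6) - mu * mu))
    return (g - mu) / (sigma + c)
```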
In an optional embodiment of the present invention, before the statistical features of the image sample are extracted in step S21, the method further includes: determining a target area in the target image.
Optionally, the target image contains text information, and determining the target area in the target image includes: determining the area of the target image where the text information is located as the target area.
To reduce the negative influence of the background area on image detection, in the embodiments of the present invention a target area may be determined in the target image before detection. For example, before detecting a business-card image, the area of the card where the name, identity information, and contact phone number are written may be determined as the target area, improving the efficiency and accuracy of image detection. (A heuristic sketch follows.)
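The patent text does not spell out how the text area is located; the following is a crude heuristic sketch (Otsu binarization plus dilation), and a production system would more likely use a trained text detector:

```python
import cv2
import numpy as np

def find_text_area(gray: np.ndarray):
    """Heuristic text-area localization: binarize, dilate so characters
    merge into blocks, and take the bounding box of the largest block."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    dilated = cv2.dilate(binary, np.ones((5, 25), np.uint8))  # join characters into lines
    contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return x, y, w, h  # crop with gray[y:y+h, x:x+w]
```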
Optionally, the extracting of MSCN coefficients from the image sample in step S211 includes: extracting the horizontal, vertical, left-diagonal, and right-diagonal MSCN coefficients of the target area.
Specifically, extracting the MSCN coefficients along the horizontal, vertical, left-diagonal, and right-diagonal directions of the target area accurately captures the relationships between adjacent pixels of the target area.
Optionally, the extracting of the statistical features of the image sample from the asymmetric generalized Gaussian distribution in step S213 includes: extracting 16-dimensional statistical features of the image sample from the asymmetric generalized Gaussian distribution.
Specifically, the MSCN coefficients extracted along the horizontal, vertical, left-diagonal, and right-diagonal directions of the target area are fitted to obtain asymmetric generalized Gaussian distributions in the four directions, and four parameters are extracted from the distribution in each direction: a shape parameter, a mean parameter, a left-variance parameter, and a right-variance parameter. The 16-dimensional statistical features of the image sample are thus obtained, as in the sketch below.
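A sketch of the feature extraction just described: pairwise MSCN products along the four directions, each fitted to an AGGD with the standard moment-matching estimator. The estimator's formulas come from the BRISQUE literature, and the search grid for the shape parameter is an assumption:

```python
import numpy as np
from scipy.special import gamma

def aggd_params(x: np.ndarray):
    """Moment-matching AGGD fit returning (shape, mean, left variance,
    right variance), the four parameters named above."""
    left, right = x[x < 0], x[x >= 0]
    sigma_l = np.sqrt((left ** 2).mean()) if left.size else 1e-6
    sigma_r = np.sqrt((right ** 2).mean()) if right.size else 1e-6
    gamma_hat = sigma_l / sigma_r
    r_hat = (np.abs(x).mean() ** 2) / (x ** 2).mean()
    R_hat = r_hat * (gamma_hat ** 3 + 1) * (gamma_hat + 1) / (gamma_hat ** 2 + 1) ** 2
    # Pick the shape alpha whose theoretical moment ratio best matches R_hat
    alphas = np.arange(0.2, 10.0, 0.001)
    rho = gamma(2 / alphas) ** 2 / (gamma(1 / alphas) * gamma(3 / alphas))
    alpha = alphas[np.argmin((rho - R_hat) ** 2)]
    mean = (sigma_r - sigma_l) * np.sqrt(gamma(1 / alpha) / gamma(3 / alpha)) \
        * (gamma(2 / alpha) / gamma(1 / alpha))
    return alpha, mean, sigma_l ** 2, sigma_r ** 2

def sixteen_dim_features(mscn: np.ndarray) -> np.ndarray:
    """4 directions x 4 AGGD parameters = 16-dimensional feature vector."""
    pairs = [
        mscn[:, :-1] * mscn[:, 1:],     # horizontal neighbours
        mscn[:-1, :] * mscn[1:, :],     # vertical neighbours
        mscn[:-1, :-1] * mscn[1:, 1:],  # left (main) diagonal
        mscn[:-1, 1:] * mscn[1:, :-1],  # right (anti) diagonal
    ]
    return np.concatenate([aggd_params(p.ravel()) for p in pairs])
```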
In an optional embodiment of the invention, the lightweight target classification algorithm comprises a MobileNets neural network algorithm, and the second model is trained by the following step:
training the second model on the image samples according to a MobileNets neural network algorithm and the Earth Mover's Distance.
MobileNets are neural networks built on a streamlined architecture that uses depthwise separable convolutions to construct lightweight deep networks with a simplified structure. MobileNetV3, for example, combines a small model, high speed, and good accuracy, so adopting this lightweight target classification algorithm improves the efficiency with which the second model performs image detection and outputs the second score.
Using the Earth Mover's Distance as the loss function makes it possible to analyze the relationships between the image sample classes, so the second model trained on it effectively penalizes image samples with large classification errors, gaining stronger robustness and a more accurate second score.
In one example, the second model is trained by selecting the MobileNetV3 neural network algorithm and the Earth Mover's Distance under the TensorFlow framework, which is used here to feed the image samples into the MobileNetV3 network for analysis and processing.
In one example, the EMD loss function may be approximated using the cumulative distribution function to distinguish the relationships between classes. The lightweight target classification algorithm may also include an optimizer, such as the Adam optimizer, whose parameters include but are not limited to the initial learning rate, the number of training epochs, and the learning-rate decay. For example, the parameters may be set to an initial learning rate of 0.002, 50 training epochs, and a learning-rate decay of 0.4 every 10 epochs; the invention does not specifically limit the optimizer settings. (An illustrative training setup follows.)
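An illustrative training setup under the stated parameters, assuming TensorFlow/Keras, a MobileNetV3-Small backbone, and one-hot 5-class targets; the head, input size, and steps-per-epoch value are assumptions:

```python
import tensorflow as tf

def emd_loss(y_true, y_pred):
    """Squared EMD between two 5-class score distributions, computed
    through their cumulative sums as described above."""
    cdf_true = tf.cumsum(y_true, axis=-1)
    cdf_pred = tf.cumsum(y_pred, axis=-1)
    return tf.reduce_mean(tf.square(cdf_true - cdf_pred), axis=-1)

base = tf.keras.applications.MobileNetV3Small(
    input_shape=(320, 320, 3), include_top=False, weights=None, pooling="avg")
outputs = tf.keras.layers.Dense(5, activation="softmax")(base.output)  # 5 sharpness classes
model = tf.keras.Model(base.input, outputs)

steps_per_epoch = 100  # assumed; depends on dataset and batch size
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.002,
    decay_steps=10 * steps_per_epoch,  # drop the rate every 10 epochs
    decay_rate=0.4,
    staircase=True)
model.compile(optimizer=tf.keras.optimizers.Adam(schedule), loss=emd_loss)
# model.fit(train_ds, epochs=50)
```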
In an optional embodiment of the present invention, the weighted fusion of the first score and the second score in step 104 to obtain the sharpness score of the target image includes:
inputting the first score and the second score into a third model, and outputting the sharpness score of the target image through the third model, the third model being used to perform a weighted calculation on the first score and the second score.
The third model is a model with preset weights, whose values are set by those skilled in the art according to business needs and are not limited by the present invention. The third model performs a weighted calculation on the first and second scores according to the preset weights, fusing them into the sharpness score of the target image.
The first model is obtained with the BRISQUE algorithm, so the first score it outputs may be called the BRISQUE score; the second model is trained with a lightweight target classification algorithm, which may include a MobileNets neural network, so the second score it outputs may also be called the neural network score. For example, a person skilled in the art may set the preset weight to 0.4, i.e., the BRISQUE score contributes 40% and the neural network score 60%, so that sharpness score = 0.4 × BRISQUE score + 0.6 × neural network score.
In summary, in the image detection method provided by the embodiments of the invention, the target image is input into the first model to output a first score and into the second model to output a second score, and the two scores are weighted and fused to obtain the sharpness score of the target image.
The first model is trained on image samples and their corresponding annotation information using a no-reference image spatial quality assessment algorithm, so its first score is consistent with manual subjective scores and reflects the sharpness of the target image more truthfully. The second model is trained on the same samples and annotations using a lightweight target classification algorithm that penalizes large classification errors, giving it stronger robustness and ensuring detection accuracy. The invention therefore improves the accuracy of the detected sharpness and widens the applicable range of the detection.
Device embodiment
Fig. 2 is a block diagram of an image detection apparatus according to an embodiment of the present invention, and as shown in fig. 2, the apparatus may include:
the first obtaining module 201 is configured to obtain a target image to be detected.
The first output module 202 is configured to input the target image into a first model and output a first score of the target image through the first model, where the first model is obtained by training an image sample and annotation information corresponding to the image sample based on a no-reference image spatial quality assessment algorithm.
The second output module 203 is configured to input the target image into a second model and output a second score of the target image through the second model, where the second model is obtained by training the image sample and the annotation information corresponding to the image sample based on a lightweight target classification algorithm.
The weighted fusion module 204 is configured to perform weighted fusion on the first score and the second score to obtain a sharpness score of the target image.
Optionally, the apparatus further comprises:
and the second acquisition module is used for acquiring the image sample.
And the annotation module is used for annotating the image sample according to its sharpness to obtain the annotation information corresponding to the image sample.
Optionally, the apparatus further comprises: and the training module is used for training to obtain the first model.
Optionally, the training module includes:
and the characteristic extraction submodule is used for extracting the statistical characteristics of the image sample.
And the model training submodule is used for inputting the statistical characteristics of the image samples and the marking information corresponding to the image samples into a support vector machine so as to train to obtain the first model.
Optionally, the feature extraction sub-module includes:
and the extraction coefficient unit is used for extracting the mean-removing contrast normalization coefficient from the image sample.
And the fitting unit is used for fitting the mean-removing contrast normalization coefficient into asymmetric generalized Gaussian distribution.
And the characteristic extraction unit is used for extracting the statistical characteristics of the image samples from the asymmetric generalized Gaussian distribution.
Optionally, the apparatus further comprises:
and the region determining module is used for determining a target region in the target image.
Optionally, the coefficient extraction unit includes:
and the extraction subunit is used for extracting the horizontal, vertical, left-diagonal, and right-diagonal MSCN coefficients of the target area.
Optionally, the target image includes text information.
Optionally, the area determining module includes:
and the area determining submodule is used for determining the area where the text information is located in the target image as a target area.
Optionally, the lightweight target classification algorithm includes a MobileNets neural network algorithm.
Optionally, the weighted fusion module 204 includes:
and the weighted fusion submodule is used for inputting the first score and the second score into a third model, outputting the definition score of the target image through the third model, and carrying out weighted calculation on the first score and the second score through the third model.
In summary, in the image detection apparatus provided by the embodiments of the invention, the target image is input into the first model to output a first score, input into the second model to output a second score, and the two scores are weighted and fused to obtain the sharpness score of the target image. The first model is trained on image samples and their corresponding annotation information using a no-reference image spatial quality assessment algorithm, and the second model is trained on the same samples and annotations using a lightweight target classification algorithm. The first score is more consistent with manual subjective scores, the second score represents the sharpness of the target image more accurately, and their weighted fusion yields the sharpness score of the target image.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
Those skilled in the art will readily appreciate that the above embodiments may be combined in any manner; any such combination is an embodiment of the present invention, but for reasons of space the details are not repeated here.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (10)

1. An image detection method, characterized in that the method comprises:
acquiring a target image to be detected;
inputting the target image into a first model, and outputting a first score of the target image through the first model, wherein the first model is obtained by training an image sample and annotation information corresponding to the image sample based on a no-reference image spatial quality assessment algorithm;
inputting the target image into a second model, and outputting a second score of the target image through the second model, wherein the second model is obtained by training the image sample and the annotation information corresponding to the image sample based on a lightweight target classification algorithm;
and performing weighted fusion on the first score and the second score to obtain a sharpness score of the target image.
2. The method of claim 1, wherein before the acquiring the target image to be detected, the method further comprises:
acquiring an image sample;
and annotating the image sample according to its sharpness to obtain annotation information corresponding to the image sample.
3. The method of claim 2, wherein the first model is trained by:
extracting statistical features of the image sample;
and inputting the statistical features of the image samples and the annotation information corresponding to the image samples into a support vector machine to train the first model.
4. The method of claim 3, wherein said extracting statistical features of the image sample comprises:
extracting mean-subtracted contrast normalization (MSCN) coefficients from the image sample;
fitting the MSCN coefficients to an asymmetric generalized Gaussian distribution;
and extracting the statistical features of the image sample from the asymmetric generalized Gaussian distribution.
5. The method of claim 4, wherein prior to said extracting the statistical features of the image sample, the method further comprises:
determining a target area in the target image;
the extracting of MSCN coefficients from the image sample includes:
extracting the horizontal, vertical, left-diagonal, and right-diagonal MSCN coefficients of the target area.
6. The method of claim 5, wherein the target image contains text information, and wherein determining a target region in the target image comprises:
and determining the area where the text information is located in the target image as a target area.
7. The method of claim 2, wherein the lightweight object classification algorithm comprises a MobileNets neural network algorithm.
8. The method of claim 1, wherein the weighted fusion of the first score and the second score to obtain a sharpness score of the target image comprises:
inputting the first score and the second score into a third model, and outputting the sharpness score of the target image through the third model, wherein the third model is used for performing a weighted calculation on the first score and the second score.
9. An image detection apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring a target image to be detected;
the first output module is used for inputting the target image into a first model and outputting a first score of the target image through the first model, wherein the first model is obtained by training an image sample and annotation information corresponding to the image sample based on a no-reference image spatial quality assessment algorithm;
the second output module is used for inputting the target image into a second model and outputting a second score of the target image through the second model, wherein the second model is obtained by training the image sample and the annotation information corresponding to the image sample based on a lightweight target classification algorithm;
and the weighted fusion module is used for performing weighted fusion on the first score and the second score to obtain a sharpness score of the target image.
10. The apparatus of claim 9, further comprising:
the second acquisition module is used for acquiring an image sample;
and the annotation module is used for annotating the image sample according to its sharpness to obtain annotation information corresponding to the image sample.
CN202010398974.5A 2020-05-12 2020-05-12 Image detection method and device Pending CN111753873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010398974.5A CN111753873A (en) 2020-05-12 2020-05-12 Image detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010398974.5A CN111753873A (en) 2020-05-12 2020-05-12 Image detection method and device

Publications (1)

Publication Number Publication Date
CN111753873A 2020-10-09

Family

ID=72673739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010398974.5A Pending CN111753873A (en) 2020-05-12 2020-05-12 Image detection method and device

Country Status (1)

Country Link
CN (1) CN111753873A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112614110A (en) * 2020-12-24 2021-04-06 Oppo(重庆)智能科技有限公司 Method and device for evaluating image quality and terminal equipment
CN113536769A (en) * 2021-07-21 2021-10-22 深圳证券信息有限公司 Text conciseness and clarity evaluation method and related equipment
CN114494130A (en) * 2021-12-24 2022-05-13 吉林建筑大学 Product aesthetic evaluation system based on optimal model evaluation criterion
CN114842299A (en) * 2022-05-10 2022-08-02 平安科技(深圳)有限公司 Training method, device, equipment and medium for image description information generation model
WO2024055357A1 (en) * 2022-09-15 2024-03-21 深圳Tcl新技术有限公司 Television definition adjustment method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107371015A (en) * 2017-07-21 2017-11-21 华侨大学 No-reference contrast-distorted image quality evaluation method
CN109146856A (en) * 2018-08-02 2019-01-04 深圳市华付信息技术有限公司 Picture quality assessment method, device, computer equipment and storage medium
WO2019047949A1 (en) * 2017-09-08 2019-03-14 众安信息技术服务有限公司 Image quality evaluation method and image quality evaluation system
CN110175981A (en) * 2019-04-12 2019-08-27 浙江工业大学 Image quality evaluation method based on a multi-feature-fusion BRISQUE algorithm
CN110189291A (en) * 2019-04-09 2019-08-30 浙江大学 General no-reference image quality assessment method based on multi-task convolutional neural networks
CN110400293A (en) * 2019-07-11 2019-11-01 兰州理工大学 No-reference image quality assessment method based on deep forest classification
CN111127456A (en) * 2019-12-28 2020-05-08 北京无线电计量测试研究所 Image annotation quality evaluation method




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination