CN113537192B - Image detection method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113537192B
Authority
CN
China
Prior art keywords
image
target
sample
deep learning model
Legal status
Active
Application number
CN202110739067.7A
Other languages
Chinese (zh)
Other versions
CN113537192A
Inventor
马小明
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110739067.7A
Publication of CN113537192A
Application granted
Publication of CN113537192B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Abstract

The disclosure provides an image detection method, an image detection apparatus, an electronic device, and a storage medium, relating to the technical field of data processing. In a specific implementation, a target image to be detected is acquired, the target image comprising characters; text segment detection is performed on the target image to obtain the target text segment maps of the target image; for each target text segment map, character recognition is performed on the target text segment map to obtain the confidence of each character in the character recognition result of the target text segment map; a multi-dimensional confidence feature of the target image is determined according to the confidence of each character in the character recognition result of each target text segment map; the multi-dimensional confidence feature of the target image is input into a pre-trained first deep learning model to obtain a quality score of the target image; and a detection result of whether the target image meets an image quality standard is obtained according to the quality score of the target image. Detection of whether the target image meets the image quality standard is thereby achieved.

Description

Image detection method, device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of data processing, and in particular relates to an image detection method, an image detection device, electronic equipment and a storage medium.
Background
In a geographic information system, a POI (Point of Interest) may be a house, a shop, a mailbox, a bus stop, or the like. Identifying POIs is of great significance for user positioning, electronic map generation, and similar applications. Signboard image detection is one of the most common scenarios in producing POI data from images.
Disclosure of Invention
The disclosure provides an image detection method, an image detection apparatus, an electronic device, a storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided an image detection method including:
acquiring a target image to be detected, wherein the target image comprises characters;
performing text segment detection on the target image to obtain target text segment maps of the target image;
for each target text segment map, performing character recognition on the target text segment map to obtain the confidence of each character in the character recognition result of the target text segment map;
determining a multi-dimensional confidence feature of the target image according to the confidence of each character in the character recognition result of each target text segment map;
inputting the multi-dimensional confidence feature of the target image into a pre-trained first deep learning model to obtain a quality score of the target image;
and obtaining, according to the quality score of the target image, a detection result of whether the target image meets an image quality standard.
According to another aspect of the present disclosure, there is provided an image detection apparatus including:
a target image acquisition module, configured to acquire a target image to be detected, wherein the target image comprises characters;
a text segment map obtaining module, configured to perform text segment detection on the target image to obtain target text segment maps of the target image;
a character confidence determining module, configured to perform character recognition on each target text segment map to obtain the confidence of each character in the character recognition result of the target text segment map;
a confidence feature determining module, configured to determine a multi-dimensional confidence feature of the target image according to the confidence of each character in the character recognition result of each target text segment map;
a quality score determining module, configured to input the multi-dimensional confidence feature of the target image into a pre-trained first deep learning model to obtain a quality score of the target image;
and a detection result determining module, configured to obtain, according to the quality score of the target image, a detection result of whether the target image meets an image quality standard.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image detection method of any one of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the image detection method of any one of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the image detection method of any of the present disclosure.
In the embodiments of the disclosure, the quality score of the target image is used to detect whether the target image meets the image quality standard; the multi-dimensional confidence feature covers the confidence of each character in the character recognition result of each target text segment map, so the influence of each character on the image quality is fully considered and the accuracy of the detection result is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic illustration of an image detection method according to the present disclosure;
FIG. 2 is a schematic illustration of target text segment maps according to the present disclosure;
FIG. 3 is a schematic diagram of one possible implementation of step S104 according to the present disclosure;
FIG. 4 is a schematic illustration of a first deep learning model training method according to the present disclosure;
FIG. 5 is a schematic illustration of a second deep learning model training method according to the present disclosure;
FIG. 6 is a schematic diagram of an image detection device according to the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing an image detection method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Signboard image detection is one of the most common scenarios in producing POI data from images. In the real world, however, signboards vary widely and rarely follow a fixed form, and because of problems such as occlusion and blurring, there are many signboard images from which POIs cannot be produced.
The embodiment of the disclosure provides an image detection method, referring to fig. 1, including:
s101, acquiring a target image to be detected, wherein the target image comprises characters.
The image detection method of the embodiment of the disclosure may be implemented by an electronic device, and in particular, the electronic device may be a personal computer, a smart phone, a server, or the like.
The target image to be detected may be any image that includes characters; for example, the target image may be an image of a signboard, a billboard, a poster, a placard, or the like.
S102, performing text segment detection on the target image to obtain the target text segment maps of the target image.
In one example, the target image may be input into a pre-trained text detection network for text segment detection, yielding each target text segment map of the target image. A text segment map is an image area of the whole image that contains a text segment, and one target image comprises at least one target text segment map. In one example, one text segment map includes one text segment; specifically, a text segment may be a natural paragraph, a title, a line of text, or the like. For example, taking a signboard image as the target image, as shown by the dashed boxes in fig. 2, the target image includes three target text segment maps A, B, and C. The text detection network may be chosen according to the actual situation; for example, a DB (Differentiable Binarization) text detection network or DBNet (a multi-category text detection network) may be used.
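By way of non-limiting illustration, cropping the target text segment maps from a detector's output could be sketched in Python as follows; the axis-aligned box format and the function name are assumptions made for the sketch (detectors such as DBNet typically output polygons that would first be converted to boxes):

    import numpy as np

    def crop_text_segments(image, boxes):
        # Crop one sub-image per detected text segment. `boxes` is assumed
        # to hold (x1, y1, x2, y2) axis-aligned rectangles; the name and
        # format are illustrative, not part of the disclosure.
        h, w = image.shape[:2]
        segments = []
        for x1, y1, x2, y2 in boxes:
            x1, y1 = max(0, int(x1)), max(0, int(y1))   # clamp to image bounds
            x2, y2 = min(w, int(x2)), min(h, int(y2))
            if x2 > x1 and y2 > y1:
                segments.append(image[y1:y2, x1:x2])
        return segments

    # Example: a dummy 100x200 image with two detected text segments.
    img = np.zeros((100, 200), dtype=np.uint8)
    segs = crop_text_segments(img, [(10, 10, 90, 40), (10, 50, 190, 90)])
    print([s.shape for s in segs])  # [(30, 80), (40, 180)]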
S103, for each target text segment map, performing character recognition on the target text segment map to obtain the confidence of each character in the character recognition result of the target text segment map.
The character recognition method may be an OCR (Optical Character Recognition) method from the related art; for example, the target text segment map may be recognized by a deep learning model to obtain the character recognition result of the target text segment map and the confidence of each character in the character recognition result. The deep learning model here may be a CNN (convolutional neural network), an RNN (recurrent neural network), a CRNN (convolutional recurrent neural network), or the like.
In one example, taking a CRNN network as an example, a target text segment map is input into the CRNN network, and the CRNN network outputs a logistic regression matrix and a character recognition result. The logistic regression matrix represents the candidate predictions given by the CRNN network for each character position in the target text segment map; normalizing the logistic regression matrix (for example, with a softmax function) yields the probability corresponding to each character in the character recognition result, i.e., the confidence of that character.
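As an illustration of this normalization step, the following Python sketch turns a per-position score matrix into per-character confidences via softmax; the matrix shape and the omission of CTC blank handling are simplifying assumptions:

    import numpy as np

    def char_confidences(score_matrix):
        # `score_matrix` has shape (num_positions, vocab_size), one row of
        # raw scores per character position, as a CRNN-style recognition
        # head might emit (CTC blank handling is omitted for brevity).
        z = score_matrix - score_matrix.max(axis=1, keepdims=True)  # stability
        probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)    # softmax
        pred = probs.argmax(axis=1)                # recognized character ids
        conf = probs[np.arange(len(pred)), pred]   # confidence of each one
        return pred, conf

    scores = np.array([[2.0, 0.5, 0.1], [0.2, 3.0, 0.4]])
    ids, conf = char_confidences(scores)
    print(ids, conf.round(3))  # [0 1] [0.728 0.881]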
S104, determining a multi-dimensional confidence feature of the target image according to the confidence of each character in the character recognition result of each target text segment map.
The confidences of the characters in the character recognition result of each target text segment map are processed in multiple dimensions and combined into the multi-dimensional confidence feature of the target image. In one example, the confidence mean, confidence variance, confidence standard deviation, and the like of the characters in the character recognition result of each target text segment map may be calculated, and these values of all target text segment maps may then be combined to obtain the multi-dimensional confidence feature of the target image. The combination manner may be chosen according to the actual situation. In one example, each value (the confidence mean, confidence variance, confidence standard deviation, and the like) may be used as an element of a matrix; specifically, which positions in the matrix correspond to the confidence mean, confidence variance, and confidence standard deviation may be preset, the multi-dimensional confidence feature is obtained by assigning the values to those elements, and elements without a corresponding value may be set to zero. In one example, the values may be arranged in order into a feature vector; likewise, which positions in the feature vector correspond to the confidence mean, confidence variance, and confidence standard deviation may be preset, and the multi-dimensional confidence feature is obtained by assigning the values to those positions. In one example, the values may be combined with a concatenation function such as concat() to obtain the multi-dimensional confidence feature.
In one example, the confidence of each character in the character recognition result refers specifically to the confidence of each valid character in the character recognition result. For each target text segment map, a CRNN model or the like may be used to produce multiple groups of output results for the target text segment map, each group of output results including a logistic regression matrix and a character recognition result. If the characters recognized at the same position are identical in at least two groups of output results, that character is considered a valid character; the logistic regression matrix corresponding to a valid character is called a valid logistic regression matrix, and the confidence of the valid character is obtained by normalizing the valid logistic regression matrix.
S105, inputting the multi-dimensional confidence feature of the target image into a pre-trained first deep learning model to obtain the quality score of the target image.
The first deep learning model is trained according to the multi-dimensional confidence features of sample images and the labeled values of the quality scores of the sample images.
The first deep learning model may be any type of deep learning model, for example, a CNN, RNN, or CRNN, an XGBoost (eXtreme Gradient Boosting) model, a CTC (Connectionist Temporal Classification) model, or the like.
The quality score of the target image output by the first deep learning model may be a specific score value representing the image quality, a quality grade representing the image quality, or an array, vector, or matrix representing the image quality; the quality score of the target image may be a single score for the entire target image, or may consist of multiple sub-scores, for example a score for each target text segment map of the target image.
S106, obtaining, according to the quality score of the target image, a detection result of whether the target image meets an image quality standard.
The detection result may be binary, i.e., either the target image meets the image quality standard or the target image does not meet the image quality standard. In one example, the detection result corresponding to the quality score of the target image may be determined according to a predetermined correspondence between quality scores and detection results. In one example, the quality score of the target image may be converted into the detection result of the target image by a preset algorithm or a deep learning model.
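For the variant based on a predetermined correspondence, a minimal Python sketch is a simple threshold; the threshold value of 0.5 below is an assumption, not a value given by the disclosure:

    def detection_result(quality_score, threshold=0.5):
        # Map the scalar quality score to the binary detection result.
        # The 0.5 threshold is an assumed example; the disclosure only
        # states that a predetermined correspondence between quality
        # scores and detection results (or a second model) is used.
        return quality_score >= threshold

    print(detection_result(0.73))  # True: the image meets the quality standard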
In the embodiments of the disclosure, the quality score of the target image is used to detect whether the target image meets the image quality standard; the multi-dimensional confidence feature covers the confidence of each character in the character recognition result of each target text segment map, fully considers the influence of each character on the image quality, and can improve the accuracy of the detection result.
The quality score of the target image may be converted into the detection result of the target image by a deep learning model. In one possible implementation, the quality score of the target image includes a quality score of each target text segment map in the target image, and obtaining, according to the quality score of the target image, a detection result of whether the target image meets an image quality standard comprises: inputting the quality scores of the target text segment maps in the target image into a pre-trained second deep learning model to obtain a detection result of whether the target image meets the image quality standard.
The second deep learning model is trained according to the labeled value of the detection result of each sample image and the quality score of the sample image output by the first deep learning model.
The quality score of the target image includes the quality score of each target text segment map in the target image; the quality score of the target image may take the form of an array, vector, or matrix of the quality scores of the target text segment maps. The quality scores of the target text segment maps are input into a pre-trained second deep learning model, and the second deep learning model outputs the detection result of the target image. The second deep learning model may be any type of deep learning model, for example, a CNN, RNN, CRNN, or ArcFace model, or the like.
In the embodiments of the disclosure, the detection result of the target image is determined using the quality score of each target text segment map together with the second deep learning model; in addition to the influence of the characters on the image quality, the influence of the quality of each target text segment map on the image quality is fully considered, which can improve the accuracy of the detection result.
The multi-dimensional confidence feature is a feature based on multiple dimensions of the character confidences. In one possible implementation, referring to fig. 3, determining the multi-dimensional confidence feature of the target image according to the confidence of each character in the character recognition result of each target text segment map includes:
S301, for each target text segment map, determining the values of preset type parameters of the target text segment map according to the confidence of each character in the character recognition result of the target text segment map, wherein the preset type parameters include at least one of the mean, variance, and minimum of the confidences of the characters in the character recognition result.
In one example, the preset type parameters may further include the maximum, median, mode, standard deviation, or the like of the confidences of the characters in the character recognition result.
S302, calculating the confidence parameters of each preset type parameter according to the values of that preset type parameter across the target text segment maps, wherein the confidence parameters include at least two of a mean, a variance, a maximum, and a minimum.
In one example, the confidence parameters may further include a standard deviation, a median, a mode, or the like.
S303, obtaining the multi-dimensional confidence feature of the target image according to the confidence parameters of the preset type parameters of the target text segment maps.
In one example, take the case where the preset type parameters are the mean, variance, and minimum of the confidences of the characters in the character recognition result, and the confidence parameters are the mean, variance, maximum, and minimum:
Step one, for each target text segment map, determine the mean, variance, and minimum of the confidences of the characters according to the confidence of each character in the character recognition result of the target text segment map, obtaining the confidence mean, confidence variance, and confidence minimum corresponding to the target text segment map.
Step two, determine the mean, variance, maximum, and minimum of the confidence means corresponding to the target text segment maps; determine the mean, variance, maximum, and minimum of the confidence variances corresponding to the target text segment maps; and determine the mean, variance, maximum, and minimum of the confidence minimums corresponding to the target text segment maps, obtaining values of the target image in twelve dimensions.
The twelve dimensions are the mean of the confidence means, the variance of the confidence means, the maximum of the confidence means, the minimum of the confidence means, the mean of the confidence variances, the variance of the confidence variances, the maximum of the confidence variances, the minimum of the confidence variances, the mean of the confidence minimums, the variance of the confidence minimums, the maximum of the confidence minimums, and the minimum of the confidence minimums.
Step three, express the values in the twelve dimensions in matrix or vector form to obtain the multi-dimensional confidence feature of the target image.
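A compact Python sketch of this twelve-dimensional computation, under the assumption that the per-character confidences of each target text segment map are already available, might read:

    import numpy as np

    def multi_dim_confidence_feature(segment_char_confs):
        # `segment_char_confs`: one array of per-character confidences per
        # target text segment map. Step one: per-segment mean, variance
        # and minimum. Step two: mean/variance/maximum/minimum of each of
        # those three statistics across all segments -> twelve dimensions.
        per_segment = np.array([[c.mean(), c.var(), c.min()]
                                for c in map(np.asarray, segment_char_confs)])
        return np.concatenate([[col.mean(), col.var(), col.max(), col.min()]
                               for col in per_segment.T])

    confs = [np.array([0.9, 0.8, 0.95]), np.array([0.6, 0.7])]
    print(multi_dim_confidence_feature(confs).shape)  # (12,)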
In the embodiments of the disclosure, multi-dimensional feature conversion is performed on the confidences of the characters in the character recognition results to obtain the multi-dimensional confidence feature of the target image; the relations among features in different dimensions are fully considered, the multi-dimensional confidence feature is more comprehensively representative, and the accuracy of the detection result can be improved.
The training process of the first deep learning model is described below, and in one possible implementation, the method further includes:
step A, acquiring sample images;
step B, performing text segment detection on each sample image to obtain the sample text segment maps of each sample image;
step C, for each sample text segment map, performing character recognition on the sample text segment map to obtain the confidence of each character in the character recognition result of the sample text segment map and the confidence of the character recognition result;
step D, for each sample image, determining the multi-dimensional confidence feature of the sample image according to the confidence of each character in the character recognition result of each sample text segment map of the sample image;
step E, selecting a sample image, and inputting the multi-dimensional confidence feature of the currently selected sample image into the first deep learning model to obtain a predicted quality score of the currently selected sample image;
step F, calculating the loss of the first deep learning model according to the confidence of the character recognition result of each sample text segment map of the currently selected sample image and the predicted quality score of the currently selected sample image;
step G, adjusting the training parameters of the first deep learning model according to the loss of the first deep learning model, and returning to the step of selecting a sample image and inputting the multi-dimensional confidence feature of the currently selected sample image into the first deep learning model to obtain the predicted quality score of the currently selected sample image, until a preset first training end condition is met, to obtain the trained first deep learning model. A sketch of this loop is given below.
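The following Python sketch illustrates steps A to G; the MLP architecture, the MSE loss, and all hyper-parameters are assumptions made for illustration, since the disclosure permits any model type (CNN, RNN, XGBoost, and so on):

    import torch
    import torch.nn as nn

    # Assumed inputs: the 12-dimensional features (step D) and the
    # per-image confidences of the character recognition results (step C)
    # are precomputed; random placeholders are used here.
    model = nn.Sequential(nn.Linear(12, 32), nn.ReLU(), nn.Linear(32, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    features = torch.rand(100, 12)  # multi-dimensional confidence features
    targets = torch.rand(100, 1)    # confidences of the recognition results

    for epoch in range(50):         # fixed epoch count as the end condition
        for i in range(0, len(features), 16):
            pred = model(features[i:i + 16])         # step E: predicted quality score
            loss = loss_fn(pred, targets[i:i + 16])  # step F: loss of the model
            optimizer.zero_grad()
            loss.backward()                          # step G: adjust parameters
            optimizer.step()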
The training process of the second deep learning model is described below, and in one possible embodiment, the method further includes:
step 1, selecting a sample image;
step 2, inputting the currently selected sample image into the first deep learning model to obtain a predicted quality score of the currently selected sample image, wherein the quality score of the currently selected sample image includes the quality score of each sample text segment map in the currently selected sample image;
step 3, inputting the quality scores of the sample text segment maps in the currently selected sample image into the second deep learning model to obtain a prediction result of whether the currently selected sample image meets the image quality standard;
step 4, calculating the loss of the second deep learning model according to the prediction result output by the second deep learning model;
step 5, adjusting the training parameters of the second deep learning model according to the loss of the second deep learning model, and returning to the step of selecting a sample image, until a preset second training end condition is met, to obtain the trained second deep learning model. A sketch of this loop follows.
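The following Python sketch illustrates steps 1 to 5; padding the per-segment quality scores to a fixed length, the classifier architecture, and the loss function are assumptions made for illustration:

    import torch
    import torch.nn as nn

    # Assumed inputs: each sample image is represented by the quality
    # scores of its text segment maps from the (frozen) first model,
    # padded here to a fixed length of 8; placeholders are random.
    clf = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    optimizer = torch.optim.Adam(clf.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()

    seg_scores = torch.rand(100, 8)                 # step 2: per-segment quality scores
    labels = torch.randint(0, 2, (100, 1)).float()  # annotated meets/fails standard

    for epoch in range(50):                         # fixed epochs as end condition
        logits = clf(seg_scores)                    # step 3: per-image prediction
        loss = loss_fn(logits, labels)              # step 4: loss of the model
        optimizer.zero_grad()
        loss.backward()                             # step 5: adjust parameters
        optimizer.step()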
The embodiment of the disclosure also provides a deep learning model training method, referring to fig. 4, including:
S401, acquiring each sample image.
The deep learning model training method of the embodiment of the disclosure may be implemented by an electronic device, and in particular, the electronic device may be a personal computer, a smart phone, a server, or the like.
The sample image may be any image that includes characters; for example, the sample image may be an image of a signboard, a billboard, a poster, a placard, or the like.
S402, performing text segment detection on each sample image to obtain the sample text segment maps of each sample image.
The specific process of "performing text segment detection on a sample image to obtain a sample text segment map of the sample image" may refer to the process of "performing text segment detection on a target image to obtain a target text segment map of the target image" in the above embodiment, which is not described herein.
S403, for each sample text segment map, performing character recognition on the sample text segment map to obtain the confidence of each character in the character recognition result of the sample text segment map and the confidence of the character recognition result.
For the specific determination of the confidence of each character in the character recognition result of a sample text segment map, reference may be made to the determination of the confidence of each character in the character recognition result of a target text segment map in the above embodiments.
The confidence of the character recognition result of a sample text segment map may be the mean, median, or minimum of the confidences of the characters in the character recognition result. In one possible implementation, performing character recognition on each sample text segment map to obtain the confidence of the character recognition result of the sample text segment map includes: for each sample text segment map, determining the total number of characters, the number of valid characters, and the confidence of each valid character of the sample text segment map; when the number of valid characters in the sample text segment map equals the total number of characters in the sample text segment map, selecting the smallest confidence among the confidences of the valid characters in the sample text segment map as the confidence of the character recognition result of the sample text segment map; and when the number of valid characters in the sample text segment map is smaller than the total number of characters in the sample text segment map, setting the confidence of the character recognition result of the sample text segment map to 0.
Valid characters are characters that are successfully recognized in the character recognition result, for example, characters whose recognition confidence is greater than a preset confidence threshold. In one example, for each sample text segment map, multiple groups of output results may be produced using a CRNN model or the like, each group of output results including a logistic regression matrix and a character recognition result. If, among the groups, the characters recognized at the same position are identical in at least two groups of output results, that character is considered a valid character. For example, suppose one sample text segment map corresponds to three groups of output results: at the first character position all three groups agree, so the agreed character is the valid character at that position; at the second character position all three groups differ, so there is no valid character at that position; at the third character position two of the groups both recognize the character 'school', so 'school' is the valid character at that position. The logistic regression matrix corresponding to a valid character is called a valid logistic regression matrix, and the confidence of the valid character is obtained by normalizing the valid logistic regression matrix. When the sample text segment map includes an invalid character, there is a character in the sample text segment map that is not reliably recognized, for example because the region corresponding to the sample text segment map is occluded or blurred in the sample image; setting the confidence of the character recognition result of that sample text segment map to 0 in this case reflects that its character recognition result is unreliable, makes the determined confidence more accurate, and can improve the accuracy of image detection. When all characters of the sample text segment map are valid characters, the minimum confidence among the confidences of the valid characters is selected as the confidence of the character recognition result, which likewise makes the determined confidence more accurate and can improve the accuracy of image detection.
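The valid-character rule and the resulting confidence of a segment's recognition result might be sketched in Python as follows; the input format and the choice of confidence among agreeing groups are assumptions:

    def segment_confidence(runs):
        # `runs`: several groups of output results for one sample text
        # segment map, each a list of (character, confidence) pairs, e.g.
        # from repeated CRNN passes. A position is valid when at least two
        # groups agree on its character; the segment's confidence is the
        # minimum valid-character confidence if every position is valid,
        # and 0 otherwise. Taking the highest confidence among agreeing
        # groups is an illustrative choice.
        length = min(len(r) for r in runs)  # assumed common result length
        valid_confs = []
        for pos in range(length):
            chars = [r[pos][0] for r in runs]
            agreed = [c for c in set(chars) if chars.count(c) >= 2]
            if not agreed:
                return 0.0                  # an invalid position: unreliable
            ch = agreed[0]
            valid_confs.append(max(conf for c, conf in (r[pos] for r in runs)
                                   if c == ch))
        return min(valid_confs)

    runs = [[("s", 0.90), ("c", 0.70)],
            [("s", 0.80), ("c", 0.60)],
            [("s", 0.95), ("x", 0.50)]]
    print(segment_confidence(runs))  # 0.7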
S404, for each sample image, determining the multi-dimensional confidence feature of the sample image according to the confidence of each character in the character recognition result of each sample text segment map of the sample image.
For a specific determination process of the "multi-dimensional confidence feature of the sample image", reference may be made to the determination process of the "multi-dimensional confidence feature of the target image" in the above embodiment, and details thereof are not repeated here.
S405, selecting a sample image, and inputting the multi-dimensional confidence feature of the currently selected sample image into the first deep learning model to obtain a predicted quality score of the currently selected sample image.
The first deep learning model may be any type of deep learning model, for example, a CNN, RNN, or CRNN, an XGBoost model, a CTC model, or the like.
S406, calculating the loss of the first deep learning model according to the confidence of the character recognition result of each sample text segment map of the currently selected sample image and the predicted quality score of the currently selected sample image.
In one example, the mean of the confidences of the character recognition results of the sample text segment maps may be calculated, and the loss of the first deep learning model may be calculated from the difference between this mean and the predicted quality score; in one example, the minimum may be selected among the confidences of the character recognition results of the sample text segment maps, and the loss may be calculated from the difference between this minimum and the predicted quality score. In one example, the predicted quality score of the sample image may include a predicted quality score for each sample text segment map of the sample image, and the loss of the first deep learning model may be calculated from the distances between the predicted quality score of each sample text segment map and the confidence of its character recognition result, and so on.
S407, adjusting the training parameters of the first deep learning model according to the loss of the first deep learning model, and returning to the step of selecting a sample image and inputting the multi-dimensional confidence feature of the currently selected sample image into the first deep learning model to obtain the predicted quality score of the currently selected sample image, until a preset first training end condition is met, to obtain the trained first deep learning model.
The first training end condition may be set according to the actual situation; for example, it may be convergence of the loss of the first deep learning model, or a preset number of training iterations of the first deep learning model. In general, the sample data may be divided into a training set and a test set; in one example, the sample images may be split into a training set and a test set at a 9:1 ratio, where the training set is used to train the first deep learning model and the test set is used to test it, so as to reduce overfitting.
In the embodiments of the disclosure, training of the first deep learning model is achieved; the multi-dimensional confidence features used for training cover the confidence of each character in each sample text segment map of the same sample image, fully considering the influence of each character on the image quality, which can improve the accuracy of the detection result.
The embodiment of the disclosure also provides a deep learning model training method, referring to fig. 5, including:
s501, selecting a sample image.
The deep learning model training method of the embodiment of the disclosure may be implemented by an electronic device, and in particular, the electronic device may be a personal computer, a smart phone, a server, or the like.
The sample image may be any image that includes characters; for example, the sample image may be an image of a signboard, a billboard, a poster, a placard, or the like.
S502, inputting the currently selected sample image into the first deep learning model to obtain a predicted quality score of the currently selected sample image, wherein the quality score of the currently selected sample image includes the quality score of each sample text segment map in the currently selected sample image.
S503, inputting the quality scores of the sample text segment maps in the currently selected sample image into the second deep learning model to obtain a prediction result of whether the currently selected sample image meets the image quality standard.
The second deep learning model may be any type of deep learning model, for example, a CNN, RNN, CRNN, or ArcFace model, or the like. The prediction result may be binary, i.e., either the sample image meets the image quality standard or the sample image does not meet the image quality standard.
S504, calculating the loss of the second deep learning model according to the prediction result output by the second deep learning model.
Specifically, the loss of the second deep learning model may be calculated according to the prediction result output by the second deep learning model and the labeled value of whether the currently selected sample image meets the image quality standard.
S505, adjusting the training parameters of the second deep learning model according to the loss of the second deep learning model, and returning to the step of selecting a sample image, until a preset second training end condition is met, to obtain the trained second deep learning model.
The second training end condition may be set according to the actual situation; for example, it may be convergence of the loss of the second deep learning model, or a preset number of training iterations of the second deep learning model. In general, the sample data may be divided into a training set and a test set; in one example, the sample images may be split into a training set and a test set at a 9:1 ratio, where the training set is used to train the second deep learning model and the test set is used to test it, so as to reduce overfitting.
In the embodiments of the disclosure, training of the second deep learning model is achieved; the detection result of the target image is determined using the quality score of each target text segment map together with the second deep learning model, so the quality of each target text segment map is fully considered, which can improve the accuracy of the detection result.
Still another embodiment of the present disclosure provides an image detection apparatus 600, referring to fig. 6, including:
a target image acquisition module 61, configured to acquire a target image to be detected, where the target image includes characters;
a text segment map obtaining module 62, configured to perform text segment detection on the target image, so as to obtain a target text segment map of the target image;
the character confidence determining module 63 is configured to perform character recognition on each target text segment cut map, so as to obtain the confidence of each character in the character recognition result of the target text segment cut map;
a confidence coefficient feature determining module 64, configured to determine a multidimensional confidence coefficient feature of the target image according to the confidence coefficient of each character in the character recognition result of each target text segment map;
a quality score determining module 65, configured to input the multidimensional confidence feature of the target image into a first deep learning model trained in advance, to obtain a quality score of the target image;
The detection result determining module 66 is configured to obtain a detection result of whether the target image meets an image quality standard according to the quality score of the target image.
In a possible implementation manner, the confidence characteristic determining module is specifically configured to:
for each target text segment map, determining the values of preset type parameters of the target text segment map according to the confidence of each character in the character recognition result of the target text segment map, wherein the preset type parameters include at least one of the mean, variance, and minimum of the confidences of the characters in the character recognition result;
calculating the confidence parameters of each preset type parameter according to the values of that preset type parameter across the target text segment maps, wherein the confidence parameters include at least two of a mean, a variance, a maximum, and a minimum;
and obtaining the multi-dimensional confidence feature of the target image according to the confidence parameters of the preset type parameters of the target text segment maps.
In one possible implementation, the quality score of the target image includes a quality score of each of the target text segment maps in the target image;
The detection result determining module is specifically configured to: input the quality scores of the target text segment maps in the target image into a pre-trained second deep learning model to obtain a detection result of whether the target image meets an image quality standard.
In one possible embodiment, the apparatus further comprises:
a sample image acquisition module, configured to acquire sample images;
a text segment detection module, configured to perform text segment detection on each sample image to obtain the sample text segment maps of each sample image;
a character recognition module, configured to perform character recognition on each sample text segment map to obtain the confidence of each character in the character recognition result of the sample text segment map and the confidence of the character recognition result;
a confidence feature conversion module, configured to determine, for each sample image, the multi-dimensional confidence feature of the sample image according to the confidence of each character in the character recognition result of each sample text segment map of the sample image;
a quality score prediction module, configured to select a sample image and input the multi-dimensional confidence feature of the currently selected sample image into the first deep learning model to obtain a predicted quality score of the currently selected sample image;
a model loss calculation module, configured to calculate the loss of the first deep learning model according to the confidence of the character recognition result of each sample text segment map of the currently selected sample image and the predicted quality score of the currently selected sample image;
a training end judging module, configured to adjust the training parameters of the first deep learning model according to the loss of the first deep learning model, and to return to the step of selecting a sample image and inputting the multi-dimensional confidence feature of the currently selected sample image into the first deep learning model to obtain the predicted quality score of the currently selected sample image, until a preset first training end condition is met, to obtain the trained first deep learning model.
In a possible implementation manner, the character recognition module is specifically configured to:
determining, for each sample text segment map, the total number of characters, the number of valid characters, and the confidence of each valid character of the sample text segment map;
when the number of valid characters in the sample text segment map equals the total number of characters in the sample text segment map, selecting the smallest confidence among the confidences of the valid characters in the sample text segment map as the confidence of the character recognition result of the sample text segment map;
and when the number of valid characters in the sample text segment map is smaller than the total number of characters in the sample text segment map, setting the confidence of the character recognition result of the sample text segment map to 0.
In one possible embodiment, the apparatus further comprises: the deep learning model training module is used for:
selecting a sample image;
inputting the currently selected sample image into the first deep learning model to obtain a predicted quality score of the currently selected sample image, wherein the quality score of the currently selected sample image includes the quality score of each sample text segment map in the currently selected sample image;
inputting the quality scores of the sample text segment maps in the currently selected sample image into the second deep learning model to obtain a prediction result of whether the currently selected sample image meets the image quality standard;
calculating the loss of the second deep learning model according to the prediction result output by the second deep learning model;
and adjusting the training parameters of the second deep learning model according to the loss of the second deep learning model, and returning to the step of selecting a sample image, until a preset second training end condition is met, to obtain the trained second deep learning model.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image detection method of any one of the present disclosure.
A non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the image detection method of any one of the present disclosure.
A computer program product comprising a computer program which, when executed by a processor, implements the image detection method of any one of the present disclosure.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 includes a computing unit 71 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 72 or a computer program loaded from a storage unit 78 into a Random Access Memory (RAM) 73. In the RAM 73, various programs and data required for the operation of the device 700 may also be stored. The computing unit 71, the ROM 72, and the RAM 73 are connected to each other by a bus 74. An input/output (I/O) interface 75 is also connected to the bus 74.
Various components in device 700 are connected to I/O interface 75, including: an input unit 76 such as a keyboard, a mouse, etc.; an output unit 77 such as various types of displays, speakers, and the like; a storage unit 78 such as a magnetic disk, an optical disk, or the like; and a communication unit 79 such as a network card, modem, wireless communication transceiver, etc. Communication unit 79 allows device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 71 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 71 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 71 performs the respective methods and processes described above, such as the image detection method. For example, in some embodiments, the image detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 78. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 72 and/or the communication unit 79. When the computer program is loaded into the RAM 73 and executed by the computing unit 71, one or more steps of the image detection method described above may be performed. Alternatively, in other embodiments, the computing unit 71 may be configured to perform the image detection method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (14)

1. An image detection method, comprising:
acquiring a target image to be detected, wherein the target image comprises characters and the target image is a signboard, a billboard, or a poster;
performing text segment detection on the target image to obtain target text segment images of the target image, wherein each target text segment image represents an image region of the target image that contains a text segment;
for each target text segment image, performing character recognition on the target text segment image to obtain a confidence for each character in the character recognition result of the target text segment image;
determining a multi-dimensional confidence feature of the target image according to the confidences of the characters in the character recognition results of the target text segment images;
inputting the multi-dimensional confidence feature of the target image into a pre-trained first deep learning model to obtain a quality score of the target image;
obtaining, according to the quality score of the target image, a detection result of whether the target image meets an image quality standard, wherein the detection result indicates either that the target image meets the image quality standard or that it does not, and meeting the image quality standard indicates that the target image meets the standard for extracting point-of-interest (POI) data.
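By way of non-limiting illustration, the pipeline of claim 1 can be sketched in Python as follows. The helpers detect_text_segments, recognize_characters, and build_confidence_feature, the predict interface of the first model, and the fixed quality threshold are all hypothetical stand-ins; the claim does not prescribe any concrete implementation.

```python
import numpy as np

def detect_image(target_image, first_model, quality_threshold=0.5):
    """Sketch of the claimed method; all helpers are hypothetical."""
    # Step 1: detect text segments and crop them out of the target image.
    segment_images = detect_text_segments(target_image)
    # Step 2: recognize characters in each crop, keeping per-character confidences.
    char_confidences = [recognize_characters(seg) for seg in segment_images]
    # Step 3: aggregate per-character confidences into the multi-dimensional
    # confidence feature (one possible construction is shown under claim 2).
    feature = build_confidence_feature(char_confidences)
    # Step 4: score the image with the pre-trained first deep learning model.
    quality_score = float(first_model.predict(feature[np.newaxis, :])[0])
    # Step 5: map the score to a detection result against the quality standard
    # used for POI data extraction (a fixed threshold is an assumption here;
    # claim 3 instead feeds per-crop scores to a second model).
    return quality_score >= quality_threshold
```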
2. The method of claim 1, wherein the determining the multi-dimensional confidence feature of the target image according to the confidences of the characters in the character recognition results of the target text segment images comprises:
for each target text segment image, determining values of preset-type parameters of the target text segment image according to the confidences of the characters in the character recognition result of the target text segment image, wherein the preset-type parameters comprise at least one of a mean, a variance, and a minimum of the character confidences in the character recognition result;
for each preset-type parameter, calculating confidence parameters of the preset-type parameter from its values across the target text segment images, wherein the confidence parameters comprise at least two of a mean, a variance, a maximum, and a minimum;
and obtaining the multi-dimensional confidence feature of the target image from the confidence parameters of the preset-type parameters of the target text segment images.
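A minimal numpy sketch of one reading of claim 2 follows: each crop yields the mean, variance, and minimum of its character confidences, and each of those three parameters is then summarized by its mean, variance, maximum, and minimum across all crops. These specific parameter choices are assumptions within the latitude the claim allows.

```python
import numpy as np

def build_confidence_feature(char_confidences):
    # char_confidences: one array of per-character confidences per crop.
    # Per-crop preset-type parameters: mean, variance, minimum.
    per_crop = np.array([[np.mean(c), np.var(c), np.min(c)]
                         for c in char_confidences])
    # For each preset-type parameter, statistics across all crops.
    stats = (np.mean, np.var, np.max, np.min)
    return np.array([f(per_crop[:, j])
                     for j in range(per_crop.shape[1])
                     for f in stats])  # 3 parameters x 4 statistics = 12 dims
```

Whether an image has two crops or twenty, and however many characters each crop holds, this construction always yields a fixed-length vector, which is what lets a fixed-input model score arbitrary images.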
3. The method of claim 1, wherein the quality score of the target image comprises a quality score of each target text segment image in the target image;
and the obtaining, according to the quality score of the target image, a detection result of whether the target image meets an image quality standard comprises:
inputting the quality scores of the target text segment images in the target image into a pre-trained second deep learning model to obtain the detection result of whether the target image meets the image quality standard.
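The per-crop quality scores form a variable-length list while a model typically expects a fixed-size input; one hedged way to bridge the two is zero padding, as in the sketch below. The padding scheme, the crop cap, and the predict interface are assumptions, not taken from the claim.

```python
def detect_with_second_model(crop_quality_scores, second_model, max_crops=16):
    # Pad or truncate the per-crop quality scores to a fixed length.
    scores = list(crop_quality_scores)[:max_crops]
    scores += [0.0] * (max_crops - len(scores))
    # The second model maps the score vector to the detection result.
    return bool(second_model.predict([scores])[0])
```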
4. The method according to any one of claims 1-3, wherein the training process of the first deep learning model comprises:
acquiring sample images;
performing text segment detection on each sample image to obtain sample text segment images of the sample image;
for each sample text segment image, performing character recognition on the sample text segment image to obtain a confidence for each character in the character recognition result of the sample text segment image and a confidence of the character recognition result;
for each sample image, determining a multi-dimensional confidence feature of the sample image according to the confidences of the characters in the character recognition results of the sample text segment images of the sample image;
selecting a sample image, and inputting the multi-dimensional confidence feature of the currently selected sample image into the first deep learning model to obtain a predicted quality score of the currently selected sample image;
calculating a loss of the first deep learning model according to the confidences of the character recognition results of the sample text segment images of the currently selected sample image and the predicted quality score of the currently selected sample image;
and adjusting training parameters of the first deep learning model according to the loss of the first deep learning model, and returning to the step of selecting a sample image and inputting the multi-dimensional confidence feature of the currently selected sample image into the first deep learning model to obtain the predicted quality score of the currently selected sample image, until a preset first training ending condition is met, to obtain the trained first deep learning model.
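The training loop of claim 4 can be illustrated with a small PyTorch sketch. The two-layer MLP, the MSE loss against a per-image label distilled from the per-crop recognition-result confidences, and the epoch-count stopping rule are all assumptions; the claim fixes only the loop structure, not these choices.

```python
import torch
import torch.nn as nn

def train_first_model(features, labels, epochs=100, lr=1e-3):
    # features: (N, D) tensor, one multi-dimensional confidence feature per
    # sample image; labels: (N,) tensor of supervision targets derived from
    # the recognition-result confidences of the image's crops (claim 5);
    # collapsing those per-crop confidences to one scalar is an assumption.
    model = nn.Sequential(nn.Linear(features.shape[1], 32),
                          nn.ReLU(), nn.Linear(32, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):  # epoch cap stands in for the ending condition
        for x, y in zip(features, labels):  # select one sample image
            pred = model(x.unsqueeze(0)).squeeze()  # predicted quality score
            loss = loss_fn(pred, y)  # loss vs. the confidence-derived label
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```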
5. The method of claim 4, wherein the performing character recognition on each sample text segment image to obtain the confidence of the character recognition result of the sample text segment image comprises:
for each sample text segment image, determining the total number of characters, the number of valid characters, and the confidence of each valid character of the sample text segment image;
in the case where the number of valid characters in the sample text segment image equals the total number of characters in the sample text segment image, selecting the minimum confidence among the confidences of the valid characters in the sample text segment image as the confidence of the character recognition result of the sample text segment image;
and in the case where the number of valid characters in the sample text segment image is smaller than the total number of characters in the sample text segment image, setting the confidence of the character recognition result of the sample text segment image to 0.
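The rule of claim 5 reduces to a short function. What qualifies a recognized character as "valid" is left open by the claim, so the valid mask is an input here rather than something the sketch computes.

```python
def recognition_result_confidence(char_confidences, total_chars, valid_mask):
    # char_confidences: confidence of each recognized character in the crop;
    # valid_mask: parallel booleans marking the 'valid' characters.
    valid = [c for c, ok in zip(char_confidences, valid_mask) if ok]
    if len(valid) == total_chars:
        # Every character is valid: the weakest character bounds the result.
        return min(valid)
    # Some characters are invalid: treat the whole result as unreliable.
    return 0.0
```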
6. The method of claim 3, wherein the training process of the second deep learning model comprises:
selecting a sample image;
inputting the currently selected sample image into the first deep learning model to obtain a predicted quality score of the currently selected sample image, wherein the quality score of the currently selected sample image comprises quality scores of the text segment images in the currently selected sample image;
inputting the quality scores of the text segment images in the currently selected sample image into the second deep learning model to obtain a prediction result of whether the currently selected sample image meets the image quality standard;
calculating a loss of the second deep learning model according to the prediction result output by the second deep learning model;
and adjusting training parameters of the second deep learning model according to the loss of the second deep learning model, and returning to the step of selecting a sample image, until a preset second training ending condition is met, to obtain the trained second deep learning model.
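Claim 6's loop can be sketched analogously to claim 4's. Here score_crops is a hypothetical helper that applies the trained first model to every text segment crop of an image and returns a fixed-length score tensor (cf. the claim-3 sketch), and the binary-cross-entropy setup, network shape, and stopping rule are likewise assumptions.

```python
import torch
import torch.nn as nn

def train_second_model(first_model, sample_images, labels, epochs=50):
    # labels: float tensor entries, 1.0 if the sample image meets the image
    # quality standard and 0.0 otherwise.
    model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(),
                          nn.Linear(16, 1), nn.Sigmoid())
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):  # epoch cap stands in for the ending condition
        for image, y in zip(sample_images, labels):   # select a sample image
            scores = score_crops(first_model, image)  # per-crop quality scores
            pred = model(scores.unsqueeze(0)).squeeze()
            loss = loss_fn(pred, y)         # loss from the model's prediction
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```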
7. An image detection apparatus comprising:
a target image acquisition module, configured to acquire a target image to be detected, wherein the target image comprises characters and the target image is a signboard, a billboard, or a poster;
a text segment image obtaining module, configured to perform text segment detection on the target image to obtain target text segment images of the target image, wherein each target text segment image represents an image region of the target image that contains a text segment;
a character confidence determining module, configured to, for each target text segment image, perform character recognition on the target text segment image to obtain a confidence for each character in the character recognition result of the target text segment image;
a confidence feature determining module, configured to determine a multi-dimensional confidence feature of the target image according to the confidences of the characters in the character recognition results of the target text segment images;
a quality score determining module, configured to input the multi-dimensional confidence feature of the target image into a pre-trained first deep learning model to obtain a quality score of the target image;
and a detection result determining module, configured to obtain, according to the quality score of the target image, a detection result of whether the target image meets an image quality standard, wherein the detection result indicates either that the target image meets the image quality standard or that it does not, and meeting the image quality standard indicates that the target image meets the standard for extracting point-of-interest (POI) data.
8. The apparatus of claim 7, wherein the confidence feature determining module is specifically configured to:
for each target text segment image, determine values of preset-type parameters of the target text segment image according to the confidences of the characters in the character recognition result of the target text segment image, wherein the preset-type parameters comprise at least one of a mean, a variance, and a minimum of the character confidences in the character recognition result;
for each preset-type parameter, calculate confidence parameters of the preset-type parameter from its values across the target text segment images, wherein the confidence parameters comprise at least two of a mean, a variance, a maximum, and a minimum;
and obtain the multi-dimensional confidence feature of the target image from the confidence parameters of the preset-type parameters of the target text segment images.
9. The apparatus of claim 7, wherein the quality score of the target image comprises a quality score of each target text segment image in the target image;
and the detection result determining module is specifically configured to input the quality scores of the target text segment images in the target image into a pre-trained second deep learning model to obtain the detection result of whether the target image meets the image quality standard.
10. The apparatus according to any one of claims 7-9, wherein the apparatus further comprises:
a sample image acquisition module, configured to acquire sample images;
a text segment detection module, configured to perform text segment detection on each sample image to obtain sample text segment images of the sample image;
a character recognition module, configured to, for each sample text segment image, perform character recognition on the sample text segment image to obtain a confidence for each character in the character recognition result of the sample text segment image and a confidence of the character recognition result;
a confidence feature conversion module, configured to determine, for each sample image, a multi-dimensional confidence feature of the sample image according to the confidences of the characters in the character recognition results of the sample text segment images of the sample image;
a quality score prediction module, configured to select a sample image and input the multi-dimensional confidence feature of the currently selected sample image into the first deep learning model to obtain a predicted quality score of the currently selected sample image;
a model loss calculation module, configured to calculate a loss of the first deep learning model according to the confidences of the character recognition results of the sample text segment images of the currently selected sample image and the predicted quality score of the currently selected sample image;
and a training ending judgment module, configured to adjust training parameters of the first deep learning model according to the loss of the first deep learning model, and to return to the step of selecting a sample image and inputting the multi-dimensional confidence feature of the currently selected sample image into the first deep learning model to obtain the predicted quality score of the currently selected sample image, until a preset first training ending condition is met, to obtain the trained first deep learning model.
11. The apparatus of claim 10, wherein the character recognition module is specifically configured to:
for each sample text segment image, determine the total number of characters, the number of valid characters, and the confidence of each valid character of the sample text segment image;
in the case where the number of valid characters in the sample text segment image equals the total number of characters in the sample text segment image, select the minimum confidence among the confidences of the valid characters in the sample text segment image as the confidence of the character recognition result of the sample text segment image;
and in the case where the number of valid characters in the sample text segment image is smaller than the total number of characters in the sample text segment image, set the confidence of the character recognition result of the sample text segment image to 0.
12. The apparatus of claim 9, wherein the apparatus further comprises a deep learning model training module configured to:
select a sample image;
input the currently selected sample image into the first deep learning model to obtain a predicted quality score of the currently selected sample image, wherein the quality score of the currently selected sample image comprises quality scores of the text segment images in the currently selected sample image;
input the quality scores of the text segment images in the currently selected sample image into the second deep learning model to obtain a prediction result of whether the currently selected sample image meets the image quality standard;
calculate a loss of the second deep learning model according to the prediction result output by the second deep learning model;
and adjust training parameters of the second deep learning model according to the loss of the second deep learning model, and return to the step of selecting a sample image, until a preset second training ending condition is met, to obtain the trained second deep learning model.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
CN202110739067.7A 2021-06-30 2021-06-30 Image detection method, device, electronic equipment and storage medium Active CN113537192B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110739067.7A CN113537192B (en) 2021-06-30 2021-06-30 Image detection method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110739067.7A CN113537192B (en) 2021-06-30 2021-06-30 Image detection method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113537192A CN113537192A (en) 2021-10-22
CN113537192B (en) 2024-03-26

Family

ID=78097406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110739067.7A Active CN113537192B (en) 2021-06-30 2021-06-30 Image detection method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113537192B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113971811B (en) * 2021-11-16 2023-06-06 北京国泰星云科技有限公司 Container characteristic intelligent recognition method based on machine vision and deep learning

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1069518A (en) * 1996-08-28 1998-03-10 Nippon Telegr & Teleph Corp <Ntt> Character recognition method and system therefor
JPH11203411A (en) * 1998-01-20 1999-07-30 Fuji Xerox Co Ltd Document reader
US6408094B1 (en) * 1995-01-17 2002-06-18 Eastman Kodak Company Document image assessment system and method
JP2002312720A (en) * 2001-04-11 2002-10-25 Ricoh Co Ltd Character recognition device, character recognition method, program and storage medium
CN101140625A (en) * 2006-09-06 2008-03-12 中国科学院自动化研究所 Multiple distinguishabilitys retrogress character self-adapting recognition system and method
CN104978578A (en) * 2015-04-21 2015-10-14 深圳市前海点通数据有限公司 Mobile phone photo taking text image quality evaluation method
CN105335372A (en) * 2014-06-12 2016-02-17 富士通株式会社 Document processing apparatus and method, and device for determining direction of document image
CN107481238A (en) * 2017-09-20 2017-12-15 众安信息技术服务有限公司 Image quality measure method and device
CN107622266A (en) * 2017-09-21 2018-01-23 平安科技(深圳)有限公司 A kind of processing method, storage medium and the server of OCR identifications
CN111611863A (en) * 2020-04-22 2020-09-01 浙江大华技术股份有限公司 License plate image quality evaluation method and device and computer equipment
CN111881741A (en) * 2020-06-22 2020-11-03 浙江大华技术股份有限公司 License plate recognition method and device, computer equipment and computer-readable storage medium
CN111950556A (en) * 2020-08-21 2020-11-17 公安部交通管理科学研究所 License plate printing quality detection method based on deep learning
CN112132075A (en) * 2020-09-28 2020-12-25 腾讯科技(深圳)有限公司 Method and medium for processing image-text content
CN112287932A (en) * 2019-07-23 2021-01-29 上海高德威智能交通系统有限公司 Method, device and equipment for determining image quality and storage medium
CN112287898A (en) * 2020-11-26 2021-01-29 深源恒际科技有限公司 Method and system for evaluating text detection quality of image
CN112990035A (en) * 2021-03-23 2021-06-18 北京百度网讯科技有限公司 Text recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113537192A (en) 2021-10-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant