CN113221918B - Target detection method, training method and device of target detection model - Google Patents

Target detection method, training method and device of target detection model

Info

Publication number
CN113221918B
CN113221918B (application CN202110540636A)
Authority
CN
China
Prior art keywords
image
target
target object
detected
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110540636.5A
Other languages
Chinese (zh)
Other versions
CN113221918A (en)
Inventor
梁晓旭
张言
刘星
邓远达
胡旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110540636.5A
Publication of CN113221918A
Application granted
Publication of CN113221918B
Active legal status
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformation in the plane of the image
    • G06T3/40: Scaling the whole image or part thereof
    • G06T3/4038: Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00: Indexing scheme for image data processing or generation, in general
    • G06T2200/32: Indexing scheme for image data processing or generation, in general, involving image mosaicing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Abstract

The disclosure provides a target detection method, a training method for a target detection model, and corresponding apparatuses, and relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring an image to be detected; inputting the image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected; acquiring position frame information from the target detection result when the target detection result meets a preset condition, the preset condition being that the target detection result contains position frame information and the confidence corresponding to the target detection result is less than a preset confidence threshold; and verifying the target detection result based on the position frame information to obtain a target verification result corresponding to the image to be detected. In this technical scheme, among the target detection results output by the target detection model, those that contain position frame information but whose confidence is below the preset confidence threshold are verified, improving the accuracy of image auditing and the recall rate for images containing target objects.

Description

Target detection method, training method and device of target detection model
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly to the field of machine learning.
Background
With the rise of electronic commerce, commercial advertising takes increasingly diverse forms, and rich-picture advertising is an important one. When actual advertising materials are promoted through pictures and text, merchants sometimes publish images that contain specific targets in violation of the rules. The images provided in merchants' advertising materials therefore need to be audited, but the accuracy of prior-art content auditing schemes is too low to meet actual requirements.
Disclosure of Invention
The disclosure provides a target detection method, and a training method, apparatus, device and storage medium for a target detection model.
According to an aspect of the present disclosure, there is provided a target detection method including:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
acquiring position frame information from the target detection result when the target detection result meets a preset condition; the preset condition is that the target detection result contains position frame information and the confidence corresponding to the target detection result is less than a preset confidence threshold;
And verifying the target detection result based on the position frame information to obtain a target verification result corresponding to the image to be detected.
According to another aspect of the present disclosure, there is provided a training method of a target detection model, including:
acquiring a plurality of groups of first training sample data, and training an initial target detection model based on the plurality of groups of first training sample data until a preset training ending condition is met, so as to obtain a target detection model of any embodiment of the disclosure;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, wherein the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprises a spliced image; the spliced image is obtained by splicing a target object image and a similar target image, wherein the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
According to another aspect of the present disclosure, there is provided an object detection apparatus including:
the detection image acquisition module is used for acquiring an image to be detected;
The target detection module is used for inputting the image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
the position frame acquisition module is used for acquiring position frame information from the target detection result when the target detection result meets a preset condition; the preset condition is that the target detection result contains position frame information and the confidence corresponding to the target detection result is less than a preset confidence threshold;
and the result checking module is used for checking the target detection result based on the position frame information to obtain a target checking result corresponding to the image to be detected.
According to another aspect of the present disclosure, there is provided a training apparatus of an object detection model, including:
the sample acquisition module is used for acquiring a plurality of groups of first training sample data;
the model training module is used for training the initial target detection model based on a plurality of groups of first training sample data until a preset training ending condition is met, so as to obtain a target detection model of any embodiment of the disclosure;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, wherein the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprises a spliced image; the spliced image is obtained by splicing a target object image and a similar target image, wherein the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
The technical scheme solves the problem of low auditing accuracy in prior-art content auditing schemes. According to the target detection method of the technical scheme, among the target detection results output by the target detection model, those that contain position frame information and whose confidence is less than the preset confidence threshold are verified, improving the accuracy of image auditing and the recall rate for images containing target objects.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a target detection method according to an embodiment of the disclosure;
FIG. 2 is a schematic diagram of a process for obtaining training samples of a target detection model according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram of an iterative process of a target detection model in an embodiment of the disclosure;
FIG. 4 is a schematic diagram of an image review system in accordance with one embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a training method of a target detection model according to an embodiment of the disclosure;
FIG. 6 is a schematic diagram of an object detection device according to an embodiment of the disclosure;
FIG. 7 is a schematic diagram of a sample update module according to an embodiment of the disclosure;
FIG. 8 is a schematic diagram of a training apparatus for a target detection model in an embodiment of the disclosure;
fig. 9 is a block diagram of an electronic device for implementing the object detection method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, when detecting a target in an image, the following two methods are generally used:
one way is to perform target region extraction and target search by means of a sliding window. For multi-category target recognition, features are extracted by adopting feature extraction modes based on a directional gradient histogram (Histogram of Oriented Gradient, HOG), scale-invariant feature transform (Scale-invariant feature transform, SIFT) and the like, and a traditional machine learning classifier based on an adaptive enhancement (Adaptive boosting, adaboost) algorithm, a support vector machine (Support Vector Machine, SVM) and the like is trained to determine whether the image contains the target. The method has the advantages of poor target detection robustness, low recognition precision and low speed, and is difficult to meet the requirements of content auditing time delay and accuracy.
The other way is to detect targets with a deep-learning object detection scheme: collect a large amount of image data containing the targets, manually annotate the position information of each target class, identify target regions with a trained model, record the regression box positions, and determine the detection result via a per-class threshold. This approach relies on a large amount of annotated data to train the detection model, and such data is difficult to collect; with collected data having uniform backgrounds, the trained model achieves low accuracy and recall in scenarios where risk data is extremely sparse, and therefore cannot meet the high-recall, high-accuracy requirements of content auditing. For some target object classes that rarely occur in daily life, the collectable data falls far short of the magnitude needed to train a detection model. In addition, target objects requiring audit exemption in specific scenarios cannot be handled.
In this technical scheme, among the target detection results output by the target detection model, those that contain position frame information and whose confidence is less than the preset confidence threshold are verified, improving the accuracy of image auditing and the recall rate for images containing target objects. Training sample data for the model can be enriched by using spliced images, obtained by splicing target object images with similar-target images, as training samples. An audit exemption mechanism satisfies the exemption requirements for images to be detected in specific scenarios.
The execution subject of the present disclosure may be any electronic device, e.g., a server, etc. The target detection method in the embodiments of the present disclosure will be described in detail below.
Fig. 1 is a schematic diagram of a target detection method according to an embodiment of the disclosure. As shown in fig. 1, the target detection method may include:
step S101, obtaining an image to be detected;
step S102, inputting an image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
step S103, acquiring position frame information from the target detection result when the target detection result meets a preset condition; the preset condition is that the target detection result contains position frame information and the confidence corresponding to the target detection result is less than a preset confidence threshold.
And step S104, verifying the target detection result based on the position frame information to obtain a target verification result corresponding to the image to be detected.
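The flow of steps S101 to S104 can be sketched as follows. This is an illustrative sketch only: the `detect` and `verify` callables, the dictionary layout of the detection result, and the 0.8 threshold value are assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch of steps S101-S104: `detect` stands in for the
# pre-trained target detection model and `verify` for the verification step.
CONF_THRESHOLD = 0.8  # assumed value of the preset confidence threshold

def detect_and_verify(image, detect, verify):
    """Run detection, then verify low-confidence results that carry a box."""
    result = detect(image)  # e.g. {"box": (x1, y1, x2, y2), "confidence": 0.63}
    box = result.get("box")
    # Preset condition: a position box exists AND confidence is below threshold
    if box is not None and result["confidence"] < CONF_THRESHOLD:
        return verify(image, box)          # secondary check on the boxed region
    return box is not None                 # otherwise trust the detector

# Toy stand-ins for the trained models
detect = lambda img: {"box": (2, 2, 8, 8), "confidence": 0.63}
verify = lambda img, box: False  # verifier rejects the low-confidence hit
print(detect_and_verify(None, detect, verify))  # → False
```

The point of the branch is that only ambiguous detections (box present, confidence low) incur the extra verification cost.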
In the target detection method of the embodiment of the disclosure, among the target detection results output by the target detection model, those that contain position frame information and whose confidence is less than the preset confidence threshold are verified, improving the accuracy of image auditing and the recall rate for images containing target objects.
In one embodiment, for example in an image auditing scenario, the image to be detected may be an image that requires content auditing. The image may contain a target object that cannot pass the audit, such as a badge image. The image to be detected may also be an image in any other application scenario requiring target detection, which is not limited in this application.
In one embodiment, the object detection model may be any neural network model including, but not limited to:
Faster R-CNN (Faster Region-based Convolutional Neural Network), SSD (Single Shot MultiBox Detector), YOLO (You Only Look Once), etc.
In one embodiment, the target detection result output by the target detection model may include information indicating whether the target object is present, for example a value of 0 for absent and 1 for present. Whether the image to be detected contains the target object may also be indicated in other ways, which is not limited in this application. If the target detection result indicates that the target object is present, the position frame information of the target object in the image to be detected and the confidence corresponding to the target detection result are also output. The position frame information includes, but is not limited to, the coordinates of the position frame.
In one embodiment, a confidence threshold is preconfigured according to specific needs. If the confidence corresponding to the target detection result is less than this threshold, the image region corresponding to the position frame is determined in the image to be detected according to the position frame information and verified, further determining whether the image to be detected contains the target object, thereby improving the accuracy and recall of target detection.
In one embodiment, the target detection model is obtained by training an initial target detection model by adopting a plurality of groups of first training sample data, wherein the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, and the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object; the first sample image comprises a stitched image; the spliced image is obtained by splicing a target object image and a similar target image, wherein the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
The target object may be a badge-class object; the similar target object may then be an object similar in shape or color to a badge, e.g., a watch or a circular icon. Optionally, the target object images and similar-target images may be acquired from a preset database or a common search engine.
In some embodiments, for a target detection model that identifies badge targets, acquiring a large amount of relevant training data for uncommon badge images is difficult; more badge images of such types can be obtained as training data by means of image splicing.
It will be appreciated that, in addition to spliced images, the training sample data of the target detection model may include many types of images, such as target object images and similar-target images.
In the embodiment of the disclosure, spliced images obtained by splicing target object images with similar-target images are used as training samples, increasing the number and richness of the training samples; training the model on rich sample images lets it learn more information, improving the detection accuracy of the trained target detection model.
In one embodiment, the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image, taking a similar target image as a background image, and performing splicing processing.
The preset mode may be any other image processing mode, which is not limited in this application.
In the embodiment of the disclosure, the target object image is processed in different ways to obtain a processed image for each processing mode; these images are then spliced, and the spliced images are used as training samples, increasing the number and richness of the training samples.
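A minimal sketch of the processing-then-splicing step above, assuming images are HxWx3 NumPy arrays; the Gaussian noise model and the fixed paste position are simplified stand-ins for the preset processing modes listed (sharpening, noise adding, filtering, color dithering, random filling, perspective transformation).

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(img, sigma=10):
    """Noise-adding step: one of the preset processing modes (simplified)."""
    noisy = img.astype(np.int16) + rng.normal(0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def splice(foreground, background, top, left):
    """Paste the processed target object image onto the similar-target image."""
    out = background.copy()
    h, w = foreground.shape[:2]
    out[top:top + h, left:left + w] = foreground
    return out

target = np.full((8, 8, 3), 200, np.uint8)   # stand-in target object image
similar = np.zeros((32, 32, 3), np.uint8)    # stand-in similar-target image
sample = splice(add_noise(target), similar, top=4, left=4)
print(sample.shape)  # (32, 32, 3)
```

A real pipeline would randomize the paste position and apply several of the processing modes per sample to diversify the training set.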
In one embodiment, the target detection method further comprises:
in the training process of the initial target detection model, a plurality of test sample images are obtained, the test sample images are input into the model, and output results corresponding to the test sample images are obtained;
splicing the target object image with the images corresponding to erroneous detection results among the output results of the test sample images, to obtain new spliced images; an erroneous detection result is one in which an image that does not contain the target object is detected as containing it;
Acquiring a labeling label of a new spliced image;
and updating the first training sample data by using the new spliced image and the labeling label.
During training of the initial target detection model, test sample images may be obtained from a preset image database and input into the model for testing. Images corresponding to erroneous detection results output by the model are spliced with target object images to obtain new spliced images, which are added to the model's training sample database. The model here is the initial target detection model after parameter adjustment through continuous iteration during training; it does not yet satisfy the training end condition and is not the finally trained target detection model.
In some embodiments, the labeling label of the new stitched image may be a manually labeled label, or may be a label obtained in other manners, which is not limited in this application.
In the embodiment of the disclosure, the training sample data of the model is augmented with new spliced images obtained by splicing target object images with the images corresponding to erroneous detection results on test samples during model iteration, enabling amplification of the training sample data.
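The sample-update loop above can be sketched as follows. Here `model`, `splice_fn`, and `label_fn` are hypothetical placeholders for the iterating detector, the splicing routine, and the (possibly manual) labelling step, and toy strings stand in for images.

```python
# Hypothetical sketch: grow the training set from false detections on test
# images (which are assumed to contain no target object) spliced with a
# target object image.
def update_training_data(training_data, test_images, model, target_image,
                         splice_fn, label_fn):
    for img in test_images:
        truly_contains = False            # test images hold no target object
        if model(img) and not truly_contains:   # erroneous detection result
            new_image = splice_fn(target_image, img)  # new spliced sample
            training_data.append((new_image, label_fn(new_image)))
    return training_data

data = update_training_data(
    [], ["bg1", "bg2"],
    lambda img: img == "bg2",             # toy model misfires on bg2
    "badge",
    lambda t, b: (t, b),                  # toy splice: pair the two "images"
    lambda img: {"contains_target": True})
print(len(data))  # → 1
```

The effect is a form of hard-negative mining: backgrounds that fool the current model are exactly the ones recombined into new labelled samples.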
In one embodiment, verifying the target detection result based on the position frame information to obtain a target verification result corresponding to the image to be detected includes:
acquiring an image corresponding to the position frame from the image to be detected based on the position frame information;
and inputting the image corresponding to the position frame into a pre-trained first classification model to obtain a classification result, wherein the target verification result corresponding to the image to be detected comprises the classification result.
The first classification model may include, but is not limited to, a deep convolutional classification model such as ResNet, Inception, or VGG. Optionally, the classification result may be a probability that the target object is present or absent, or a direct indication of presence or absence, which is not limited in this application. Optionally, the classification result may serve as the final target verification result for determining whether the image to be detected passes the audit.
The image input to the first classification model may be the image framed by the position frame in the detection result corresponding to the image to be detected.
In the embodiment of the disclosure, the aim of verification is achieved by classifying the position frame images corresponding to the target detection result, so that the accuracy rate and recall rate of target detection are improved.
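A minimal sketch of this verification path, assuming the image is a NumPy array and the box is given as (x1, y1, x2, y2) pixel coordinates; `classify` stands in for the trained first classification model (e.g. a ResNet), here replaced by a toy brightness threshold.

```python
import numpy as np

def verify_by_classification(image, box, classify):
    """Crop the position-box region and classify it for the target object."""
    x1, y1, x2, y2 = box                   # position frame coordinates
    crop = image[y1:y2, x1:x2]             # image corresponding to the box
    return classify(crop)                  # True if the target object is found

img = np.zeros((16, 16, 3), np.uint8)
img[4:8, 4:8] = 255                        # bright patch as a toy "target"
classify = lambda crop: crop.mean() > 128  # toy stand-in classifier
print(verify_by_classification(img, (4, 4, 8, 8), classify))  # → True
```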
In one embodiment, the first classification model is obtained by training the initial first classification model by using a plurality of groups of second training samples, wherein the second training samples comprise second sample images and second sample labels corresponding to the second sample images; the second sample image comprises an image corresponding to the position frame information of the target object in the first sample image and a position frame image corresponding to the result of the detection error output by the initial target detection model in the training process; the second sample tag is used for representing whether the second sample image contains the target object.
In the embodiment of the disclosure, the image corresponding to the position frame information of the target object in the first sample image is used as a positive sample of the first classification model, and the position frame image corresponding to the result of the detection error output by the initial target detection model in the training process is used as a negative sample to train the first classification model, so that the classification accuracy of the trained first classification model is higher.
In one embodiment, verifying the target detection result based on the position frame information to obtain a target verification result corresponding to the image to be detected includes:
acquiring an image corresponding to the position frame from the image to be detected based on the position frame information;
Acquiring a feature vector of an image corresponding to the position frame;
searching a preset retrieval database based on the feature vector to obtain a retrieval result, the retrieval database containing feature vectors of target object images;
and determining a target verification result corresponding to the image to be detected according to the retrieval result.
In this embodiment, the feature extraction module of the first classification model may extract the feature vector of the image corresponding to the position frame and use it to search the retrieval database; the search method may include, but is not limited to, Approximate Nearest Neighbor (ANN) search. The retrieval database is a pre-established database containing feature vectors of various types of target object images. If the similarity between the position-frame image and an image in the retrieval database is greater than a preset threshold, the image to be detected is considered to contain the target object.
In the embodiment of the disclosure, whether the image to be detected contains the target object is checked in the searching mode in the searching database, so that the accuracy of target detection can be improved. The data in the search database in the embodiment of the disclosure can be continuously updated, and the feature vector of the newly added target object image is continuously and newly added into the search database, so that the recall capability of the newly added target object image is improved.
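A sketch of the retrieval-based check, with a brute-force cosine-similarity nearest-neighbour search standing in for ANN; the 0.9 similarity threshold and the 3-dimensional feature vectors are invented for illustration.

```python
import numpy as np

def verify_by_retrieval(query_vec, database, threshold=0.9):
    """Compare the position-frame image's feature vector against stored
    target-object feature vectors; above-threshold similarity counts as a hit."""
    q = query_vec / np.linalg.norm(query_vec)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    best = float((db @ q).max())           # highest cosine similarity
    return best > threshold                # treated as containing the target

db = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # stored target features
print(verify_by_retrieval(np.array([0.99, 0.1, 0.0]), db))  # → True
print(verify_by_retrieval(np.array([0.0, 0.0, 1.0]), db))   # → False
```

Adding a new target class is then just appending its feature vectors to `db`, which is how the continuously updated retrieval database improves recall on newly added target object images without retraining.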
In one embodiment, verifying the target detection result based on the position frame information to obtain a target verification result corresponding to the image to be detected includes:
acquiring an image corresponding to the position frame from the image to be detected based on the position frame information;
acquiring characteristic information of an image corresponding to the position frame, and determining a target verification result corresponding to the image to be detected based on the characteristic information;
the characteristic information includes at least one of:
the shape characteristics of the objects contained in the image, the color characteristics of the objects contained in the image, and the text information in the image.
In the embodiment of the present disclosure, visual features of the position frame image may also be extracted through computer vision algorithms, including but not limited to at least one of: shape features of objects contained in the image, color features of objects contained in the image, and text information in the image. Using these features, whether the position frame image contains the target object can be determined, so that the accuracy of the target detection result is improved.
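As an illustration of one such visual feature, the sketch below checks a crude color feature: the fraction of predominantly red pixels in a position frame image. The specific channel conditions and the 0.3 cutoff are assumptions chosen for demonstration only, not values prescribed by the disclosure.

```python
import numpy as np

def color_feature_check(img, threshold=0.3):
    """Crude color-feature check on a position frame image.

    img: H x W x 3 uint8 array. Returns True if the fraction of
    predominantly red pixels exceeds the (assumed) threshold,
    suggesting the frame may contain the target object.
    """
    img = img.astype(float)
    red_mask = (
        (img[..., 0] > 150)
        & (img[..., 0] > img[..., 1] + 50)
        & (img[..., 0] > img[..., 2] + 50)
    )
    return bool(red_mask.mean() >= threshold)

# A mostly-red patch passes the check; a gray patch does not
red_patch = np.zeros((4, 4, 3), dtype=np.uint8)
red_patch[..., 0] = 200
gray_patch = np.full((4, 4, 3), 128, dtype=np.uint8)
assert color_feature_check(red_patch) is True
assert color_feature_check(gray_patch) is False
```

In practice several such features (shape, color, recognized text) would be combined before deciding whether the frame contains the target object.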
In the technical scheme of the present disclosure, an audit exemption mechanism may further be included: for an image whose target verification result indicates that it contains the target object, whether the image is exempt from auditing is further determined, as described in the following embodiments.
In one embodiment, the target object comprises a badge class object, and the target detection method further comprises:
when the target verification result indicates that the image to be detected contains a badge class object, inputting the image to be detected into a preset second classification model; and when the image to be detected is determined to be a book cover image according to the output result of the second classification model, determining the image to be detected as an audit passing image.
In the embodiment of the present disclosure, when the target object is a badge class object, the second classification model may be a model for distinguishing book cover images from certificate images. The image to be detected, or its position frame image, is input into the second classification model to determine whether it is a book cover image. A book cover image may be exempt from auditing: even if it contains a badge class image, the audit passes, and the image to be detected is determined to be an audit passing image.
In the embodiment of the present disclosure, whether the image to be detected can be exempted from auditing is determined according to the output result of the second classification model, which satisfies the audit exemption requirement for book cover images.
In one embodiment, the target detection method further comprises:
when the image to be detected is determined not to be a book cover image according to the output result of the second classification model, acquiring text information in the image to be detected; and when the image to be detected is determined to be a legal document image according to the text information, determining the image to be detected as an audit passing image.
In the embodiment of the present disclosure, when the image to be detected is determined not to be a book cover image according to the output result of the second classification model, the text information in the image to be detected is recognized by means such as optical character recognition (Optical Character Recognition, OCR). Whether the text contains a legal document name is determined from this text information, and thus whether the image to be detected is a legal document image; if so, the image to be detected is determined to be an audit passing image.
In the embodiment of the present disclosure, whether the image to be detected can be exempted from auditing is determined through text recognition, which satisfies the audit exemption requirement for legal document images.
In one embodiment, the target detection method further comprises:
when the image to be detected is determined not to be a legal document image according to the text information, acquiring user qualification information corresponding to the image to be detected, determining whether the user corresponding to the image to be detected has audit exemption permission according to the user qualification information, and if so, determining the image to be detected as an audit passing image.
In the embodiment of the present disclosure, audit exemption permission can be granted to specific users, and images uploaded by such users can be exempted from auditing. The user qualification information can be bound to the image uploaded by the user; through the identification information of the image, the user qualification information of the corresponding user can be queried, and whether that user has audit exemption permission can be determined according to the user qualification information.
In the embodiment of the present disclosure, whether the image to be detected can be exempted from auditing is determined by checking the user qualification information, which satisfies the audit exemption requirement for images uploaded by users with audit exemption permission.
In one embodiment, the target detection method further comprises:
when it is determined according to the user qualification information that the user corresponding to the image to be detected does not have audit exemption permission, determining the image to be detected as an audit failed image.
In the embodiment of the present disclosure, an image that contains a badge class target object and is uploaded by a user without audit exemption permission is determined to be an audit failed image.
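The exemption cascade described in the embodiments above (second classification model, then the OCR-based legal document check, then the user qualification check) can be summarized as a small decision function. The boolean inputs stand in for the outputs of the respective checks; the function is an illustrative sketch of the control flow, not part of the disclosure.

```python
def audit_decision(contains_badge, is_book_cover, is_legal_document, user_exempt):
    """Sketch of the audit exemption cascade for a badge-class detection.

    Each flag corresponds to one check described above: the target
    verification result, the second classification model, the legal
    document (OCR) check, and the user qualification lookup.
    """
    if not contains_badge:
        return "pass"   # no badge detected: audit passes directly
    if is_book_cover:
        return "pass"   # book cover images are exempt from auditing
    if is_legal_document:
        return "pass"   # legal document images are exempt from auditing
    if user_exempt:
        return "pass"   # user holds audit exemption permission
    return "fail"       # badge image with no applicable exemption

assert audit_decision(True, False, False, False) == "fail"
assert audit_decision(True, False, True, False) == "pass"
```

Each check is only consulted when all previous checks failed, matching the order of the embodiments.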
According to the target detection method of the above technical scheme, among the target detection results output by the target detection model, those that contain position frame information and whose confidence is smaller than the preset confidence threshold are verified, which improves the accuracy of image auditing and the recall rate of images containing target objects.
Fig. 2 is a schematic diagram of a process for obtaining training samples of a target detection model according to an embodiment of the present disclosure. The target object in this embodiment is a badge class object. As shown in fig. 2, an image containing a badge is acquired as a positive sample image and annotated ("positive sample grab, annotation" in the figure). Images similar in outline to the badge are acquired as similar target images ("badge outline similar image capture" in the figure), and images similar to the badge image are searched in a preset database ("advertisement material library search" in the figure). The badge image and the similar images are stitched to obtain stitched images ("sample generation and amplification" in the figure), and the badge images, the similar images, and the stitched images serve as the first training sample data of the target detection model ("initial training data" in the figure).
FIG. 3 is a schematic diagram of an iterative process of the target detection model in an embodiment of the present disclosure. As shown in fig. 3, target object images and similar target images are acquired and stitched, and sample labels of the stitched images are obtained ("initial annotation and generation data" in the figure). The stitched images and sample labels serve as the training data set for iterative training of the target detection model. During iteration, test sample images are input into the model; position frame images corresponding to erroneous detection results output by the model are used as background images ("negative example collection" in the figure) and stitched with target object images to obtain new stitched images, which are added to the training sample set ("sample generation and amplification" in the figure). The continuously updated training data set is used to train the model ("model iteration" in the figure) until a preset training end condition is met, yielding the final target detection model ("model release" in the figure).
Fig. 4 is a schematic diagram of an image auditing system according to an embodiment of the present disclosure. As shown in fig. 4, in this embodiment the target object image is a badge class image. The image auditing system comprises a target detection unit, a secondary verification unit, and a special scene recognition and exemption unit. The image to be detected is input into the target detection unit, where badge detection is performed by a sensitive badge object detector (which may be the target detection model of the technical scheme of the present disclosure) to obtain a badge detection result. If the probability that a badge is contained is smaller than a preset threshold ("smaller than the threshold" in the figure), the image to be detected does not contain a badge and is determined to be an audit passing image. If the probability is greater than or equal to the preset threshold, the image to be detected may contain a badge. Among such images, those whose target detection result contains position frame information and whose confidence is smaller than the preset confidence threshold are verified: the position frame image is extracted and input into the secondary verification unit. The verification may be performed in three manners. The first is retrieval database search: the feature vector of the position frame image is extracted and compressed, the processed feature vector is used to search a preset retrieval database to obtain a retrieval result, and whether a badge is contained is determined according to the retrieval result.
The second is morphological feature detection: feature information of the position frame image is extracted, and whether a badge is contained in the image is determined according to features such as shape features of objects contained in the image, color features of objects contained in the image, and text information in the image. The third determines whether a badge is contained according to the classification result of a badge recognition classifier (the first classification model of the technical scheme of the present disclosure). If, according to the verification result of the secondary verification unit, the probability that the image contains a badge is smaller than a preset threshold ("smaller than the threshold" in the figure), the audit passes directly. When the target verification result indicates that the image to be detected contains a badge class image, the image is input into the special scene recognition and exemption unit for classification by a certificate and book classifier (the second classification model of the technical scheme of the present disclosure); if the image is a book cover image, the audit passes. If it is not a book cover image, the text information in the image is acquired and checked against legal document names; if the image is a legal document image, the audit passes. If it is not a legal document image, the user qualification information corresponding to the image is acquired for qualification verification; if the uploading user has audit exemption qualification, the audit passes, otherwise the image to be detected is determined to be an audit failed image.
Fig. 5 is a schematic diagram of a training method of a target detection model according to an embodiment of the disclosure. As shown in fig. 5, the training method of the object detection model may include:
step S501, a plurality of groups of first training sample data are obtained;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, wherein the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprises a spliced image; the spliced image is obtained by splicing a target object image and a similar target image, wherein the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
Step S502, training the initial target detection model based on the multiple sets of first training sample data until a preset training end condition is satisfied, to obtain a target detection model according to any embodiment of the present disclosure.
According to the training method of the target detection model in the above technical scheme, stitched images obtained by stitching target object images with similar target images are used as training samples, which increases the number and richness of the training samples. Training with rich sample images allows the model to learn more information, improving the detection accuracy of the trained target detection model.
In one embodiment, the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image, taking a similar target image as a background image, and performing splicing processing.
The preset mode may be any other image processing mode, which is not limited in this application.
In the embodiment of the present disclosure, the target object image is processed in different modes, the resulting processed images are stitched with similar target images, and the stitched images are used as training samples, which increases the number and richness of the training samples.
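A minimal sketch of the processing-and-stitching step, assuming additive noise as the preset processing mode and a simple paste as the stitching operation (the other preset modes such as sharpening, color dithering, or perspective transformation would use an image processing library; the sizes and pixel values below are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(target_img):
    """Apply one assumed preset mode (additive noise) to the target image."""
    noise = rng.integers(-10, 11, size=target_img.shape)
    return np.clip(target_img.astype(int) + noise, 0, 255).astype(np.uint8)

def stitch(foreground, background, top, left):
    """Paste the processed target image (foreground) onto a similar-target background."""
    out = background.copy()
    h, w = foreground.shape[:2]
    out[top:top + h, left:left + w] = foreground
    return out

target = np.full((8, 8, 3), 200, dtype=np.uint8)    # stand-in target object image
similar = np.full((32, 32, 3), 50, dtype=np.uint8)  # stand-in similar target image
sample = stitch(augment(target), similar, top=4, left=4)
assert sample.shape == (32, 32, 3)
assert sample[0, 0, 0] == 50  # background preserved outside the paste region
```

The paste coordinates also yield the position frame information for the stitched sample's label, since the location of the target object in the generated image is known by construction.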
In one embodiment, the training method of the target detection model further includes:
in the training process of the initial target detection model, a plurality of test sample images are obtained, the test sample images are input into the model, and output results corresponding to the test sample images are obtained;
stitching the target object image with images corresponding to erroneous detection results among the output results of the test sample images, to obtain new stitched images, where an erroneous detection result is a result in which an image that does not contain the target object is detected as containing the target object;
Acquiring a labeling label of a new spliced image;
and updating the first training sample data by using the new spliced image and the labeling label.
In the training process of the initial target detection model, test sample images can be obtained from a preset image database and input into the model for testing. The images corresponding to erroneous detection results output by the model are stitched with target object images to obtain new stitched images, which are added to the model's training sample database. The model here may be the initial target detection model after parameter adjustment through continuous iteration during training; at this point it does not yet satisfy the training end condition and is not the final trained target detection model.
In some embodiments, the labeling label of the new stitched image may be a manually labeled label, or may be a label obtained in other manners, which is not limited in this application.
In the embodiment of the present disclosure, new stitched images, obtained by stitching target object images with the images corresponding to erroneous detection results on test sample images during model iteration, increase the training sample data of the model, realizing amplification of the training sample data.
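The sample amplification step above can be sketched as a loop over the false-positive crops, with `stitch_fn` and `label_fn` as hypothetical stand-ins for the stitching and (possibly manual) labeling procedures described in the disclosure:

```python
def amplify_training_set(train_samples, false_positive_crops, target_images,
                         stitch_fn, label_fn):
    """Sketch of sample amplification: each position-frame image from an
    erroneous detection becomes a background, is stitched with a target
    object image, then labeled and appended to the training set.
    """
    for background in false_positive_crops:
        for target in target_images:
            new_image = stitch_fn(target, background)
            train_samples.append((new_image, label_fn(new_image)))
    return train_samples

# Toy stand-ins: images are strings, the label marks target presence
samples = amplify_training_set(
    [], ["bg1", "bg2"], ["badge"],
    stitch_fn=lambda t, b: f"{t}+{b}",
    label_fn=lambda img: {"contains_target": True},
)
assert len(samples) == 2
assert samples[0] == ("badge+bg1", {"contains_target": True})
```

Since the backgrounds are exactly the images the current model gets wrong, the amplified samples act as hard examples for the next training iteration.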
Fig. 6 is a schematic diagram of an object detection device according to an embodiment of the disclosure. As shown in fig. 6, the object detection device may include:
a detection image acquisition module 601, configured to acquire an image to be detected;
the target detection module 602 is configured to input an image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
a position frame acquisition module 603, configured to acquire position frame information in the target detection result when the target detection result meets a preset condition; the preset condition is that the target detection result contains position frame information, and the confidence coefficient corresponding to the target detection result is smaller than a preset confidence coefficient threshold value;
and the result checking module 604 is configured to check the target detection result based on the position frame information, so as to obtain a target check result corresponding to the image to be detected.
In one embodiment, the target detection model is obtained by training an initial target detection model by adopting a plurality of groups of first training sample data, wherein the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, and the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object;
The first sample image comprises a stitched image; the spliced image is obtained by splicing a target object image and a similar target image, wherein the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
In one embodiment, the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image, taking a similar target image as a background image, and performing splicing processing.
Fig. 7 is a schematic diagram of a sample update module according to an embodiment of the disclosure. As shown in fig. 7, in one embodiment, the object detection device further includes a sample update module, and the sample update module includes:
the test result obtaining unit 701 is configured to obtain a plurality of test sample images during training of the initial target detection model, and input the test sample images into the model to obtain an output result corresponding to the test sample images;
an image stitching unit 702, configured to stitch the target object image with images corresponding to erroneous detection results among the output results of the test sample images, to obtain new stitched images, where an erroneous detection result is a result in which an image that does not contain the target object is detected as containing the target object;
a label acquiring unit 703, configured to acquire a label of a new stitched image;
an updating unit 704, configured to update the first training sample data with the new stitched image and the label.
In one embodiment, the result checking module 604 is specifically configured to:
acquiring an image corresponding to the position frame from the image to be detected based on the position frame information;
and inputting the image corresponding to the position frame into a pre-trained first classification model to obtain a classification result, wherein the target verification result corresponding to the image to be detected comprises the classification result.
In one embodiment, the first classification model is obtained by training the initial first classification model by using a plurality of groups of second training samples, wherein the second training samples comprise second sample images and second sample labels corresponding to the second sample images; the second sample image comprises an image corresponding to the position frame information of the target object in the first sample image and a position frame image corresponding to the result of the detection error output by the initial target detection model in the training process; the second sample tag is used for representing whether the second sample image contains the target object.
In one embodiment, the result checking module 604 is specifically configured to:
acquiring an image corresponding to the position frame from the image to be detected based on the position frame information;
acquiring a feature vector of an image corresponding to the position frame;
searching in a preset search database based on the feature vector to obtain a search result; retrieving a feature vector comprising the target object image in a database;
and determining a target verification result corresponding to the image to be detected according to the retrieval result.
In one embodiment, the result checking module 604 is specifically configured to:
acquiring an image corresponding to the position frame from the image to be detected based on the position frame information;
acquiring characteristic information of an image corresponding to the position frame, and determining a target verification result corresponding to the image to be detected based on the characteristic information;
the characteristic information includes at least one of:
the shape characteristics of the objects contained in the image, the color characteristics of the objects contained in the image, and the text information in the image.
In one embodiment, the target object comprises a badge class object, and the target detection apparatus further comprises an image classification module for:
inputting the image to be detected into a preset second classification model when the target verification result indicates that the image to be detected contains a badge class object, and determining the image to be detected as an audit passing image when it is determined to be a book cover image according to the output result of the second classification model.
In one embodiment, the object detection device further includes a text recognition module for:
acquiring text information in the image to be detected when the image is determined not to be a book cover image according to the output result of the second classification model, and determining the image to be detected as an audit passing image when it is determined to be a legal document image according to the text information.
In one embodiment, the target detection apparatus further comprises a qualification auditing module for:
acquiring user qualification information corresponding to the image to be detected when the image is determined not to be a legal document image according to the text information, determining whether the user corresponding to the image to be detected has audit exemption permission according to the user qualification information, and if so, determining the image to be detected as an audit passing image.
In one embodiment, the object detection device further includes a result determination module for:
determining the image to be detected as an audit failed image when it is determined according to the user qualification information that the user corresponding to the image to be detected does not have audit exemption permission.
In the target detection device of the embodiment of the present disclosure, among the target detection results output by the target detection model, those that contain position frame information and whose confidence is smaller than the preset confidence threshold are verified, which improves the accuracy of image auditing and the recall rate of images containing target objects.
Fig. 8 is a schematic diagram of a training apparatus for a target detection model according to an embodiment of the present disclosure. As shown in fig. 8, the training apparatus for the target detection model may include:
a sample acquiring module 801, configured to acquire a plurality of sets of first training sample data;
the model training module 802 is configured to train the initial target detection model based on multiple sets of first training sample data until a preset training end condition is met, so as to obtain a target detection model in any embodiment of the disclosure;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, wherein the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprises a spliced image; the spliced image is obtained by splicing a target object image and a similar target image, wherein the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
In one embodiment, the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
And taking the processed image as a foreground image, taking a similar target image as a background image, and performing splicing processing.
In one embodiment, the training apparatus of the object detection model further includes a sample update module, the sample update module including:
the test result acquisition unit is used for acquiring a plurality of test sample images in the training process of the initial target detection model, inputting the test sample images into the model, and obtaining an output result corresponding to the test sample images;
the image stitching unit is used for stitching the images corresponding to the detection error result in the output results corresponding to the target object image and the test sample image to obtain a new stitched image, wherein the detection error result is a result of detecting the image which does not contain the target object as containing the target object;
the label acquisition unit is used for acquiring the labeling label of the new spliced image;
and the updating unit is used for updating the first training sample data by using the new spliced image and the labeling label.
According to the training apparatus for the target detection model in the above technical scheme, stitched images obtained by stitching target object images with similar target images are used as training samples, which increases the number and richness of the training samples. Training with rich sample images allows the model to learn more information, improving the detection accuracy of the trained target detection model.
The functions of each unit, module or sub-module in each apparatus of the embodiments of the present disclosure may be referred to the corresponding descriptions in the above method embodiments, which are not repeated herein.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the electronic device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input output (I/O) interface 905 is also connected to the bus 904.
A number of components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the electronic device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, such as the target detection method. For example, in some embodiments, the target detection method or training method of the target detection model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the above-described object detection method may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the target detection method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (26)

1. A method of target detection, the method comprising:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
acquiring position frame information in the target detection result under the condition that the target detection result meets a preset condition, wherein the preset condition is that the target detection result contains position frame information and the confidence corresponding to the target detection result is less than a preset confidence threshold;
verifying the target detection result based on the position frame information to obtain a target verification result corresponding to the image to be detected;
the verifying the target detection result based on the position frame information to obtain a target verification result corresponding to the image to be detected comprises:
acquiring an image corresponding to the position frame from the image to be detected based on the position frame information;
inputting the image corresponding to the position frame into a pre-trained first classification model to obtain a classification result, wherein the target verification result corresponding to the image to be detected comprises the classification result, and the classification result comprises containing a target object or not containing a target object;
or,
acquiring a feature vector of the image corresponding to the position frame;
searching in a preset retrieval database based on the feature vector to obtain a retrieval result, wherein the retrieval database comprises feature vectors of target object images;
determining a target verification result corresponding to the image to be detected according to the retrieval result, wherein if the similarity between the image corresponding to the position frame and an image in the retrieval database is greater than a preset threshold, it is determined that the image to be detected contains a target object;
or,
acquiring characteristic information of the image corresponding to the position frame, and determining a target verification result corresponding to the image to be detected based on the characteristic information;
wherein the determining the target verification result corresponding to the image to be detected based on the characteristic information comprises: determining, according to the characteristic information, whether the image corresponding to the position frame contains a target object;
the characteristic information comprises at least one of: a shape characteristic of an object contained in the image, a color characteristic of an object contained in the image, and text information in the image.
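For illustration only (not part of the claimed subject matter), the cascade recited in claim 1 can be sketched in Python. All names, the confidence threshold, and the model stubs below are hypothetical placeholders, not the patented implementation; a real detector and classifier would be trained neural networks.

```python
# Hypothetical sketch of the claim-1 cascade: run the detector, accept
# high-confidence results directly, and verify low-confidence results that
# still carry a position frame by classifying the cropped region.
from dataclasses import dataclass
from typing import Optional, Tuple

CONF_THRESHOLD = 0.8  # preset confidence threshold (assumed value)

@dataclass
class Detection:
    box: Optional[Tuple[int, int, int, int]]  # position frame (x1, y1, x2, y2)
    confidence: float

def verify_by_classifier(crop, classifier) -> bool:
    """Verification route 1: binary-classify the position-frame crop."""
    return classifier(crop) == "target"

def detect_with_verification(image, detector, classifier) -> bool:
    det: Detection = detector(image)
    if det.box is None:
        return False                      # no position frame -> no target
    if det.confidence >= CONF_THRESHOLD:
        return True                       # high confidence -> accept directly
    # Low-confidence result with a box: crop the frame and verify it.
    x1, y1, x2, y2 = det.box
    crop = [row[x1:x2] for row in image[y1:y2]]
    return verify_by_classifier(crop, classifier)
```

The same routing logic applies when the verification step is retrieval-based or feature-based; only `verify_by_classifier` would be swapped out.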
2. The method according to claim 1, wherein the target detection model is obtained by training an initial target detection model with a plurality of sets of first training sample data, the first training sample data including a first sample image and a first sample tag corresponding to the first sample image, the first sample tag being used for representing whether the first sample image includes a target object and position frame information of the target object;
the first sample image comprises a stitched image; the stitched image is obtained by stitching a target object image and a similar target image, the target object image being an image containing a target object and the similar target image being an image containing a similar target object, wherein the similarity between the similar target object and the target object is within a preset range.
3. The method of claim 2, wherein the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and the similar target image as a background image, and performing stitching processing.
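As an illustrative aside (not the claimed implementation), the claim-3 stitching can be sketched as follows. The noise transform is a toy stand-in for the listed preset modes (sharpening, noise adding, filtering, color dithering, random filling, perspective transformation), and all function names are assumptions:

```python
# Toy sketch: process the target-object image, then paste it as foreground
# onto a similar-target background at a random position, producing a
# stitched training sample together with its position-frame label.
import random

def add_noise(img, amount=10):
    """Stand-in for the preset processing; jitters each pixel value."""
    return [[max(0, min(255, p + random.randint(-amount, amount))) for p in row]
            for row in img]

def paste(foreground, background, top, left):
    """Paste the foreground grid into a copy of the background grid."""
    out = [row[:] for row in background]
    for i, row in enumerate(foreground):
        for j, p in enumerate(row):
            out[top + i][left + j] = p
    return out

def make_stitched_sample(target_img, similar_img):
    processed = add_noise(target_img)            # preset processing step
    max_top = len(similar_img) - len(processed)
    max_left = len(similar_img[0]) - len(processed[0])
    top, left = random.randint(0, max_top), random.randint(0, max_left)
    stitched = paste(processed, similar_img, top, left)
    # The first sample label: contains-target flag plus the position frame.
    label = {"contains_target": True,
             "box": (left, top, left + len(processed[0]), top + len(processed))}
    return stitched, label
```

Because the background already contains a confusable similar object, samples built this way force the detector to separate the true target from its look-alikes.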
4. The method according to claim 2 or 3, wherein the method further comprises:
in the training process of the initial target detection model, a plurality of test sample images are obtained, the test sample images are input into the model, and output results corresponding to the test sample images are obtained;
stitching the target object image with an image corresponding to a false detection result among the output results corresponding to the test sample images to obtain a new stitched image, wherein a false detection result is a result in which an image that does not contain the target object is detected as containing the target object;
acquiring a labeling label of the new stitched image;
and updating the first training sample data with the new stitched image and the labeling label.
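The claim-4 hard-negative loop can be sketched as below, purely for illustration; the model, stitching, and annotation calls are placeholders for whatever detector, stitcher, and labeling process an implementation would actually use:

```python
# Hedged sketch of claim 4: collect images the model wrongly flags as
# containing the target, stitch them with a real target image, and append
# the resulting hard samples to the training set.

def mine_false_positives(model, test_images):
    """Return test images detected as containing the target but truly clean."""
    return [img for img in test_images
            if model(img)["contains_target"]
            and not img["truly_contains_target"]]

def update_training_data(training_data, target_image, false_positives,
                         stitch, annotate):
    """Stitch each false positive with the target image; append new samples."""
    for fp in false_positives:
        new_image = stitch(target_image, fp)   # new stitched hard sample
        new_label = annotate(new_image)        # acquire its labeling label
        training_data.append((new_image, new_label))
    return training_data
```

Retraining on these appended samples is what drives down the false-detection rate on confusable backgrounds.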
5. The method of claim 4, wherein the first classification model is obtained by training an initial first classification model with a plurality of sets of second training samples, the second training samples comprising second sample images and a second sample label corresponding to each second sample image; the second sample images comprise images corresponding to the position frame information of the target object in the first sample image, and position frame images corresponding to false detection results output by the initial target detection model during training; the second sample label is used for representing whether the second sample image contains a target object.
6. The method of claim 1, wherein the target object comprises a badge-like object, the method further comprising:
inputting the image to be detected into a preset second classification model under the condition that the target verification result indicates that the image to be detected contains a badge-like object, and determining the image to be detected as an audit-passed image under the condition that the image to be detected is determined to be a book cover image according to an output result of the second classification model.
7. The method of claim 6, wherein the method further comprises:
and under the condition that the image to be detected is determined not to be a book cover image according to the output result of the second classification model, acquiring text information in the image to be detected, and under the condition that the image to be detected is determined to be a legal document image according to the text information, determining the image to be detected as an audit-passed image.
8. The method of claim 7, wherein the method further comprises:
under the condition that the image to be detected is determined not to be a legal document image according to the text information, acquiring user qualification information corresponding to the image to be detected, determining, according to the user qualification information, whether a user corresponding to the image to be detected has audit exemption authority, and if so, determining the image to be detected as an audit-passed image.
9. The method of claim 8, wherein the method further comprises:
and determining the image to be detected as an audit-failed image under the condition that it is determined, according to the user qualification information, that the user corresponding to the image to be detected does not have audit exemption authority.
10. A method of training a target detection model, the method comprising:
acquiring a plurality of groups of first training sample data, and training an initial target detection model based on the plurality of groups of first training sample data until a preset training ending condition is met to obtain the target detection model according to any one of claims 1-9;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, the first sample label being used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprising a stitched image; the stitched image is obtained by stitching a target object image and a similar target image, the target object image being an image containing a target object and the similar target image being an image containing a similar target object, wherein the similarity between the similar target object and the target object is within a preset range.
11. The method of claim 10, wherein the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and the similar target image as a background image, and performing stitching processing.
12. The method according to claim 10 or 11, further comprising:
in the training process of the initial target detection model, a plurality of test sample images are obtained, the test sample images are input into the model, and output results corresponding to the test sample images are obtained;
stitching the target object image with an image corresponding to a false detection result among the output results corresponding to the test sample images to obtain a new stitched image, wherein a false detection result is a result in which an image that does not contain the target object is detected as containing the target object;
acquiring a labeling label of the new stitched image;
and updating the first training sample data with the new stitched image and the labeling label.
13. An object detection apparatus, the apparatus comprising:
the detection image acquisition module is used for acquiring an image to be detected;
the target detection module is used for inputting the image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
the position frame acquisition module is used for acquiring position frame information in the target detection result under the condition that the target detection result meets a preset condition, wherein the preset condition is that the target detection result contains position frame information and the confidence corresponding to the target detection result is less than a preset confidence threshold;
the result checking module is used for checking the target detection result based on the position frame information to obtain a target checking result corresponding to the image to be detected;
the result checking module is specifically configured to: acquire an image corresponding to the position frame from the image to be detected based on the position frame information, and input the image corresponding to the position frame into a pre-trained first classification model to obtain a classification result, wherein the target verification result corresponding to the image to be detected comprises the classification result, and the classification result comprises containing a target object or not containing a target object; or, acquire a feature vector of the image corresponding to the position frame, search in a preset retrieval database based on the feature vector to obtain a retrieval result, wherein the retrieval database comprises feature vectors of target object images, and determine a target verification result corresponding to the image to be detected according to the retrieval result, wherein if the similarity between the image corresponding to the position frame and an image in the retrieval database is greater than a preset threshold, it is determined that the image to be detected contains a target object; or, acquire characteristic information of the image corresponding to the position frame, and determine a target verification result corresponding to the image to be detected based on the characteristic information, wherein the determining the target verification result corresponding to the image to be detected based on the characteristic information comprises: determining, according to the characteristic information, whether the image corresponding to the position frame contains a target object; and the characteristic information comprises at least one of: a shape characteristic of an object contained in the image, a color characteristic of an object contained in the image, and text information in the image.
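For illustration only, the retrieval-based verification route recited above can be sketched with a cosine-similarity comparison against the retrieval database. The feature extractor is assumed to exist elsewhere (in practice an embedding network); the threshold value here is an arbitrary placeholder:

```python
# Toy sketch of retrieval-based verification: accept the detection when the
# crop's feature vector is more similar to some target-object vector in the
# retrieval database than a preset threshold.
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify_by_retrieval(crop_vector, database_vectors, threshold=0.9):
    """True if any database vector exceeds the preset similarity threshold."""
    best = max(cosine_similarity(crop_vector, v) for v in database_vectors)
    return best > threshold
```

In a deployed system the linear scan would typically be replaced by an approximate-nearest-neighbor index over the database vectors.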
14. The apparatus of claim 13, wherein the target detection model is obtained by training an initial target detection model with a plurality of sets of first training sample data, the first training sample data including a first sample image and a first sample tag corresponding to the first sample image, the first sample tag being used to represent whether the first sample image includes a target object and position frame information of the target object;
the first sample image comprises a stitched image; the stitched image is obtained by stitching a target object image and a similar target image, the target object image being an image containing a target object and the similar target image being an image containing a similar target object, wherein the similarity between the similar target object and the target object is within a preset range.
15. The apparatus of claim 14, wherein the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and the similar target image as a background image, and performing stitching processing.
16. The apparatus of claim 14 or 15, further comprising a sample update module, the sample update module comprising:
the test result acquisition unit is used for acquiring a plurality of test sample images in the training process of the initial target detection model, inputting the test sample images into the model, and obtaining an output result corresponding to the test sample images;
the image stitching unit is used for stitching the target object image with an image corresponding to a false detection result among the output results corresponding to the test sample images to obtain a new stitched image, wherein a false detection result is a result in which an image that does not contain the target object is detected as containing the target object;
the label acquisition unit is used for acquiring a labeling label of the new stitched image;
and the updating unit is used for updating the first training sample data with the new stitched image and the labeling label.
17. The apparatus of claim 16, wherein the first classification model is obtained by training an initial first classification model with a plurality of sets of second training samples, the second training samples comprising second sample images and a second sample label corresponding to each second sample image; the second sample images comprise images corresponding to the position frame information of the target object in the first sample image, and position frame images corresponding to false detection results output by the initial target detection model during training; the second sample label is used for representing whether the second sample image contains a target object.
18. The apparatus of claim 17, wherein the target object comprises a badge-like object, the apparatus further comprising an image classification module configured to:
input the image to be detected into a preset second classification model under the condition that the target verification result indicates that the image to be detected contains a badge-like object, and determine the image to be detected as an audit-passed image under the condition that the image to be detected is determined to be a book cover image according to an output result of the second classification model.
19. The apparatus of claim 18, further comprising a text recognition module for:
and under the condition that the image to be detected is determined not to be a book cover image according to the output result of the second classification model, acquire text information in the image to be detected, and under the condition that the image to be detected is determined to be a legal document image according to the text information, determine the image to be detected as an audit-passed image.
20. The apparatus of claim 19, further comprising a qualification audit module for:
under the condition that the image to be detected is determined not to be a legal document image according to the text information, acquire user qualification information corresponding to the image to be detected, determine, according to the user qualification information, whether a user corresponding to the image to be detected has audit exemption authority, and if so, determine the image to be detected as an audit-passed image.
21. The apparatus of claim 20, further comprising a result determination module configured to:
and determine the image to be detected as an audit-failed image under the condition that it is determined, according to the user qualification information, that the user corresponding to the image to be detected does not have audit exemption authority.
22. A training apparatus for a target detection model, the apparatus comprising:
the sample acquisition module is used for acquiring a plurality of groups of first training sample data;
the model training module is used for training the initial target detection model based on the multiple groups of first training sample data until a preset training ending condition is met, so as to obtain the target detection model according to any one of claims 1-9;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, the first sample label being used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprising a stitched image; the stitched image is obtained by stitching a target object image and a similar target image, the target object image being an image containing a target object and the similar target image being an image containing a similar target object, wherein the similarity between the similar target object and the target object is within a preset range.
23. The apparatus of claim 22, wherein the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and the similar target image as a background image, and performing stitching processing.
24. The apparatus of claim 22 or 23, further comprising a sample update module comprising:
the test result acquisition unit is used for acquiring a plurality of test sample images in the training process of the initial target detection model, inputting the test sample images into the model, and obtaining an output result corresponding to the test sample images;
the image stitching unit is used for stitching the target object image with an image corresponding to a false detection result among the output results corresponding to the test sample images to obtain a new stitched image, wherein a false detection result is a result in which an image that does not contain the target object is detected as containing the target object;
the label acquisition unit is used for acquiring a labeling label of the new stitched image;
and the updating unit is used for updating the first training sample data with the new stitched image and the labeling label.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
26. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-12.
CN202110540636.5A 2021-05-18 2021-05-18 Target detection method, training method and device of target detection model Active CN113221918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110540636.5A CN113221918B (en) 2021-05-18 2021-05-18 Target detection method, training method and device of target detection model


Publications (2)

Publication Number Publication Date
CN113221918A CN113221918A (en) 2021-08-06
CN113221918B (en) 2023-08-04

Family

ID=77092668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110540636.5A Active CN113221918B (en) 2021-05-18 2021-05-18 Target detection method, training method and device of target detection model

Country Status (1)

Country Link
CN (1) CN113221918B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114255389A (en) * 2021-11-15 2022-03-29 浙江时空道宇科技有限公司 Target object detection method, device, equipment and storage medium
CN114218599B (en) * 2022-02-22 2022-05-27 飞狐信息技术(天津)有限公司 Business data processing method and device, storage medium and electronic equipment
CN117058689B (en) * 2023-10-09 2024-02-20 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852285A (en) * 2019-11-14 2020-02-28 腾讯科技(深圳)有限公司 Object detection method and device, computer equipment and storage medium
CN111160406A (en) * 2019-12-10 2020-05-15 北京达佳互联信息技术有限公司 Training method of image classification model, and image classification method and device
CN112183098A (en) * 2020-09-30 2021-01-05 完美世界(北京)软件科技发展有限公司 Session processing method and device, storage medium and electronic device
CN112766244A (en) * 2021-04-07 2021-05-07 腾讯科技(深圳)有限公司 Target object detection method and device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102348593B1 (en) * 2017-10-26 2022-01-06 삼성에스디에스 주식회사 Method for detecting target object based on machine-learning and Apparatus thereof


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Aircraft target detection in remote sensing images based on deep neural networks; Li Wenbin; He Ran; Computer Engineering (Issue 07); pp. 274-282 *


Similar Documents

Publication Publication Date Title
CN109117777B (en) Method and device for generating information
CN113221918B (en) Target detection method, training method and device of target detection model
RU2571545C1 (en) Content-based document image classification
US20230376527A1 (en) Generating congruous metadata for multimedia
US9977955B2 (en) Method and system for identifying books on a bookshelf
US20120136812A1 (en) Method and system for machine-learning based optimization and customization of document similarities calculation
CN109034069B (en) Method and apparatus for generating information
CN109918513B (en) Image processing method, device, server and storage medium
CN111079785A (en) Image identification method and device and terminal equipment
CN112559800B (en) Method, apparatus, electronic device, medium and product for processing video
JP2011525012A (en) Semantic event detection for digital content recording
CN110909784B (en) Training method and device of image recognition model and electronic equipment
US10354134B1 (en) Feature classification with spatial analysis
Wang et al. Logo information recognition in large-scale social media data
CN110222582B (en) Image processing method and camera
CN115443490A (en) Image auditing method and device, equipment and storage medium
CN111401309B (en) CNN training and remote sensing image target identification method based on wavelet transformation
Lahiani et al. Hand pose estimation system based on Viola-Jones algorithm for android devices
CN116089648A (en) File management system and method based on artificial intelligence
CN113239807B (en) Method and device for training bill identification model and bill identification
CN113255501B (en) Method, apparatus, medium and program product for generating form recognition model
CN114495113A (en) Text classification method and training method and device of text classification model
CN110717407A (en) Human face recognition method, device and storage medium based on lip language password
CN113255557A (en) Video crowd emotion analysis method and system based on deep learning
CN112241470A (en) Video classification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant