CN113221918A - Target detection method, and training method and device of target detection model - Google Patents

Target detection method, and training method and device of target detection model

Info

Publication number
CN113221918A
Authority
CN
China
Prior art keywords
image
target
detected
target object
result
Prior art date
Legal status
Granted
Application number
CN202110540636.5A
Other languages
Chinese (zh)
Other versions
CN113221918B (en)
Inventor
梁晓旭
张言
刘星
邓远达
胡旭
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110540636.5A
Publication of CN113221918A
Application granted
Publication of CN113221918B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06F 16/583: Retrieval characterised by using metadata automatically derived from the content
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06T 3/4038: Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T 2200/32: Indexing scheme for image data processing or generation involving image mosaicing
    • G06V 20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 30/10: Character recognition

Abstract

The disclosure provides a target detection method, and a training method and device for a target detection model, relating to the technical field of artificial intelligence. The method comprises the following steps: acquiring an image to be detected; inputting the image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image; acquiring position frame information from the target detection result when the result meets a preset condition, the preset condition being that the result contains position frame information and that the confidence corresponding to the result is smaller than a preset confidence threshold; and verifying the target detection result based on the position frame information to obtain a target verification result for the image to be detected. In this technical scheme, among the target detection results output by the target detection model, those that contain position frame information and whose confidence is smaller than the preset confidence threshold are verified, improving the accuracy of image review and the recall rate for images containing the target object.

Description

Target detection method, and training method and device of target detection model
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and more particularly, to the field of machine learning.
Background
With the rise of electronic commerce, merchants' advertising methods have become increasingly diversified, and rich pictorial publicity is an important advertising tool. In actual advertising promotion, merchants sometimes illegally publish images containing specific targets in their pictorial and textual publicity, so the images provided in advertising material must be reviewed; prior-art content review schemes have low accuracy and cannot meet actual requirements.
Disclosure of Invention
The disclosure provides a target detection method, and a training method, apparatus, device, and storage medium for a target detection model.
According to an aspect of the present disclosure, there is provided an object detection method including:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
under the condition that the target detection result meets a preset condition, position frame information in the target detection result is obtained; the preset condition is that the target detection result comprises position frame information, and the confidence corresponding to the target detection result is smaller than a preset confidence threshold;
and checking the target detection result based on the position frame information to obtain a target checking result corresponding to the image to be detected.
According to another aspect of the present disclosure, there is provided a training method of an object detection model, including:
acquiring multiple groups of first training sample data, training the initial target detection model based on the multiple groups of first training sample data until a preset training end condition is met, and obtaining the target detection model of any one embodiment of the disclosure;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprises a spliced image; the spliced image is obtained by splicing a target object image and a similar target image, the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
According to another aspect of the present disclosure, there is provided an object detecting apparatus including:
the detection image acquisition module is used for acquiring an image to be detected;
the target detection module is used for inputting the image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
the position frame acquisition module is used for acquiring position frame information in the target detection result under the condition that the target detection result meets a preset condition; the preset condition is that the target detection result comprises position frame information, and the confidence corresponding to the target detection result is smaller than a preset confidence threshold;
and the result checking module is used for checking the target detection result based on the position frame information to obtain a target checking result corresponding to the image to be detected.
According to another aspect of the present disclosure, there is provided a training apparatus for an object detection model, including:
the sample acquisition module is used for acquiring a plurality of groups of first training sample data;
the model training module is used for training the initial target detection model based on multiple groups of first training sample data until a preset training end condition is met, so that the target detection model of any one embodiment of the disclosure is obtained;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprises a spliced image; the spliced image is obtained by splicing a target object image and a similar target image, the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method in any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
The technical scheme of the disclosure addresses the low review accuracy of prior-art content review schemes. In the target detection method of this scheme, among the target detection results output by the target detection model, results that contain position frame information and whose confidence is smaller than the preset confidence threshold are verified, improving the accuracy of image review and the recall rate for images containing the target object.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a target detection method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a process for obtaining training samples of a target detection model according to an embodiment of the present disclosure;
FIG. 3 is a diagram illustrating an iterative process of a target detection model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an image review system according to an embodiment of the disclosure;
FIG. 5 is a schematic diagram of a training method for a target detection model according to an embodiment of the disclosure;
FIG. 6 is a schematic diagram of an object detection apparatus according to an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating a sample update module according to an embodiment of the present disclosure;
FIG. 8 is a diagram illustrating an apparatus for training a target detection model according to an embodiment of the present disclosure;
fig. 9 is a block diagram of an electronic device for implementing the object detection method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, when detecting a target of content in an image, the following two methods are generally used:
one way is to perform target region extraction and target search in a sliding window manner. For multi-class target recognition, feature extraction methods such as Histogram of Oriented Gradient (HOG) and Scale-invariant feature transform (SIFT) are adopted to extract features, and traditional Machine learning classifiers based on Adaptive boosting (Adaptive boosting) algorithm and Support Vector Machine (SVM) are trained to determine whether an image contains a target or not. The method for detecting the target has the disadvantages of poor robustness, low identification precision, low speed and difficulty in meeting the requirements of content audit time delay and precision.
The other approach detects targets with a deep-learning detection scheme: a large amount of image data containing targets is collected, the detection positions of the various targets are manually labeled, target regions are identified by a trained model, the regression-box positions are recorded, and the detection result is determined by a class threshold. This approach relies on a large amount of labeled data to train the detection model, and such data is difficult to collect; because the collected data has uniform backgrounds, the trained model has low accuracy and recall in scenarios where risk data is extremely sparse, and so cannot meet the high-recall, high-accuracy requirements of content review. For some rarely occurring classes of target objects, the collectible data cannot reach a trainable level at all. In addition, target objects that should be exempt from review in specific scenes cannot be handled, so actual requirements cannot be met.
In the technical scheme of the disclosure, in the target detection result output by the target detection model, the target detection result which contains the position frame information and has the confidence coefficient smaller than the preset confidence coefficient threshold value is verified so as to improve the accuracy of image review and the recall rate of the image containing the target object. By splicing the target object image and the similar target object image, the obtained spliced image is used as a training sample, and the training sample data of the model can be enriched. And the audit exemption requirement of the image to be detected in a specific scene is met through an audit exemption mechanism.
The execution subject of the present disclosure may be any electronic device, for example, a server or the like. The object detection method in the embodiments of the present disclosure will be described in detail below.
Fig. 1 is a schematic diagram of a target detection method in an embodiment of the present disclosure. As shown in fig. 1, the target detection method may include:
step S101, obtaining an image to be detected;
step S102, inputting an image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
step S103, acquiring position frame information in the target detection result under the condition that the target detection result meets a preset condition; the preset condition is that the target detection result contains position frame information, and the confidence corresponding to the target detection result is smaller than a preset confidence threshold.
And step S104, checking the target detection result based on the position frame information to obtain a target checking result corresponding to the image to be detected.
According to the target detection method, in the target detection results output by the target detection model, the target detection results which contain position frame information and have the confidence coefficient smaller than the preset confidence coefficient threshold value are verified, so that the image auditing accuracy and the image recall rate of the target object are improved.
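The four steps above can be sketched as a small dispatch function. The names (`detect_and_verify`, `detect_fn`, `verify_fn`) and the threshold value are illustrative assumptions, not part of the patent:

```python
# Hypothetical threshold below which a boxed detection must be verified (S103).
CONF_THRESHOLD = 0.8

def detect_and_verify(image, detect_fn, verify_fn):
    """Steps S101-S104 as one dispatch function.

    detect_fn(image) -> (box_or_None, confidence)  # stands in for the detection model
    verify_fn(image, box) -> bool                  # stands in for the verification step
    """
    box, confidence = detect_fn(image)  # S102: run the pre-trained detection model
    if box is None:                     # no position frame: nothing to verify
        return False
    if confidence >= CONF_THRESHOLD:    # confident detection is accepted directly
        return True
    return verify_fn(image, box)        # S103/S104: verify the low-confidence box
```

Only detections that produce a position frame but fall below the confidence threshold reach the verification step; everything else is decided by the detector alone.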
In one embodiment, for example in an image-review application scenario, the image to be detected may be an image whose content needs to be reviewed. The image may contain target objects that cannot pass review, and the target objects may include badge images. The image to be detected may also be an image in other application scenarios requiring target detection, which is not limited by this application.
In one embodiment, the target detection model may be any neural network model, including but not limited to:
Faster Region-based Convolutional Neural Network (Faster R-CNN), Single Shot MultiBox Detector (SSD), and YOLO (You Only Look Once) target detection algorithms.
In one embodiment, the target detection result output by the target detection model may include indication information of whether the target object is present, for example the numeral 0 for absent and the numeral 1 for present. Whether the image to be detected contains the target object can also be represented in other ways, which is not limited by this application. If the target detection result indicates that the target object is present, the position frame information of the target object in the image to be detected and the confidence corresponding to the detection result are also output. The position frame information includes, but is not limited to, the coordinate information of the position frame.
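A minimal structure for such a detection result, with the 0/1 indicator, optional position frame, and confidence described above (the field and method names are hypothetical, not from the patent):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class DetectionResult:
    contains_target: int                       # 1 = target object present, 0 = absent
    box: Optional[Tuple[int, int, int, int]]   # (x1, y1, x2, y2) position-frame coords
    confidence: Optional[float]                # confidence of the detection result

    def needs_verification(self, threshold: float) -> bool:
        # Preset condition: a position frame exists and confidence < threshold.
        return (self.contains_target == 1 and self.box is not None
                and self.confidence is not None and self.confidence < threshold)
```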
In an embodiment, a confidence threshold is configured in advance according to specific needs, and if the confidence corresponding to the target detection result is smaller than the confidence threshold, an image framed out corresponding to a position frame can be determined in the image to be detected according to the position frame information, verification is performed, and whether the image to be detected contains a target object is further determined, so that the accuracy and the recall rate of target detection are improved.
In one embodiment, the target detection model is obtained by training an initial target detection model by using multiple groups of first training sample data, where the first training sample data includes a first sample image and a first sample label corresponding to the first sample image, and the first sample label is used to represent whether the first sample image includes a target object and position frame information of the target object; the first sample image comprises a stitched image; the spliced image is obtained by splicing a target object image and a similar target image, the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
Where the target object may be a badge-class object, then the similar target object may be a target similar in shape or color to a badge, e.g., a watch, a circular icon, etc. Alternatively, the target object image and the similar target object image may be obtained in a preset database or a commonly used search engine.
In some embodiments, for the target detection model for identifying badge-like targets, when training data is acquired, a large amount of related training data is difficult to acquire for uncommon badge images, and more badge images of the type can be obtained as training data in an image splicing manner.
It is understood that the training sample data of the target detection model may include a large number of types of images such as a target object image, a similar target object image, and the like, in addition to the stitched images.
In the embodiment of the disclosure, the spliced image obtained by splicing the target object image and the similar target object image is used as the training sample, so that the number and the richness of the training samples can be increased, model training is performed through abundant training sample images, and more information can be learned by the model, so that the detection accuracy of the trained target detection model is improved.
In one embodiment, the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and taking the similar target image as a background image to carry out splicing processing.
The preset mode may also be any other image processing mode, and the present application is not limited to this.
In the embodiment of the disclosure, the target object is processed according to different modes to obtain the image processed by each processing mode, and then the images are spliced, and the spliced image is used as a training sample, so that the number and the richness of the training sample can be increased.
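The processing-then-stitching idea can be illustrated on grayscale images represented as nested lists. Only noise addition is sketched among the preset modes, and all names here are illustrative assumptions:

```python
import random

def augment(pixels, mode, seed=0):
    """Apply one preset processing mode to a grayscale image (nested lists).
    Only 'noise' is implemented; other modes fall back to a plain copy."""
    rng = random.Random(seed)
    if mode == "noise":
        return [[min(255, max(0, p + rng.randint(-10, 10))) for p in row]
                for row in pixels]
    return [row[:] for row in pixels]

def stitch(foreground, background, top, left):
    """Paste the processed target-object image (foreground) onto the
    similar-target image (background) at offset (top, left)."""
    out = [row[:] for row in background]   # leave the background untouched
    for i, row in enumerate(foreground):
        for j, p in enumerate(row):
            out[top + i][left + j] = p
    return out
```

In practice each preset mode (sharpening, filtering, color dithering, perspective transform, etc.) would be applied in turn, and each processed variant stitched onto a background to multiply the sample count.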
In one embodiment, the target detection method further comprises:
in the training process of the initial target detection model, obtaining a plurality of test sample images, inputting the test sample images into the model, and obtaining output results corresponding to the test sample images;
splicing images corresponding to the result of detection error in the output results corresponding to the target object image and the test sample image to obtain a new spliced image, wherein the result of detection error is the result of detecting the image not containing the target object as containing the target object;
acquiring a label of a new spliced image;
and updating the first training sample data by using the new spliced image and the label.
During training of the initial target detection model, test sample images can be obtained from a preset image database and input into the model for testing. An image corresponding to a detection-error result output by the model is used as the background image and stitched with the target object image to obtain a new stitched image, which is added to the model's training sample database. Here, the model is the one obtained after the initial target detection model's parameters have been adjusted through continuous iteration during training; it does not yet meet the training-end condition and is not the final trained target detection model.
In some embodiments, the label of the new stitched image may be a label of a manual label or a label obtained in other manners, which is not limited in this application.
In the embodiment of the disclosure, training sample data of the model is increased by splicing the image corresponding to the detection error result corresponding to the test sample image in the model iteration process and the target object image to obtain a new spliced image, so that the amplification of the training sample data can be realized.
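A sketch of this sample-amplification loop, with stub functions standing in for the interim model, the stitching step, and the labeling step (all names are hypothetical assumptions):

```python
def mine_false_positives(test_images, model_fn, labels):
    """Return images the interim model wrongly flags as containing the target
    (true label 0, prediction 1); these become backgrounds for new samples."""
    return [img for img, lab in zip(test_images, labels)
            if lab == 0 and model_fn(img) == 1]

def update_training_set(train_set, false_positives, target_image,
                        stitch_fn, label_fn):
    """Stitch the target object onto each false-positive background and add
    the labeled result to the first-training-sample data."""
    for bg in false_positives:
        stitched = stitch_fn(target_image, bg)
        train_set.append((stitched, label_fn(stitched)))
    return train_set
```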
In one embodiment, the checking the target detection result based on the position frame information to obtain a target checking result corresponding to the image to be detected includes:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
and inputting the image corresponding to the position frame into a pre-trained first classification model to obtain a classification result, wherein the target verification result corresponding to the image to be detected comprises the classification result.
The first classification model may include, but is not limited to, a deep convolutional classification model such as a ResNet, Inception, or VGG model. Optionally, the classification result may indicate whether the target object is contained, or may be the probability that it is or is not contained, which is not limited by this application. Optionally, the classification result may be used as the final target verification result, so as to determine whether the image to be detected is an approved image.
The image input into the first classification model may be an image framed by a position frame in a detection result corresponding to the image to be detected.
In the embodiment of the disclosure, the position frame images corresponding to the target detection result are classified, so that the purpose of verification is achieved, and the accuracy and the recall rate of target detection are improved.
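Cropping the position frame and handing it to a classifier might look like the sketch below (nested-list images and hypothetical names; the real first classification model would be a trained network such as ResNet):

```python
def crop_box(pixels, box):
    """Cut the position-frame region (x1, y1, x2, y2) out of a
    nested-list grayscale image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in pixels[y1:y2]]

def verify_by_classifier(pixels, box, classifier_fn):
    """Feed the cropped position-frame image to the first classification
    model; its output serves as the target verification result."""
    return classifier_fn(crop_box(pixels, box))
```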
In one embodiment, the first classification model is obtained by training an initial first classification model by using a plurality of groups of second training samples, and each second training sample comprises a second sample image and a second sample label corresponding to each second sample image; the second sample image comprises an image corresponding to the position frame information of the target object in the first sample image and a position frame image corresponding to a detection error result output by the initial target detection model in the training process; the second sample label is used for characterizing whether the second sample image contains the target object.
In the embodiment of the disclosure, an image corresponding to the position frame information of the target object in the first sample image is used as a positive sample of the first classification model, and a position frame image corresponding to a detection error result output by the initial target detection model in the training process is used as a negative sample to train the first classification model, so that the classification accuracy of the trained first classification model is higher.
In one embodiment, the checking the target detection result based on the position frame information to obtain a target checking result corresponding to the image to be detected includes:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
acquiring a feature vector of an image corresponding to the position frame;
searching in a preset searching database based on the characteristic vector to obtain a searching result; retrieving a characteristic vector comprising a target object image in a database;
and determining a target verification result corresponding to the image to be detected according to the retrieval result.
In this embodiment, the feature vector of the image corresponding to the position frame may be extracted by the feature extraction module of the first classification model, and retrieval is performed in the retrieval database with this feature vector; the retrieval manner may include, but is not limited to, Approximate Nearest Neighbor (ANN) search. The retrieval database is a pre-built database containing the feature vectors of all types of target object images. If the similarity between the retrieved position-frame image and an image in the retrieval database is greater than a preset threshold, the image to be detected is considered to contain the target object.
In the embodiment of the disclosure, whether the image to be detected contains the target object is checked in a retrieval mode in the retrieval database, so that the accuracy of target detection can be improved. Data in the retrieval database in the embodiment of the disclosure can be continuously updated, and the feature vectors of the newly added target object images are continuously added to the retrieval database, so that the recall capability of the newly added target object images is improved.
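As an illustration of the retrieval-based check, the sketch below uses brute-force cosine similarity as a stand-in for an ANN index; the function names and the threshold convention are assumptions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_verdict(query_vec, database, threshold):
    """Exhaustive stand-in for ANN search: the position-frame image is judged
    to contain the target object if any database vector is similar enough."""
    best = max((cosine(query_vec, v) for v in database), default=0.0)
    return best > threshold
```

A production system would replace the exhaustive scan with an ANN index so that newly added target-object vectors can be searched at scale.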
In one embodiment, the checking the target detection result based on the position frame information to obtain a target checking result corresponding to the image to be detected includes:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
acquiring characteristic information of an image corresponding to the position frame, and determining a target verification result corresponding to the image to be detected based on the characteristic information;
the characteristic information includes at least one of:
shape features of an object contained in the image, color features of the object contained in the image, and text information in the image.
In the embodiment of the present disclosure, visual features of the position frame image, including but not limited to at least one of shape features of an object included in the image, color features of the object included in the image, and text information in the image, may also be extracted through a computer vision algorithm, and whether the position frame image includes the target object is determined through the features, so as to improve the accuracy of the target detection result.
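One of the visual-feature cues, a coarse color match based on a grayscale histogram, could be sketched as follows; the bin count and the L1-distance rule are illustrative choices, not from the patent:

```python
def color_histogram(pixels, bins=4):
    """Coarse grayscale histogram as a simple color feature of the
    position-frame image (nested-list pixels in 0..255)."""
    hist = [0] * bins
    for row in pixels:
        for p in row:
            hist[min(p * bins // 256, bins - 1)] += 1
    return hist

def color_feature_match(pixels, reference_hist, max_l1):
    """Compare the frame's histogram with a reference target-object
    histogram; an L1 distance at most max_l1 counts as a color match."""
    hist = color_histogram(pixels)
    return sum(abs(a - b) for a, b in zip(hist, reference_hist)) <= max_l1
```

Shape features and OCR-extracted text would be checked analogously and combined with this cue to decide the verification result.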
In the technical scheme of the present disclosure, an audit exemption mechanism may be further included, and whether to perform audit exemption is further determined for an image including a target object in a target verification result, which is specifically described in the following embodiment.
In one embodiment, the target object includes a badge class object, and the target detection method further includes:
and under the condition that the target verification result is that the image to be detected contains badge class objects, inputting the image to be detected into a preset second classification model, and under the condition that the image to be detected is determined to be a book cover image according to the output result of the second classification model, determining the image to be detected as an audit-passed image.
In the embodiment of the present disclosure, in the case where the target object is a badge class object, the second classification model may be a model for classifying book cover images and certificate images. The image to be detected, or its position frame image, is input into the second classification model to determine whether it is a book cover image. A book cover image may be exempt from examination: even if it contains a badge image, it may pass the audit, and the image to be detected is determined as an audit-passed image.
In the embodiment of the disclosure, whether the image to be detected can be checked or not is determined through the output result of the second classification model, and the checking requirement on the book cover image can be met.
In one embodiment, the target detection method further comprises:
and under the condition that the image to be detected is determined not to be the book cover image according to the output result of the second classification model, acquiring character information in the image to be detected, and under the condition that the image to be detected is determined to be a legal certificate image according to the character information, determining the image to be detected as an approved image.
In the embodiment of the disclosure, in the case where it is determined according to the output result of the second classification model that the image to be detected is not a book cover image, character information in the image to be detected is recognized by Optical Character Recognition (OCR) or a similar means, and whether the image contains a legal certificate name is determined according to the character information, so as to determine whether the image to be detected is a legal certificate image; if so, the image to be detected is determined as an audit-passed image.
In the embodiment of the disclosure, whether the image to be detected can be checked and exempted is determined in a character recognition mode, and the checking and exempting requirements for legal certificate images can be met.
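The legal-certificate check can be sketched as matching OCR output against a whitelist of certificate names. The whitelist entries below are hypothetical examples, and the OCR step itself is assumed to have already produced the recognized text.

```python
# Hypothetical whitelist of legal certificate names; in practice the text
# would come from an OCR engine run on the image to be detected.
LEGAL_CERTIFICATE_NAMES = {"business license", "food business license"}

def is_legal_certificate(ocr_text):
    # The image is treated as a legal certificate image when the recognized
    # text contains any whitelisted certificate name.
    text = ocr_text.lower()
    return any(name in text for name in LEGAL_CERTIFICATE_NAMES)
```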
In one embodiment, the target detection method further comprises:
and under the condition that the image to be detected is determined not to be a legal certificate image according to the character information, obtaining user qualification information corresponding to the image to be detected, determining whether the user corresponding to the image to be detected has the audit exemption authority according to the user qualification information, and if so, determining the image to be detected as an audit-passed image.
In the embodiment of the disclosure, for a specific user, an audit exemption right can be given, and an audit exemption can be performed on an image uploaded by the user. The user qualification information can be bound with the image uploaded by the user, the user qualification information of the user corresponding to the image can be inquired through the identification information of the image and the like, and whether the user corresponding to the image to be detected has the audit exemption authority or not can be determined according to the user qualification information.
In the embodiment of the disclosure, whether the image to be detected can be checked and exempted or not is determined by checking the qualification information of the user, and the checking and exempting requirements on the image uploaded by the user with the checking and exempting permission can be met.
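The qualification lookup can be sketched as follows; the mapping from image identifier to the uploader's qualification record is a stand-in for the real binding between an image and its uploading user (e.g. a query to a user-account service).

```python
# Hypothetical mapping from image identifier to the uploader's qualification
# record; a real system would query a user-account service instead.
USER_QUALIFICATIONS = {
    "img_001": {"user_id": "u1", "audit_exempt": True},
    "img_002": {"user_id": "u2", "audit_exempt": False},
}

def audit_by_qualification(image_id):
    # Pass only when the uploading user holds the audit-exemption authority.
    record = USER_QUALIFICATIONS.get(image_id)
    if record is not None and record["audit_exempt"]:
        return "passed"
    return "failed"
```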
In one embodiment, the target detection method further comprises:
and under the condition that it is determined according to the user qualification information that the user corresponding to the image to be detected does not have the audit exemption authority, determining the image to be detected as an audit-failed image.
In the embodiment of the disclosure, the image containing the badge class target object sent by the user without the auditing exemption authority is determined as an auditing failed image.
According to the target detection method in the technical scheme, among the target detection results output by the target detection model, the target detection results which contain position frame information and whose confidence is smaller than the preset confidence threshold are verified, so that the accuracy of image auditing and the recall rate of images containing the target object are improved.
Fig. 2 is a schematic diagram of a process of obtaining training samples of a target detection model according to an embodiment of the present disclosure. The target object in this embodiment is a badge class object. As shown in fig. 2, an image containing a badge is acquired as a positive sample image, and sample labeling (as "positive sample capture, labeling" shown in the figure) is performed. Acquiring an image similar to the outline of the badge as a similar target image (as shown in the figure, "badge outline similar image capture"); searching images similar to the badge image in a preset database (for example, searching an advertisement material library shown in the figure), splicing the badge image and the images similar to the badge image to obtain a spliced image (for example, generating and amplifying samples shown in the figure), and using the badge image, the images similar to the badge and the spliced image as first training sample data (for example, initial training data shown in the figure).
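The stitching step that pairs a target-object patch with a similar-target background, producing both the sample image and its position-frame label, can be sketched with plain array operations. The patch, background, and coordinates below are toy assumptions.

```python
import numpy as np

def stitch(foreground, background, top, left):
    # Paste the target-object patch into the background image and return the
    # stitched sample together with its position-frame label (x1, y1, x2, y2).
    out = background.copy()
    h, w = foreground.shape[:2]
    out[top:top + h, left:left + w] = foreground
    return out, (left, top, left + w, top + h)

badge = np.full((2, 2, 3), 255, dtype=np.uint8)   # toy target-object patch
bg = np.zeros((6, 6, 3), dtype=np.uint8)          # toy similar-target background
sample, box = stitch(badge, bg, top=1, left=2)
```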
Fig. 3 is a schematic diagram of an iterative process of a target detection model in an embodiment of the present disclosure. As shown in fig. 3, a target object image and a similar target object image are obtained and stitched, and a sample label of the stitched image is obtained (shown as "initial labeling and generation data"). The stitched image and the sample label are used as the training data set of the target detection model, and iterative training is performed. In the iterative process of the model, a test sample image is input into the model; the position frame image corresponding to a detection error result output by the model is used as a background image (shown as "negative case collection") and stitched with the target object image to obtain a new stitched image, which is added to the training sample set of the target detection model (shown as "sample generation and amplification"). The model is trained with the continuously updated training data set (shown as "model iteration") until a preset training end condition is met, and the target detection model is obtained (shown as "model publication").
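The negative-mining part of this loop can be sketched as follows, with a stub detector standing in for the real model: images that the detector flags as containing the target although the ground truth says otherwise become backgrounds for new stitched samples. All names and data below are illustrative.

```python
def detector(image_id):
    # Stub detector: for this demo it flags any id starting with "fp" as
    # containing the target object (deliberate false positives).
    return image_id.startswith("fp")

def mine_hard_negatives(test_images, ground_truth):
    # "Detection error" results: images detected as containing the target
    # object although the ground truth says they do not.
    return [img for img in test_images
            if detector(img) and not ground_truth[img]]

tests = ["fp_poster", "ok_photo", "fp_logo"]
truth = {"fp_poster": False, "ok_photo": False, "fp_logo": False}
hard_negatives = mine_hard_negatives(tests, truth)
# Each hard negative would next be used as a background, stitched with a
# target-object image, and added back into the training set.
```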
Fig. 4 is a schematic diagram of an image review system according to an embodiment of the disclosure. As shown in fig. 4, in this embodiment the target object image is a badge-like image. The image auditing system includes a target detection unit, a secondary verification unit, and a special scene identification and exemption unit. The image to be detected is input into the target detection unit, and badge detection is performed by a sensitive badge target detector (which may be the target detection model in the technical scheme of the present disclosure) to obtain a badge detection result. If the probability of containing a badge is smaller than a preset threshold (shown as "smaller than the threshold" in the figure), it indicates that the image to be detected does not contain a badge, and the image to be processed is determined to be an approved image. If the probability of containing a badge in the badge detection result is greater than or equal to the preset threshold, the image to be detected may contain a badge. Among such images, those whose target detection result contains position frame information and whose corresponding confidence is smaller than the preset confidence threshold are verified: the position frame image is extracted from the image and input into the secondary verification unit. The specific verification manner may include three types. The first is retrieval-database search, which is specifically realized as follows: the feature vector of the position frame image is extracted and compressed, the processed feature vector is used to search a preset retrieval database to obtain a retrieval result, and whether a badge is contained is determined according to the retrieval result.
The second is morphological feature detection, which is specifically realized as follows: feature information is extracted from the position frame image, and whether a badge is included in the image is determined according to features such as the shape features of an object contained in the image, the color features of that object, and the character information in the image. The third determines whether a badge is included in the image according to the classification result of a badge recognition classifier (the first classification model in the technical scheme of the present disclosure). If it is determined from the verification result of the secondary verification unit that the probability that the image contains a badge is smaller than a preset threshold (shown as "smaller than the threshold"), the image directly passes the audit. In the case where the target verification result is that the image to be detected contains a badge image, the image to be detected is input into the special scene recognition and exemption unit, and classified recognition is performed by a certificate-and-book classifier (the second classification model in the technical scheme of the present disclosure); if the image is a book cover image, it passes the audit. If it is not a book cover image, character information in the image to be detected is acquired and the legal certificate name is checked; if the image is a legal certificate image, it passes the audit. If it is not a legal certificate image, user qualification information corresponding to the image to be detected is acquired and qualification authorization verification is performed; if the user uploading the image has the audit-exemption qualification, the image passes the audit, otherwise the image to be detected is determined to be an audit-failed image.
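The decision cascade of Fig. 4 can be condensed into one function. Every input below is a stub for the corresponding model or lookup (detector probability, secondary verification result, classifier outputs, qualification check), and the 0.5 probability threshold is an illustrative assumption.

```python
def audit(prob_badge, confidence, conf_threshold,
          secondary_contains, is_book_cover, is_legal_cert, user_exempt):
    # Detection stage: a probability below the (illustrative) 0.5 threshold
    # means no badge was found, so the image passes directly.
    if prob_badge < 0.5:
        return "passed"
    # Only low-confidence detections are re-verified by the secondary unit;
    # confident detections are taken at face value.
    contains_badge = secondary_contains if confidence < conf_threshold else True
    if not contains_badge:
        return "passed"
    # Special-scene recognition and exemption: book covers, legal
    # certificates, and users with audit-exemption authority all pass.
    if is_book_cover or is_legal_cert or user_exempt:
        return "passed"
    return "failed"

no_badge = audit(0.2, 0.9, 0.8, False, False, False, False)
cleared  = audit(0.9, 0.6, 0.8, False, False, False, False)
blocked  = audit(0.9, 0.9, 0.8, False, False, False, False)
exempt   = audit(0.9, 0.6, 0.8, True, True, False, False)
```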
Fig. 5 is a schematic diagram of a training method of a target detection model in an embodiment of the present disclosure. As shown in fig. 5, the training method of the target detection model may include:
step S501, acquiring multiple groups of first training sample data;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprises a spliced image; the spliced image is obtained by splicing a target object image and a similar target image, the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
Step S502, training the initial target detection model based on multiple groups of first training sample data until a preset training end condition is met, and obtaining the target detection model of any embodiment of the disclosure.
According to the training method of the target detection model in the technical scheme, the stitched image obtained by stitching the target object image and the similar target object image is used as a training sample, which can increase the number and richness of the training samples; model training with rich training sample images enables the model to learn more information, thereby improving the detection accuracy of the trained target detection model.
In one embodiment, the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and taking the similar target image as a background image to carry out splicing processing.
The preset mode may also be any other image processing mode, and the present application is not limited to this.
In the embodiment of the disclosure, the target object is processed according to different modes to obtain the image processed by each processing mode, and then the images are spliced, and the spliced image is used as a training sample, so that the number and the richness of the training sample can be increased.
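Two of the preset processing modes (noise adding and color dithering) can be sketched as follows; the sigma and scale values are illustrative assumptions, and the other modes (sharpening, filtering, random filling, perspective transformation) would be applied analogously before stitching.

```python
import numpy as np

def add_gaussian_noise(img, sigma=10.0, seed=0):
    # "Noise adding" preset: additive Gaussian noise, clipped back to uint8.
    rng = np.random.default_rng(seed)
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def color_jitter(img, scale=1.2):
    # "Color dithering" preset: scale intensities and clip back to range.
    return np.clip(img.astype(float) * scale, 0, 255).astype(np.uint8)

patch = np.full((4, 4, 3), 128, dtype=np.uint8)   # toy target-object patch
augmented = [add_gaussian_noise(patch), color_jitter(patch)]
```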
In one embodiment, the training method of the target detection model further includes:
in the training process of the initial target detection model, obtaining a plurality of test sample images, inputting the test sample images into the model, and obtaining output results corresponding to the test sample images;
splicing images corresponding to the result of detection error in the output results corresponding to the target object image and the test sample image to obtain a new spliced image, wherein the result of detection error is the result of detecting the image not containing the target object as containing the target object;
acquiring a label of a new spliced image;
and updating the first training sample data by using the new spliced image and the label.
In the training process of the initial target detection model, test sample images can be obtained from a preset image database and input into the model for testing. The image corresponding to a detection error result output by the model is used as a background image and stitched with the target object image to obtain a new stitched image, which is added to the training sample database of the model. Here, the model may be one whose parameters have been adjusted through continuous iteration during training; it does not yet meet the training end condition and is not the final trained target detection model.
In some embodiments, the label of the new stitched image may be a label of a manual label or a label obtained in other manners, which is not limited in this application.
In the embodiment of the disclosure, training sample data of the model is increased by splicing the image corresponding to the detection error result corresponding to the test sample image in the model iteration process and the target object image to obtain a new spliced image, so that the amplification of the training sample data can be realized.
Fig. 6 is a schematic diagram of an object detection apparatus according to an embodiment of the present disclosure. As shown in fig. 6, the object detecting device may include:
a detection image obtaining module 601, configured to obtain an image to be detected;
the target detection module 602 is configured to input the image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
a position frame obtaining module 603, configured to obtain position frame information in the target detection result when the target detection result meets a preset condition; the preset condition is that the target detection result comprises position frame information, and the confidence corresponding to the target detection result is smaller than a preset confidence threshold;
and a result checking module 604, configured to check the target detection result based on the position frame information, so as to obtain a target checking result corresponding to the image to be detected.
In one embodiment, the target detection model is obtained by training an initial target detection model by using multiple groups of first training sample data, where the first training sample data includes a first sample image and a first sample label corresponding to the first sample image, and the first sample label is used to represent whether the first sample image includes a target object and position frame information of the target object;
the first sample image comprises a stitched image; the spliced image is obtained by splicing a target object image and a similar target image, the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
In one embodiment, the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and taking the similar target image as a background image to carry out splicing processing.
Fig. 7 is a schematic diagram of a sample update module according to an embodiment of the disclosure. As shown in fig. 7, in one embodiment, the target detection apparatus further comprises a sample update module, and the sample update module comprises:
the test result obtaining unit 701 is configured to obtain a plurality of test sample images during training of the initial target detection model, input the test sample images into the model, and obtain an output result corresponding to the test sample images;
an image stitching unit 702, configured to perform stitching processing on an image corresponding to a result of detecting an error in output results corresponding to the target object image and the test sample image to obtain a new stitched image, where the result of detecting the error is a result of detecting an image that does not include the target object as including the target object;
a label obtaining unit 703, configured to obtain an annotation label of the new stitched image;
and an updating unit 704, configured to update the first training sample data with the new stitched image and the label.
In an embodiment, the result checking module 604 is specifically configured to:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
and inputting the image corresponding to the position frame into a pre-trained first classification model to obtain a classification result, wherein the target verification result corresponding to the image to be detected comprises the classification result.
In one embodiment, the first classification model is obtained by training an initial first classification model by using a plurality of groups of second training samples, and each second training sample comprises a second sample image and a second sample label corresponding to each second sample image; the second sample image comprises an image corresponding to the position frame information of the target object in the first sample image and a position frame image corresponding to a detection error result output by the initial target detection model in the training process; the second sample label is used for characterizing whether the second sample image contains the target object.
In an embodiment, the result checking module 604 is specifically configured to:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
acquiring a feature vector of an image corresponding to the position frame;
searching in a preset searching database based on the characteristic vector to obtain a searching result; retrieving a characteristic vector comprising a target object image in a database;
and determining a target verification result corresponding to the image to be detected according to the retrieval result.
In an embodiment, the result checking module 604 is specifically configured to:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
acquiring characteristic information of an image corresponding to the position frame, and determining a target verification result corresponding to the image to be detected based on the characteristic information;
the characteristic information includes at least one of:
shape features of an object contained in the image, color features of the object contained in the image, and text information in the image.
In one embodiment, the target object comprises a badge class object, and the target detection apparatus further comprises an image classification module for:
and under the condition that the target verification result is that the image to be detected contains badge class objects, inputting the image to be detected into a preset second classification model, and under the condition that the image to be detected is determined to be a book cover image according to the output result of the second classification model, determining the image to be detected as an audit-passed image.
In one embodiment, the target detection apparatus further comprises a text recognition module for:
and under the condition that the image to be detected is determined not to be the book cover image according to the output result of the second classification model, acquiring character information in the image to be detected, and under the condition that the image to be detected is determined to be a legal certificate image according to the character information, determining the image to be detected as an approved image.
In one embodiment, the target detection apparatus further comprises a qualification auditing module configured to:
and under the condition that the image to be detected is determined not to be a legal certificate image according to the character information, obtaining user qualification information corresponding to the image to be detected, determining whether a user corresponding to the image to be detected has an audit exemption authority or not according to the user qualification, and if so, determining the image to be detected as an audit passed image.
In one embodiment, the object detection apparatus further comprises a result determination module configured to:
and determining that the user corresponding to the image to be detected does not have the audit exemption authority according to the user qualification, and determining the image to be detected as an audit failed image.
According to the target detection device in the embodiment of the disclosure, in the target detection result output by the target detection model, the target detection result which contains the position frame information and has the confidence coefficient smaller than the preset confidence coefficient threshold is verified, so that the accuracy of image verification and the recall rate of the image containing the target object are improved.
Fig. 8 is a schematic diagram of a training apparatus for a target detection model according to an embodiment of the present disclosure. As shown in fig. 8, the training apparatus for the target detection model may include:
a sample obtaining module 801, configured to obtain multiple sets of first training sample data;
a model training module 802, configured to train an initial target detection model based on multiple sets of first training sample data until a preset training end condition is met, to obtain a target detection model according to any embodiment of the present disclosure;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprises a spliced image; the spliced image is obtained by splicing a target object image and a similar target image, the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
In one embodiment, the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and taking the similar target image as a background image to carry out splicing processing.
In one embodiment, the training apparatus for the target detection model further includes a sample update module, and the sample update module includes:
the test result acquisition unit is used for acquiring a plurality of test sample images in the training process of the initial target detection model, inputting the test sample images into the model and obtaining output results corresponding to the test sample images;
the image splicing unit is used for splicing images corresponding to the result of detecting errors in the output results corresponding to the target object image and the test sample image to obtain a new spliced image, wherein the result of detecting errors is a result of detecting an image not containing the target object as containing the target object;
the label obtaining unit is used for obtaining a label of the new spliced image;
and the updating unit is used for updating the first training sample data by using the new spliced image and the label.
According to the training device of the target detection model in the technical scheme of the present disclosure, by using the stitched image obtained by stitching the target object image and the similar target object image as a training sample, the number and richness of the training samples can be increased. Model training with rich training sample images enables the model to learn more information, thereby improving the detection accuracy of the trained target detection model.
The functions of each unit, module or sub-module in each apparatus in the embodiments of the present disclosure may refer to the corresponding description in the above method embodiments, and are not described herein again.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the electronic device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the electronic device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the object detection method. For example, in some embodiments, the object detection method or the training method of the object detection model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the object detection method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the object detection method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (33)

1. A method of target detection, the method comprising:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
under the condition that the target detection result meets a preset condition, acquiring position frame information in the target detection result; the preset condition is that the target detection result contains position frame information, and the confidence corresponding to the target detection result is smaller than a preset confidence threshold;
and checking the target detection result based on the position frame information to obtain a target checking result corresponding to the image to be detected.
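The flow of claim 1 can be illustrated with a minimal sketch: run a detector, and only when the result contains position frame information AND its confidence falls below a preset threshold, hand the cropped region to a secondary verification step. The `detect` and `verify_box` callables and the threshold value are illustrative assumptions, not part of the claims.

```python
# Hypothetical sketch of the claim-1 flow; detect() and verify_box() are
# placeholder callables standing in for the detection model and the check step.
CONF_THRESHOLD = 0.8  # assumed value; the claim only says "preset confidence threshold"

def gated_detection(image, detect, verify_box):
    result = detect(image)  # e.g. {"box": (x, y, w, h), "confidence": 0.6, "label": ...}
    box = result.get("box")
    conf = result.get("confidence", 0.0)
    if box is not None and conf < CONF_THRESHOLD:
        # low-confidence hit with a position box: re-check the boxed region
        return verify_box(image, box)
    # high-confidence hit (or no box at all): pass the detector's verdict through
    return result.get("label")
```

Detections above the threshold are trusted as-is; only the ambiguous ones pay the cost of the extra verification step.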
2. The method according to claim 1, wherein the target detection model is obtained by training an initial target detection model using multiple sets of first training sample data, where the first training sample data includes a first sample image and a first sample label corresponding to the first sample image, and the first sample label is used to characterize whether the first sample image includes a target object and position frame information of the target object;
the first sample image comprises a stitched image; the spliced image is obtained by splicing a target object image and a similar target image, the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
3. The method of claim 2, wherein the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and the similar target image as a background image, and carrying out splicing processing.
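The sample construction of claims 2 and 3 can be sketched as follows: augment a target-object image in a preset mode, then paste it as foreground onto a similar-target background. Images are modeled here as nested lists of grayscale pixels, and only additive noise is shown; a real implementation would apply sharpening, filtering, color dithering, and perspective transformation with an image library.

```python
# Minimal sketch of the stitched-image construction (claims 2-3). Grayscale
# images as lists of pixel rows; values clamped to [0, 255].
import random

def augment(patch, noise=5, seed=0):
    """Placeholder for the 'preset mode' step: here, additive noise only."""
    rng = random.Random(seed)
    return [[max(0, min(255, p + rng.randint(-noise, noise))) for p in row]
            for row in patch]

def stitch(background, patch, top, left):
    """Paste the augmented foreground patch onto the background at (top, left)."""
    out = [row[:] for row in background]  # leave the background image untouched
    for i, row in enumerate(patch):
        for j, p in enumerate(row):
            out[top + i][left + j] = p
    return out
```

The stitched result places a perturbed copy of the target inside a background that already contains a confusingly similar object, which is exactly the hard case the detector is being trained to separate.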
4. A method according to claim 2 or 3, characterized in that the method further comprises:
in the training process of the initial target detection model, obtaining a plurality of test sample images, inputting the test sample images into the model, and obtaining output results corresponding to the test sample images;
splicing images corresponding to the result of detection error in the output results corresponding to the target object image and the test sample image to obtain a new spliced image, wherein the result of detection error is the result of detecting the image not containing the target object as containing the target object;
acquiring a label of the new spliced image;
and updating the first training sample data by using the new spliced image and the label.
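The update loop of claim 4 is a form of hard-negative mining and can be sketched as below: test images that the model wrongly flags as containing the target (false positives) are stitched with real target-object images to create new, harder training samples. The `model`, `stitch`, and `label_of` callables are placeholders for the patent's detector, splicing step, and labeling step.

```python
# Sketch of the claim-4 sample-update loop. Each test image is a dict with a
# ground-truth "has_target" flag; the model returns a "detected" verdict.
def mine_hard_negatives(model, test_images, target_image, stitch, label_of):
    new_samples = []
    for img in test_images:
        pred = model(img)  # detector output for this test image
        if pred["detected"] and not img["has_target"]:
            # false positive: splice it with a real target image to form
            # a new, harder training sample
            spliced = stitch(target_image, img)
            new_samples.append((spliced, label_of(spliced)))
    return new_samples
```

Folding these spliced samples back into the first training sample data focuses subsequent training rounds on the backgrounds the model currently confuses with the target.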
5. The method according to claim 2 or 3, wherein the verifying the target detection result based on the position frame information to obtain a target verification result corresponding to the image to be detected comprises:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
and inputting the image corresponding to the position frame into a pre-trained first classification model to obtain a classification result, wherein the target verification result corresponding to the image to be detected comprises the classification result.
6. The method according to claim 5, wherein the first classification model is obtained by training an initial first classification model using a plurality of sets of second training samples, and the second training samples include second sample images and second sample labels corresponding to the second sample images; the second sample image comprises an image corresponding to position frame information of a target object in the first sample image and a position frame image corresponding to a detection error result output by the initial target detection model in a training process; the second sample label is used for characterizing whether a target object is contained in the second sample image.
7. The method according to any one of claims 1 to 3, wherein the verifying the target detection result based on the position frame information to obtain a target verification result corresponding to the image to be detected comprises:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
acquiring a feature vector of an image corresponding to the position frame;
searching in a preset searching database based on the characteristic vector to obtain a searching result; the retrieval database comprises a characteristic vector of the target object image;
and determining a target verification result corresponding to the image to be detected according to the retrieval result.
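The retrieval-based check of claim 7 can be sketched as a nearest-neighbor search: embed the image under the position frame, compare it against a database of target-object feature vectors, and accept the detection only if the best match is close enough. Cosine similarity and the threshold value are illustrative assumptions; the claims specify only a feature vector and a preset retrieval database.

```python
# Sketch of the claim-7 retrieval verification, using cosine similarity
# between dense feature vectors (assumed metric, not stated in the claims).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def verify_by_retrieval(query_vec, database, threshold=0.9):
    """Return True if any stored target-object vector matches closely enough."""
    best = max((cosine(query_vec, v) for v in database), default=0.0)
    return best >= threshold
```

In practice the query vector would come from the same feature extractor used to build the database, so that genuine target crops land near their stored exemplars.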
8. The method according to any one of claims 1 to 3, wherein the verifying the target detection result based on the position frame information to obtain a target verification result corresponding to the image to be detected comprises:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
acquiring characteristic information of an image corresponding to the position frame, and determining a target verification result corresponding to the image to be detected based on the characteristic information;
the feature information includes at least one of:
shape features of an object contained in the image, color features of the object contained in the image, and text information in the image.
9. The method of claim 1, wherein the target object comprises a badge class object, the method further comprising:
and inputting the image to be detected into a preset second classification model under the condition that the target verification result is that the image to be detected contains badge objects, and determining the image to be detected as an approved image under the condition that the image to be detected is determined to be a book cover image according to the output result of the second classification model.
10. The method of claim 9, further comprising:
and under the condition that the image to be detected is determined not to be the book cover image according to the output result of the second classification model, acquiring character information in the image to be detected, and under the condition that the image to be detected is determined to be a legal certificate image according to the character information, determining the image to be detected as an audit pass image.
11. The method of claim 10, further comprising:
and under the condition that the image to be detected is determined not to be a legal certificate image according to the character information, obtaining user qualification information corresponding to the image to be detected, determining whether a user corresponding to the image to be detected has an audit exemption authority or not according to the user qualification, and if so, determining the image to be detected as an audit pass image.
12. The method of claim 11, further comprising:
and determining the image to be detected as an audit-failed image if the user corresponding to the image to be detected does not have the audit-exemption authority according to the user qualification.
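The audit logic of claims 9-12 forms a cascade for badge-class detections, which can be sketched as follows: a badge hit passes review if the image is a book cover, failing that if it is a legal certificate, failing that if the uploading user holds an audit exemption; otherwise it fails review. The three predicate callables stand in for the patent's second classification model, the character-information check, and the user-qualification lookup.

```python
# Sketch of the claims-9-12 decision cascade. Each predicate is a placeholder
# for a model or lookup described in the claims; returns "pass" or "fail".
def audit(image, is_book_cover, is_legal_certificate, user_exempt):
    if is_book_cover(image):          # claim 9: second classification model
        return "pass"
    if is_legal_certificate(image):   # claim 10: text-information check
        return "pass"
    if user_exempt(image):            # claim 11: user qualification lookup
        return "pass"
    return "fail"                     # claim 12: no exemption, audit fails
```

Ordering the checks from cheapest to most privileged means most legitimate images exit early, and only the residue reaches the per-user exemption lookup.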
13. A method of training an object detection model, the method comprising:
acquiring multiple sets of first training sample data, and training an initial target detection model based on the multiple sets of first training sample data until a preset training end condition is met, to obtain the target detection model according to any one of claims 1-12;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprises a spliced image; the spliced image is obtained by splicing a target object image and a similar target image, the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
14. The method of claim 13, wherein the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and the similar target image as a background image, and carrying out splicing processing.
15. The method according to any one of claims 13 or 14, further comprising:
in the training process of the initial target detection model, obtaining a plurality of test sample images, inputting the test sample images into the model, and obtaining output results corresponding to the test sample images;
splicing images corresponding to the result of detection error in the output results corresponding to the target object image and the test sample image to obtain a new spliced image, wherein the result of detection error is the result of detecting the image not containing the target object as containing the target object;
acquiring a label of the new spliced image;
and updating the first training sample data by using the new spliced image and the label.
16. An object detection apparatus, the apparatus comprising:
the detection image acquisition module is used for acquiring an image to be detected;
the target detection module is used for inputting the image to be detected into a pre-trained target detection model to obtain a target detection result corresponding to the image to be detected;
the position frame acquisition module is used for acquiring position frame information in the target detection result under the condition that the target detection result meets a preset condition; the preset condition is that the target detection result contains position frame information, and the confidence corresponding to the target detection result is smaller than a preset confidence threshold;
and the result checking module is used for checking the target detection result based on the position frame information to obtain a target checking result corresponding to the image to be detected.
17. The apparatus according to claim 16, wherein the target detection model is obtained by training an initial target detection model using multiple sets of first training sample data, where the first training sample data includes a first sample image and a first sample label corresponding to the first sample image, and the first sample label is used to characterize whether the first sample image includes a target object and location box information of the target object;
the first sample image comprises a stitched image; the spliced image is obtained by splicing a target object image and a similar target image, the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
18. The apparatus of claim 17, wherein the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and the similar target image as a background image, and carrying out splicing processing.
19. The apparatus of claim 17 or 18, further comprising a sample update module, the sample update module comprising:
the test result acquisition unit is used for acquiring a plurality of test sample images in the training process of the initial target detection model, inputting the test sample images into the model and obtaining output results corresponding to the test sample images;
the image splicing unit is used for splicing images corresponding to the result of error detection in the output results corresponding to the target object image and the test sample image to obtain a new spliced image, wherein the result of error detection is the result of detecting the image not containing the target object as containing the target object;
the label obtaining unit is used for obtaining the label of the new spliced image;
and the updating unit is used for updating the first training sample data by utilizing the new spliced image and the label.
20. The apparatus according to claim 17 or 18, wherein the result checking module is specifically configured to:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
and inputting the image corresponding to the position frame into a pre-trained first classification model to obtain a classification result, wherein the target verification result corresponding to the image to be detected comprises the classification result.
21. The apparatus of claim 20, wherein the first classification model is obtained by training an initial first classification model using a plurality of sets of second training samples, and the second training samples include second sample images and second sample labels corresponding to the second sample images; the second sample image comprises an image corresponding to position frame information of a target object in the first sample image and a position frame image corresponding to a detection error result output by the initial target detection model in a training process; the second sample label is used for characterizing whether a target object is contained in the second sample image.
22. The apparatus according to any one of claims 16 to 18, wherein the result verification module is specifically configured to:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
acquiring a feature vector of an image corresponding to the position frame;
searching in a preset searching database based on the characteristic vector to obtain a searching result; the retrieval database comprises a characteristic vector of the target object image;
and determining a target verification result corresponding to the image to be detected according to the retrieval result.
23. The apparatus according to any one of claims 16 to 18, wherein the result verification module is specifically configured to:
acquiring an image corresponding to the position frame in the image to be detected based on the position frame information;
acquiring characteristic information of an image corresponding to the position frame, and determining a target verification result corresponding to the image to be detected based on the characteristic information;
the feature information includes at least one of:
shape features of an object contained in the image, color features of the object contained in the image, and text information in the image.
24. The apparatus of claim 16, wherein the target object comprises a badge class object, the apparatus further comprising an image classification module to:
and inputting the image to be detected into a preset second classification model under the condition that the target verification result is that the image to be detected contains badge objects, and determining the image to be detected as an approved image under the condition that the image to be detected is determined to be a book cover image according to the output result of the second classification model.
25. The apparatus of claim 24, further comprising a text recognition module configured to:
and under the condition that the image to be detected is determined not to be the book cover image according to the output result of the second classification model, acquiring character information in the image to be detected, and under the condition that the image to be detected is determined to be a legal certificate image according to the character information, determining the image to be detected as an audit pass image.
26. The apparatus of claim 25, further comprising a qualification audit module configured to:
and under the condition that the image to be detected is determined not to be a legal certificate image according to the character information, obtaining user qualification information corresponding to the image to be detected, determining whether a user corresponding to the image to be detected has an audit exemption authority or not according to the user qualification, and if so, determining the image to be detected as an audit pass image.
27. The apparatus of claim 26, further comprising a result determination module configured to:
and determining the image to be detected as an audit-failed image if the user corresponding to the image to be detected does not have the audit-exemption authority according to the user qualification.
28. An apparatus for training an object detection model, the apparatus comprising:
the sample acquisition module is used for acquiring a plurality of groups of first training sample data;
a model training module, configured to train an initial target detection model based on the multiple sets of first training sample data until a preset training end condition is met, so as to obtain the target detection model according to any one of claims 1 to 12;
the first training sample data comprises a first sample image and a first sample label corresponding to the first sample image, the first sample label is used for representing whether the first sample image contains a target object and position frame information of the target object, and the first sample image comprises a spliced image; the spliced image is obtained by splicing a target object image and a similar target image, the target object image is an image containing a target object, the similar target image is an image containing a similar target object, and the similarity between the similar target object and the target object is within a preset range.
29. The apparatus of claim 28, wherein the stitched image is obtained by:
processing the target object image according to a preset mode to obtain a processed image; the preset mode comprises at least one of sharpening, noise adding, filtering, color dithering, random filling and perspective transformation;
and taking the processed image as a foreground image and the similar target image as a background image, and carrying out splicing processing.
30. The apparatus of any one of claims 28 or 29, further comprising a sample update module, the sample update module comprising:
the test result acquisition unit is used for acquiring a plurality of test sample images in the training process of the initial target detection model, inputting the test sample images into the model and obtaining output results corresponding to the test sample images;
the image splicing unit is used for splicing images corresponding to the result of error detection in the output results corresponding to the target object image and the test sample image to obtain a new spliced image, wherein the result of error detection is the result of detecting the image not containing the target object as containing the target object;
the label obtaining unit is used for obtaining the label of the new spliced image;
and the updating unit is used for updating the first training sample data by utilizing the new spliced image and the label.
31. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-15.
32. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-15.
33. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-15.
CN202110540636.5A 2021-05-18 2021-05-18 Target detection method, training method and device of target detection model Active CN113221918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110540636.5A CN113221918B (en) 2021-05-18 2021-05-18 Target detection method, training method and device of target detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110540636.5A CN113221918B (en) 2021-05-18 2021-05-18 Target detection method, training method and device of target detection model

Publications (2)

Publication Number Publication Date
CN113221918A true CN113221918A (en) 2021-08-06
CN113221918B CN113221918B (en) 2023-08-04

Family

ID=77092668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110540636.5A Active CN113221918B (en) 2021-05-18 2021-05-18 Target detection method, training method and device of target detection model

Country Status (1)

Country Link
CN (1) CN113221918B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114218599A (en) * 2022-02-22 2022-03-22 飞狐信息技术(天津)有限公司 Business data processing method and device, storage medium and electronic equipment
CN114255389A (en) * 2021-11-15 2022-03-29 浙江时空道宇科技有限公司 Target object detection method, device, equipment and storage medium
CN117058689A (en) * 2023-10-09 2023-11-14 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130230A1 (en) * 2017-10-26 2019-05-02 Samsung Sds Co., Ltd. Machine learning-based object detection method and apparatus
CN110852285A (en) * 2019-11-14 2020-02-28 腾讯科技(深圳)有限公司 Object detection method and device, computer equipment and storage medium
CN111160406A (en) * 2019-12-10 2020-05-15 北京达佳互联信息技术有限公司 Training method of image classification model, and image classification method and device
CN112183098A (en) * 2020-09-30 2021-01-05 完美世界(北京)软件科技发展有限公司 Session processing method and device, storage medium and electronic device
CN112766244A (en) * 2021-04-07 2021-05-07 腾讯科技(深圳)有限公司 Target object detection method and device, computer equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI WENBIN; HE RAN: "Aircraft Target Detection in Remote Sensing Images Based on Deep Neural Networks", Computer Engineering, no. 07, pages 274-282

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114255389A (en) * 2021-11-15 2022-03-29 浙江时空道宇科技有限公司 Target object detection method, device, equipment and storage medium
CN114218599A (en) * 2022-02-22 2022-03-22 飞狐信息技术(天津)有限公司 Business data processing method and device, storage medium and electronic equipment
CN114218599B (en) * 2022-02-22 2022-05-27 飞狐信息技术(天津)有限公司 Business data processing method and device, storage medium and electronic equipment
CN117058689A (en) * 2023-10-09 2023-11-14 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production
CN117058689B (en) * 2023-10-09 2024-02-20 巴斯夫一体化基地(广东)有限公司 Offline detection data processing method for chemical production

Also Published As

Publication number Publication date
CN113221918B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN109117777B (en) Method and device for generating information
RU2571545C1 (en) Content-based document image classification
CN109872162B (en) Wind control classification and identification method and system for processing user complaint information
CN113221918B (en) Target detection method, training method and device of target detection model
CN109034069B (en) Method and apparatus for generating information
US20120136812A1 (en) Method and system for machine-learning based optimization and customization of document similarities calculation
CN112559800B (en) Method, apparatus, electronic device, medium and product for processing video
CN109902285B (en) Corpus classification method, corpus classification device, computer equipment and storage medium
CN108229481B (en) Screen content analysis method and device, computing equipment and storage medium
CN110222582B (en) Image processing method and camera
CN113382279A (en) Live broadcast recommendation method, device, equipment, storage medium and computer program product
CN111932363A (en) Identification and verification method, device, equipment and system for authorization book
CN111401309B (en) CNN training and remote sensing image target identification method based on wavelet transformation
CN113239807B (en) Method and device for training bill identification model and bill identification
CN114418124A (en) Method, device, equipment and storage medium for generating graph neural network model
CN113255501B (en) Method, apparatus, medium and program product for generating form recognition model
CN110717407A (en) Human face recognition method, device and storage medium based on lip language password
CN114495113A (en) Text classification method and training method and device of text classification model
CN109064464B (en) Method and device for detecting burrs of battery pole piece
CN110363206B (en) Clustering of data objects, data processing and data identification method
CN113255557A (en) Video crowd emotion analysis method and system based on deep learning
CN112241470A (en) Video classification method and system
CN115953123A (en) Method, device and equipment for generating robot automation flow and storage medium
CN110472079B (en) Target image retrieval method, device, equipment and storage medium
CN113657364A (en) Method, device, equipment and storage medium for recognizing character mark

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant