CN116168274A - Object detection method and object detection model training method - Google Patents

Object detection method and object detection model training method

Info

Publication number
CN116168274A
Authority
CN
China
Prior art keywords
image
model
object detection
sample
vector
Prior art date
Legal status
Pending
Application number
CN202310282069.7A
Other languages
Chinese (zh)
Inventor
王飞
孙佰贵
刘洋
谢宣松
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202310282069.7A
Publication of CN116168274A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this specification provide an object detection method and an object detection model training method. The object detection method comprises: acquiring an image to be detected; and inputting the image to be detected into an object detection model to obtain an object detection result corresponding to the image to be detected. The object detection model is obtained by training based on a first sample set and model parameters of a reference model, the reference model is obtained by pre-training based on a second sample set, the sample images in the first sample set and the second sample set are located in different scenes, and the sample images contain a target object. The method improves the object detection accuracy for images in the scene of the first sample set, allows the model to adapt to object detection in different scenes, improves the effect and accuracy of object authenticity detection in a new scene, and thereby improves the effect of object recognition.

Description

Object detection method and object detection model training method
Technical Field
The embodiments of this specification relate to the field of computer technology, and in particular to an object detection method.
Background
With the development of technology, object recognition technology has been widely applied in people's daily life owing to advantages such as high reliability. As the application range of object recognition technology continues to expand, attacks on it using non-genuine objects have also appeared; object authenticity detection is therefore an important problem in the field of object anti-counterfeiting, and it directly influences the object recognition experience.
At present, object authenticity detection algorithms generally extract texture features from the object image and then distinguish real objects from non-real objects with a classifier. Because such conventional algorithms have weak feature representation capability and are easily affected by illumination changes, the authenticity detection performance of the system is poor and the anti-counterfeiting effect in a new scene is poor, which further affects the accuracy of object recognition.
Disclosure of Invention
In view of this, the present embodiment provides an object detection method. One or more embodiments of the present specification relate to another object detection method, an object detection model training method, a computing device, a computer-readable storage medium, and a computer program to solve the technical drawbacks of the prior art.
According to a first aspect of embodiments of the present specification, there is provided an object detection method, including:
acquiring an image to be detected;
inputting the image to be detected into an object detection model to obtain an object detection result corresponding to the image to be detected;
the object detection model is obtained by training based on a first sample set and model parameters of a reference model, the reference model is obtained by pre-training based on a second sample set, the sample images in the first sample set and the second sample set are located in different scenes, and the sample images comprise a target object.
According to a second aspect of embodiments of the present specification, there is provided another object detection method, applied to a cloud-side device, including:
receiving an object detection request sent by a terminal side device, wherein the object detection request carries an image to be detected;
inputting the image to be detected into an object detection model to obtain an object detection result corresponding to the image to be detected;
the object detection model is obtained by training based on a first sample set and model parameters of a reference model, the reference model is obtained by pre-training based on a second sample set, the sample images in the first sample set and the second sample set are located in different scenes, and the sample images comprise a target object;
And sending the object detection result to the front end, and displaying the object detection result.
According to a third aspect of embodiments of the present disclosure, there is provided an object detection model training method applied to cloud-side equipment, including:
acquiring a first sample image in a first sample set and a first object tag corresponding to the first sample image;
obtaining model parameters in a reference model;
inputting the first sample image and the model parameters into an object detection model to be trained to obtain a first prediction result;
and adjusting model parameters of the object detection model to be trained based on the first prediction result and the first object label to obtain an object detection model.
According to a fourth aspect of embodiments of the present specification, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions that, when executed by the processor, implement the object detection method or object detection model training method described above.
According to a fifth aspect of embodiments of the present specification, there is provided a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the above-described object detection method or object detection model training method.
According to a sixth aspect of embodiments of the present specification, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the above-described object detection method or object detection model training method.
In one embodiment of the present disclosure, an image to be detected is acquired; the image to be detected is input into an object detection model to obtain an object detection result corresponding to the image to be detected; the object detection model is obtained by training based on a first sample set and model parameters of a reference model, the reference model is obtained by pre-training based on a second sample set, the sample images in the first sample set and the second sample set are located in different scenes, and the sample images comprise a target object.
Specifically, the acquired image to be detected is input into the object detection model to obtain the object detection result corresponding to the image to be detected. The object detection model is obtained by training based on the first sample set and the model parameters of the reference model, the reference model is obtained by training based on the second sample set, and the sample images in the two sample sets come from different scenes. A reference model trained on sample images of one scene thus helps to train the object detection model corresponding to sample images of another scene, and the object detection model can continuously learn the features of sample images of the other scene. As a result, the object detection model improves the object detection accuracy for images in the scene of the first sample set, can adapt to object detection in different scenes, improves the effect and accuracy of object authenticity detection in a new scene, and improves the effect of object recognition.
Drawings
Fig. 1 is a schematic view of a scenario of an object detection method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of an object detection method provided in one embodiment of the present disclosure;
FIG. 3 is a flow chart of another object detection method provided by one embodiment of the present disclosure;
FIG. 4 is a flow chart of an object detection model training method provided in one embodiment of the present disclosure;
FIG. 5 (a) is a schematic diagram illustrating a process of an object detection model training method according to an embodiment of the present disclosure;
FIG. 5 (b) is a data flow diagram of an object detection model training method according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an object detection device according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a computing device provided in one embodiment of the present description.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present specification. However, this specification can be implemented in many other forms than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; therefore, this specification is not limited by the specific implementations disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, a "first" may also be referred to as a "second", and similarly a "second" may also be referred to as a "first". Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
First, terms related to one or more embodiments of the present specification will be explained.
Object authenticity detection: judging whether the object in a picture is a real object or not.
KMeans: given a set of data points and a required number of clusters k, where k is specified by the user, the k-means algorithm iteratively divides the data into k clusters according to a distance function.
KNN: k nearest neighbours; each sample can be represented (or labelled) by its k nearest neighbours.
Attention: an attention mechanism that assigns different weights to different features, helping the model select the features that are most valuable for classification.
Object authenticity detection is an important problem in the field of object anti-counterfeiting and directly influences the object recognition experience. An object authenticity detector is easily affected by scene changes; for example, if a model trained on a batch of pictures acquired outdoors is directly applied in an indoor environment, its authenticity detection accuracy easily drops. The object authenticity detection model therefore needs to be continuously adapted to different scenes; when the anti-counterfeiting effect in a new scene is poor, genuine objects may fail to be recognized as genuine while non-genuine objects may easily pass. Based on the above, in order to improve the accuracy of object authenticity detection in different fields, the present application provides an object authenticity detection method based on cross attention and provides a cross-attention module. The cross-attention module can continuously learn information in a new scene, enhance the expressive power of features and improve the robustness of the object authenticity detector, and the cross-attention branch is learned through a distillation module to guide the features of the target domain, so that a high anti-counterfeiting effect can be achieved in the new scene. Specifically, a model is first trained in the original-domain scene; the trained model is used to extract features from samples of the target domain; initial class centers are obtained by kMeans clustering; samples of the target domain are pseudo-labelled by a kNN nearest-neighbour method; and pictures of the two domains are then formed into sample pairs and sent into a cross-attention network structure, so that the network can learn cross-domain features and its capability is continuously enhanced. The cross-attention module can align the features of the source domain and the target domain, so that the robustness of the model is effectively improved.
In the present specification, there are provided an object detection method, which is related to another object detection method, an object detection model training method, an object detection apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 is a schematic view of a scene of an object detection method according to an embodiment of the present disclosure.
The left side of fig. 1 shows object images under different scene types, including an object image 1 of an indoor scene, an object image 2 of an outdoor scene without sunlight, and an object image 3 related to the outdoor scene; the three images are respectively input into the object detection model to obtain detection results, which indicate which of object images 1-3 contain a real object, such as an image of a living animal or plant, and which do not contain a real object, such as a planar (non-living) image of an animal or plant.
In order to adapt object authenticity detection to object images in different scenes, this embodiment provides a cross-attention network, so that information in a new scene can be continuously learned, the expression capability of features is enhanced, and the robustness of object authenticity detection is improved; this solves the problem of low object authenticity detection capability across different fields. In different environments, object authenticity detection is easily affected by the scene, such as an indoor versus an outdoor scene, or different illumination conditions in daytime and at night; an object authenticity detection model may obtain high accuracy in one scene, but suffer from a low pass rate and low authenticity detection capability in another scene.
Referring to fig. 2, fig. 2 shows a flowchart of an object detection method according to an embodiment of the present disclosure, which specifically includes the following steps.
It should be noted that, the object detection method provided in this embodiment may be applied to a scene of object authenticity detection to solve the problem of low object authenticity detection capability in different fields, where in different environments, object authenticity detection is easily affected by different scenes, such as an indoor scene and an outdoor scene, a day and night scene, different lighting conditions, and the like.
Step 202: and acquiring an image to be detected.
The image to be detected is understood to be any image with a target object, wherein the target object includes, but is not limited to, a face, an animal, a plant, an article, and the like, and the image to be detected is any image with a scene, for example, an object image of an indoor scene, an object image of an outdoor scene, and the like.
In practical applications, the execution subject can acquire an image to be detected, so that the target object in the image to be detected can subsequently be accurately identified; for example, an image of an object in a sunny scene is acquired, and the object in the image is detected.
Step 204: and inputting the image to be detected into an object detection model to obtain an object detection result corresponding to the image to be detected.
The object detection model is obtained by training based on a first sample set and model parameters of a reference model, the reference model is obtained by pre-training based on a second sample set, the sample images in the first sample set and the second sample set are located in different scenes, and the sample images comprise a target object.
In practical applications, after the image to be detected is obtained, it can be used as the input of the object detection model, and the object detection model outputs the object detection result corresponding to the image to be detected. It should be noted that the object detection model is trained in advance; specifically, it may be obtained by training based on the first sample set and the model parameters of the reference model, the reference model being obtained by training based on the second sample set. The training process is described in the following embodiments.
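For ease of understanding, a minimal inference sketch is given below (Python with PyTorch). The model class, checkpoint path, input size and the genuine/non-genuine label convention are illustrative assumptions only and do not limit this embodiment.

import torch
from PIL import Image
from torchvision import transforms

# The checkpoint path and label convention below are illustrative assumptions.
model = torch.load("object_detection_model.pt", map_location="cpu")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

image = Image.open("image_to_be_detected.jpg").convert("RGB")  # image to be detected
with torch.no_grad():
    logits = model(preprocess(image).unsqueeze(0))  # assumed output: (1, 2) class logits
# Assumed convention: index 0 = genuine object, index 1 = non-genuine object.
result = "genuine object" if logits.argmax(dim=1).item() == 0 else "non-genuine object"
print(result)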
Further, the process of pre-training the object detection model can be referred to as follows; specifically, before the image to be detected is input into the object detection model, the method further includes:
acquiring a first sample image in a first sample set and a first object tag corresponding to the first sample image;
Obtaining model parameters in a reference model;
inputting the first sample image and the model parameters into an object detection model to be trained to obtain a first prediction result;
and adjusting model parameters of the object detection model to be trained based on the first prediction result and the first object label to obtain an object detection model.
The first sample set may be understood as being composed of a plurality of first sample images and first object tags corresponding to the first sample images; the first sample image can be understood as a corresponding sample image with a target object in a target domain scene to be trained, such as an object image in an outdoor scene; the first object tag may be understood as a detection result tag of the first sample image, for example, a result of detecting the object image in the outdoor scene is an authenticity object or a non-authenticity object; it should be noted that, the target domain may be understood as a scene domain mainly trained by the object detection model, for example, an accurate model for detecting an indoor object image is already available at present, but the model cannot accurately detect an object image in an outdoor scene, so the outdoor scene may be understood as the target domain, and the object detection model is also a model capable of accurately detecting the object image in the outdoor scene after being trained in advance.
The reference model can be understood as a reference model for accurately detecting the object image in the source domain scene, wherein the reference model is obtained by training according to the object image in the source domain scene in advance, and model parameters of the reference model are parameters adjusted by iterative training; the source domain may be understood as any one of the scene domains, and the source domain may be understood as an indoor scene along the above example.
In practical application, a first sample image in a first sample set and a first object tag corresponding to the first sample image can be acquired first; in order to enable the object detection model to learn the image characteristics in the source domain reference model, model parameters in the reference model can be acquired, a first sample image and the model parameters are input into the object detection model to be trained, and a first prediction result is obtained, wherein the first prediction result can be understood as an initial detection result of the object detection model to be trained on the first sample image prediction; further, according to the first prediction result and the first object label, model parameters in the object detection model to be trained are adjusted, so that the training process of the object detection model to be trained is completed, and the object detection model is obtained.
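For ease of understanding, a condensed training-step sketch is given below (Python/PyTorch). The detector's input signature, the selection of key/value parameters from the reference model and the use of a cross-entropy loss are illustrative assumptions; the actual network structure is described in the cross-attention embodiments that follow.

import torch
import torch.nn.functional as F

def train_step(detector, reference_model, optimizer, first_sample_image, first_object_label):
    # Model parameters taken from the reference model (source domain); here assumed
    # to be its key/value projections, detached so that only the object detection
    # model to be trained is updated in this step.
    ref_params = {name: p.detach() for name, p in reference_model.named_parameters()
                  if "key" in name or "value" in name}

    # First prediction result of the object detection model to be trained
    # (assumed signature: the detector accepts the out-of-domain parameters).
    first_prediction = detector(first_sample_image, ref_params)

    # Adjust the model parameters based on the first prediction result and the first object label.
    loss = F.cross_entropy(first_prediction, first_object_label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()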
Furthermore, the first object label corresponding to the first sample image may be obtained through a model, or may be obtained through a manual labeling manner, but in order to obtain the label corresponding to the sample image in each scene efficiently, the embodiment provides a manner of constructing a preset feature extraction model; specifically, the acquiring the first object tag corresponding to the first sample image includes:
performing feature extraction on the first sample image based on a preset feature extraction model to obtain a first feature vector;
obtaining a central feature vector corresponding to a second sample set, and calculating a vector distance between the first feature vector and the central feature vector;
based on the vector distance, adjusting parameters in the preset feature extraction model to obtain a target feature extraction model;
and obtaining a first object label corresponding to the first sample image based on the target feature extraction model.
In a specific implementation, the second sample set, whose sample images come from a scene different from that of the first sample set, participates in determining the first object tag. Feature extraction is performed on the first sample image through a preset feature extraction model to obtain a first feature vector; a central feature vector corresponding to the second sample set is then obtained, and the vector distance between the first feature vector and the central feature vector is calculated. It should be noted that the central feature vector is a class-center vector obtained by clustering all second sample images in the second sample set. Further, after the vector distance is obtained, the parameters of the preset feature extraction model can be adjusted according to the vector distance, and training can be continued iteratively to obtain the target feature extraction model, so that the first object label corresponding to the first sample image can be accurately obtained with the trained target feature extraction model.
In practical applications, the labels of the sample images in the target domain are determined by narrowing the distance between the target domain corresponding to the first sample set and the source domain corresponding to the second sample set: the features of the sample images in the target domain are extracted through the preset feature extraction model, class centers are computed from the plurality of sample images in the source domain, the feature distance between each target-domain sample image and the class centers is determined, and the target-domain sample images are then given pseudo labels. The preset feature extraction model is thereby trained continuously and iteratively, the labels corresponding to the sample images in the target domain become more and more accurate, and the training precision of the object detection model is improved.
It should be noted that, in this embodiment, the KMeans algorithm may be used to calculate the class centers (although this is not limiting), with reference to the following formula 1:
c_k = Σ_t (p_(t,k) * f_t) / Σ_t p_(t,k)    (formula 1)
where c_k is the class center of class k, f_t represents the features of image t, and p_(t,k) represents the probability that image t belongs to class k. The KNN strategy then labels each image of the target domain with its nearest class center, and new class centers are recomputed with the new labels, with reference to the following formulas 2 and 3:
ŷ_t = argmin_k d(f_t, c_k)    (formula 2)
c_k = Σ_t 1(ŷ_t = k) * f_t / Σ_t 1(ŷ_t = k)    (formula 3)
where ŷ_t is the pseudo label assigned to target-domain image t, d(·, ·) is the distance function, and 1(·) is the indicator function. Based on the above, when the labels assigned by the trained model are consistent with the labels corresponding to the source-domain images, the training of the model can be considered finished, and the finally obtained first object labels are more accurate.
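For ease of understanding, a minimal sketch of this pseudo-labelling step is given below (Python, using scikit-learn's KMeans). The feature arrays, feature dimension and number of classes are illustrative assumptions; the sketch follows formulas 1 to 3 above under the simplification of hard cluster assignments and the assumption that every class receives at least one target-domain sample.

import numpy as np
from sklearn.cluster import KMeans

def pseudo_label(source_features, target_features, num_classes=2):
    # Formula (1): class centers estimated from source-domain features via kMeans.
    kmeans = KMeans(n_clusters=num_classes, n_init=10).fit(source_features)
    class_centers = kmeans.cluster_centers_                      # shape (k, d)

    # Formula (2): each target-domain image is labelled with its nearest class center.
    distances = np.linalg.norm(
        target_features[:, None, :] - class_centers[None, :, :], axis=-1)
    pseudo_labels = distances.argmin(axis=1)

    # Formula (3): class centers are re-estimated with the new pseudo labels
    # (assumes every class has at least one assigned target sample).
    new_centers = np.stack([target_features[pseudo_labels == k].mean(axis=0)
                            for k in range(num_classes)])
    return pseudo_labels, new_centers

# Example usage with random 512-dimensional features (purely illustrative):
src = np.random.randn(100, 512).astype(np.float32)
tgt = np.random.randn(80, 512).astype(np.float32)
labels, centers = pseudo_label(src, tgt)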
Specifically, the object detection model to be trained comprises a cross-attention layer and a self-attention layer. The cross-attention layer is mainly used for learning model parameters of the other domain and applying the learned model information to its own training, and the two attention mechanisms can learn feature information from each other well. The cross-attention layer and the self-attention layer are two parallel processing layers, and the sample image processed by the two layers is identical, namely the input first sample image of the target-domain scene.
Correspondingly, the inputting the first sample image and the model parameters into the object detection model to be trained to obtain a first prediction result comprises the following steps:
inputting the first sample image into the self-attention layer to obtain a first image vector;
acquiring intra-domain parameters in the self-attention layer, inputting the first sample image, the intra-domain parameters and the model parameters into the cross-attention layer, and acquiring a second image vector;
and calculating a first target loss value based on the first image vector and the second image vector, and determining a first prediction result according to the first target loss value.
In practical application, the first sample image can be input into the self-attention layer, and the first image vector can be obtained after the self-attention layer is processed; acquiring intra-domain parameters in the self-attention layer, such as Qt, representing a Q vector in an object detection model to be trained in a target domain scene; meanwhile, the first sample image, the intra-domain parameters and the model parameters in the reference model can be input into the cross attention layer to obtain a second image vector; it should be noted that, the model parameters in the reference model may be understood as model parameters of a model in a source domain in a different domain from the target domain, that is, the out-of-domain parameters, such as Ks and Vs, represent the K vector and the V vector in the reference model in the source domain scene. And finally, calculating a first target loss value according to the first image vector and the second image vector, and determining a first prediction result according to the first target loss value.
In this training mode, the self-attention mechanism is adopted to focus on intra-domain information: Q, K and V all come from one domain, so that the module focuses on the information within that domain; the cross-attention mechanism is adopted to focus on inter-domain information, so that the cross-attention module has strong feature expression capability.
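For ease of understanding, the difference between the two mechanisms can be sketched as follows (Python/PyTorch), assuming standard scaled dot-product attention; Qt is produced by the target-domain branch while Ks and Vs come from the source-domain (reference model) side, as described above. Projection matrices and shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # softmax(Q K^T / sqrt(d)) V
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# Self-attention in the target domain: Q, K and V all come from the
# target-domain patch features (intra-domain information).
def self_attention(target_tokens, wq, wk, wv):
    return scaled_dot_product_attention(target_tokens @ wq, target_tokens @ wk, target_tokens @ wv)

# Cross-attention for the target-domain branch: Qt from the target domain,
# Ks and Vs from the source-domain reference model (inter-domain information).
def cross_attention(target_tokens, source_tokens, wq_t, wk_s, wv_s):
    return scaled_dot_product_attention(target_tokens @ wq_t, source_tokens @ wk_s, source_tokens @ wv_s)

# Illustrative shapes: 16 patch tokens per image, dimension 64.
d = 64
tgt, src = torch.randn(1, 16, d), torch.randn(1, 16, d)
wq, wk, wv = (torch.randn(d, d) for _ in range(3))
first_image_vector = self_attention(tgt, wq, wk, wv)         # analogue of the first image vector
second_image_vector = cross_attention(tgt, src, wq, wk, wv)  # analogue of the second image vector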
Further, considering that the first object label corresponding to the first sample image is obtained by using a model, the embodiment also adopts a distillation loss function to distill the first object label by using a cross attention branch, and performs certain constraint and supervision on the target domain from the category distribution; specifically, the calculating, based on the first image vector and the second image vector, a first target loss value includes:
calculating a first distillation loss value based on the first image vector and the second image vector, and calculating a first reference loss value based on the first image vector and the first object tag;
a first target loss value is determined based on the first distillation loss value and the first reference loss value.
In practical application, a first distillation loss value is calculated based on a first image vector and a second image vector, and supervision of the first image vector is performed through the second image vector; further, a first reference loss value is calculated based on the first image vector and the first object label, and then a first target loss value corresponding to the object detection model to be trained is determined according to the two loss values.
Wherein the distillation loss value can be realized by a distillation loss function (formula 4). The overall loss function is as follows:
Loss = α*L_s + β*L_t + γ*L_(s→t) + δ*L_dtl
where the sum of α, β, γ and δ is equal to 1; L_s denotes the loss of the source domain, and the remaining terms are defined analogously.
Based on this method, the cross-attention branch is used for distillation, and a certain constraint and supervision are imposed on the target domain from the category distribution, so that the target-domain branch further fuses in some source-domain information and the feature capability of the target-domain branch is enhanced.
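For ease of understanding, the combination of losses can be sketched as follows (Python/PyTorch). The cross-entropy form of L_s, L_t and L_(s→t) and the KL-divergence form of the distillation term L_dtl are assumptions for illustration; only the weighting scheme above is taken from the published text.

import torch.nn.functional as F

def total_loss(pred_s, pred_t, pred_s_to_t, label_s, label_t,
               alpha=0.25, beta=0.25, gamma=0.25, delta=0.25):
    # alpha + beta + gamma + delta = 1, as required by the overall loss function.
    l_s = F.cross_entropy(pred_s, label_s)              # source-domain loss
    l_t = F.cross_entropy(pred_t, label_t)              # target-domain loss (pseudo labels)
    l_s_to_t = F.cross_entropy(pred_s_to_t, label_t)    # cross-attention branch loss (assumed form)
    # Assumed distillation term: KL divergence between the target-branch
    # distribution and the cross-attention-branch distribution.
    l_dtl = F.kl_div(F.log_softmax(pred_t, dim=1),
                     F.softmax(pred_s_to_t, dim=1),
                     reduction="batchmean")
    return alpha * l_s + beta * l_t + gamma * l_s_to_t + delta * l_dtl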
The following embodiment describes the training of the reference model in detail. It should be noted that, in the embodiments of the present application, the object detection model and the reference model learn intra-domain information from each other, so as to shorten the distance between the target-domain features and the source-domain features, give the network better feature alignment, and enhance the expression capability of the network. Specifically, obtaining the reference model by pre-training based on the second sample set comprises the following steps:
acquiring a second sample image in a second sample set and a second object tag corresponding to the second sample image;
obtaining model parameters in the object detection model;
inputting the second sample image and the model parameters into a reference model to be trained to obtain a second prediction result;
and adjusting model parameters of the reference model to be trained based on the second prediction result and the second object label to obtain a reference model.
The second sample set may be understood as being composed of a plurality of second sample images and second object labels corresponding to the second sample images, and the second sample images may be understood as sample images with target objects in a source field scene, such as object images in an indoor scene; the second object tag may be understood as a detection result tag of the second sample image, such as whether the object image in the indoor scene is an authenticity object or a non-authenticity object.
In practical application, in order to train a reference model, a second sample image in a second sample set and a second object label corresponding to the second sample image are acquired first, meanwhile, in order to improve training accuracy of the reference model, model parameters shared by an object detection model can be acquired, and the second sample image and the model parameters in the object detection model are input into the reference model to be trained to obtain a second prediction result; and adjusting model parameters in the reference model to be trained according to the second prediction result and the second object label, and obtaining the reference model after the iterative training is finished.
It should be noted that, the training mode of the reference model may also refer to the parameters trained in the object detection model, so as to realize that the two models can learn effective information mutually, thereby improving the processing precision of the two models.
Furthermore, the reference model to be trained is consistent with the model structure in the object detection model, can comprise a cross attention layer and a self attention layer, can continuously and effectively extract the domain information of the target domain by adopting a cross attention mechanism, and overcomes the defect that the attention mechanism only learns the source domain information; specifically, the reference model to be trained comprises a cross attention layer and a self attention layer;
correspondingly, the inputting the second sample image and the model parameters into the reference model to be trained to obtain a second prediction result includes:
inputting the second sample image into the self-attention layer to obtain a third image vector;
acquiring intra-domain parameters in the self-attention layer, inputting the second sample image, the intra-domain parameters and the model parameters into the cross-attention layer, and acquiring a fourth image vector;
and calculating a second target loss value based on the third image vector and the fourth image vector, and determining a second prediction result according to the second target loss value.
In practical application, the second sample image can be input into the self-attention layer, and a third image vector can be obtained after the self-attention layer is processed; acquiring intra-domain parameters in the self-attention layer, such as Qs, representing a Q vector in a reference model to be trained in a source domain scene; meanwhile, the second sample image, the intra-domain parameters and the model parameters in the object detection model can be input into the cross attention layer to obtain a fourth image vector; it should be noted that, the model parameters in the object detection model may be understood as model parameters of the model in the target domain in a different domain from the source domain, that is, the out-of-domain parameters, such as Kt and Vt, represent the K vector and the V vector in the reference model in the target domain scene. And finally, calculating a second target loss value according to the third image vector and the fourth image vector, and determining a second prediction result according to the second target loss value.
Still further, the calculating a second target loss value based on the third image vector and the fourth image vector includes:
calculating a second distillation loss value based on the third image vector and the fourth image vector, and calculating a second reference loss value based on the third image vector and the second object tag;
a second target loss value is determined based on the second distillation loss value and the second reference loss value.
In practical application, the training process of the reference model can also refer to the mode of training the object detection model, the distillation loss value is adopted, the final target loss value is calculated, a certain constraint and supervision are carried out on the source domain from the category distribution, the branches of the source domain are further enabled to be fused with the information of some target domains, and the characteristic capability of the branches of the source domain is enhanced.
In addition, the embodiment also provides a mode for fine adjustment of model parameters, so that the accuracy of model processing can be further improved; specifically, after the object detection result corresponding to the image to be detected is obtained, the method further includes:
receiving feedback information corresponding to the object detection result sent by a front end;
and adjusting model parameters in the object detection model based on the feedback information.
In practical application, feedback information sent by the front end for the object detection result may be received, for example, the user may score the object detection result at the front end or label whether the result is correct, which is not limited in this embodiment, and further, according to the feedback information, fine adjustment may be performed on model parameters in the object detection model, so as to obtain an object detection model with higher accuracy.
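For ease of understanding, a minimal fine-adjustment sketch is given below (Python/PyTorch). The learning rate, number of steps and the assumption that the feedback has already been converted into image/label tensors are illustrative only.

import torch
import torch.nn.functional as F

def finetune_on_feedback(model, feedback_images, feedback_labels, lr=1e-5, steps=10):
    # Small learning rate and few steps: only a fine adjustment of the
    # already-trained object detection model (values are illustrative).
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        logits = model(feedback_images)
        loss = F.cross_entropy(logits, feedback_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    model.eval()
    return model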
In summary, in order to improve the accuracy of object authenticity detection in different fields, the object detection method provided by the embodiments of the present application provides a cross-attention model, which can continuously learn information in a new scene, enhance the expression capability of features and improve the robustness of object authenticity detection; the cross-attention branch is learned through the distillation loss value to guide the features of the target domain, so that a high anti-counterfeiting effect can be achieved in the new scene.
Referring to fig. 3, fig. 3 shows a flowchart of another object detection method according to an embodiment of the present disclosure, which specifically includes the following steps.
It should be noted that, the object detection method provided in this embodiment is applied to cloud side devices; for easy understanding, the object detection method is described taking an application scenario of object authenticity detection as an example.
Step 302, receiving an object detection request sent by a terminal side device, wherein the object detection request carries an image to be detected.
The image to be detected is understood to be any image with a target object, wherein the target object includes, but is not limited to, a face, an animal, a plant, an article, and the like, and the image to be detected is any image with a scene, for example, an object image of an indoor scene, an object image of an outdoor scene, and the like.
In practical applications, after receiving an object detection request sent by an end-side device, the cloud-side device may perform subsequent object detection processing on an image to be detected carried in the object detection request, for example, detect whether a target object in the image to be detected is an authenticity object.
Step 304, inputting the image to be detected into an object detection model to obtain an object detection result corresponding to the image to be detected; the object detection model is obtained by training based on a first sample set and model parameters of a reference model, the reference model is obtained by pre-training based on a second sample set, the sample images in the first sample set and the second sample set are located in different scenes, and the sample images comprise a target object.
The first sample set may be understood as being composed of a plurality of first sample images and first object tags corresponding to the first sample images; the first sample image can be understood as a corresponding sample image with a target object in a target domain scene to be trained, such as an object image in an outdoor scene; the target object may be understood as content in the sample image, including but not limited to faces, animals, plants, objects, etc.; the second sample set may be understood as consisting of a number of second sample images and second object labels corresponding to the second sample images, which may be understood as sample images with target objects in a source field scene, such as object images in an indoor scene.
In practical application, when the image to be detected is obtained, the image to be detected can be used as the input of an object detection model, and the object detection model can output an object detection result corresponding to the image to be detected; it should be noted that the object detection model is obtained through pre-training, specifically, may be obtained through model parameter training in the first sample set and the reference model, and the reference model is obtained through training in the second sample set, and it should be noted that the training process may refer to the description of the above embodiment, and will not be repeated herein.
Step 306: and sending the object detection result to the front end, and displaying the object detection result.
In practical application, after the object detection result is obtained, the object detection result can be fed back to the front end for display.
In summary, in the object detection method provided by the embodiments of the present application, a reference model trained on sample images of one scene helps to train the object detection model corresponding to sample images of another scene, so that the object detection model can continuously learn the features of sample images of other scenes. The object detection model therefore improves the object detection accuracy for images in the scene of the first sample set, can adapt to object detection in different scenes, improves the effect and accuracy of object authenticity detection in a new scene, and improves the effect of object recognition.
Referring to fig. 4, fig. 4 shows a flowchart of an object detection model training method according to an embodiment of the present disclosure, which specifically includes the following steps.
It should be noted that, the object detection model training method provided in this embodiment is applied to cloud side devices; after the object detection model is trained, the object detection model can be stored into cloud side equipment, so that the follow-up continuous use of the model is facilitated.
Step 402, acquiring a first sample image in a first sample set and a first object label corresponding to the first sample image.
The first sample set may be understood as being composed of a plurality of first sample images and first object tags corresponding to the first sample images; the first sample image can be understood as a corresponding sample image with a target object in a target domain scene to be trained, such as an object image in an outdoor scene; the first object tag may be understood as a detection result tag of the first sample image, such as an authenticity object or a non-authenticity object as a result of detecting the object image in the outdoor scene.
Step 404, obtaining model parameters in the reference model.
The reference model can be understood as a reference model for accurately detecting the object image in the source domain scene, wherein the reference model is obtained by training according to the object image in the source domain scene in advance, and model parameters of the reference model are parameters adjusted by iterative training; the source domain may be understood as any one of the scene domains, and the source domain may be understood as an indoor scene along the above example.
Step 406: and inputting the first sample image and the model parameters into a detection model of the object to be trained to obtain a first prediction result.
Wherein the object detection model to be trained may comprise a cross-attention layer and a self-attention layer.
Step 408: and adjusting model parameters of the object detection model to be trained based on the first prediction result and the first object label to obtain an object detection model.
In practical application, a first sample image in a first sample set and a first object tag corresponding to the first sample image can be acquired first; in order to enable the object detection model to be trained to learn the image features in the source domain reference model, model parameters in the reference model can be acquired, a first sample image and the model parameters are input into the object detection model to be trained, and a first prediction result is obtained, wherein the first prediction result can be understood as an initial detection result of the object detection model to be trained on the first sample image prediction; further, according to the first prediction result and the first object label, model parameters in the object detection model to be trained are adjusted, so that the training process of the object detection model to be trained is completed, and the object detection model is obtained.
In summary, in the object detection model training method provided by the embodiments of the present application, a reference model trained on sample images of one scene helps to train the object detection model corresponding to sample images of another scene, and the object detection model can continuously learn the features of sample images of other scenes, so that the object detection model improves the object detection accuracy for images in the scene of the first sample set, can adapt to object detection in different scenes, improves the anti-counterfeiting effect for objects in a new scene, improves the object detection accuracy, and improves the object recognition effect.
Fig. 5 (a) shows a schematic process diagram of an object detection model training method according to an embodiment of the present disclosure, and fig. 5 (b) shows a schematic data flow diagram of an object detection model training method according to an embodiment of the present disclosure.
Fig. 5 (a) includes two images, namely image 1 and image 2, which are respectively object images in different scene domains, and the cross-attention network is trained in this embodiment to realize that the images in different scene domains can be accurately identified in the network.
Specifically, image 1 is respectively input into the cross-attention layer and the self-attention layer corresponding to image 1, and after the feature processing of the two layers, an fs vector and an fs-t vector are obtained; image 2 is respectively input into the cross-attention layer and the self-attention layer corresponding to image 2, and after the feature processing of the two layers, an ft-s vector and an ft vector are obtained. During training, the two domains exchange model parameters based on a weight-sharing mechanism, so that feature information between the different domains can be learned. As shown in fig. 5 (b), the self-attention layer processes its input using only its own intra-domain information, while the cross-attention layer processes its input using out-of-domain information rather than only its own intra-domain information.
In practical applications, a Transformer structure is adopted: the image is divided into patch blocks which are sent into the cross-attention module, and the cross-attention network comprises two self-attention mechanisms and two cross-attention mechanisms. The two self-attention mechanisms learn the information of their respective domains well, and the two cross-attention mechanisms learn cross-domain information. In the cross-attention module, for example the s->t branch, the Q component comes from the source domain and the K and V components come from the target domain; the other cross-attention mechanism is symmetric to this, so that the cross attention learns mutual information from both domains. The cross-attention network adopts a weight-sharing mechanism, which reduces the amount of parameter computation; intermediate states between the two domains can be learned, and the feature alignment capability is strong. The self-attention mechanism focuses on intra-domain information: Q, K and V all come from one domain, so the module focuses on information within the domain; the cross-attention mechanism focuses on inter-domain information, so the cross-attention module has strong feature expression capability. A schematic sketch of this two-branch structure is given below.
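For ease of understanding, an illustrative, self-contained sketch of the two-branch structure follows (Python/PyTorch). The dimensions, the per-domain projections and the way weights are shared between the self-attention and cross-attention paths of a domain are assumptions for illustration and do not limit the embodiment.

import torch
import torch.nn as nn

class CrossDomainAttention(nn.Module):
    # One set of Q/K/V projections per domain, reused by both the self-attention
    # and cross-attention paths of that domain (weight sharing).
    def __init__(self, dim=256):
        super().__init__()
        self.q_s, self.k_s, self.v_s = (nn.Linear(dim, dim) for _ in range(3))
        self.q_t, self.k_t, self.v_t = (nn.Linear(dim, dim) for _ in range(3))

    @staticmethod
    def attend(q, k, v):
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        return torch.softmax(scores, dim=-1) @ v

    def forward(self, src_patches, tgt_patches):
        # Self-attention branches: intra-domain information only.
        f_s = self.attend(self.q_s(src_patches), self.k_s(src_patches), self.v_s(src_patches))
        f_t = self.attend(self.q_t(tgt_patches), self.k_t(tgt_patches), self.v_t(tgt_patches))
        # Cross-attention branches: the s->t branch takes Q from the source domain
        # and K/V from the target domain; the t->s branch is symmetric.
        f_s_to_t = self.attend(self.q_s(src_patches), self.k_t(tgt_patches), self.v_t(tgt_patches))
        f_t_to_s = self.attend(self.q_t(tgt_patches), self.k_s(src_patches), self.v_s(src_patches))
        return f_s, f_s_to_t, f_t_to_s, f_t

# Illustrative usage with 16 patch tokens per image and dimension 256.
module = CrossDomainAttention(dim=256)
fs, fs_t, ft_s, ft = module(torch.randn(1, 16, 256), torch.randn(1, 16, 256))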
It should be noted that, the information and data such as the image, the model, the sample set and the like in the embodiment of the method are all information and data authorized by the user or fully authorized by each party, and the collection, the use and the processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and are provided with corresponding operation entries for the user to select authorization or rejection.
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of an object detection device, and fig. 6 shows a schematic structural diagram of an object detection device provided in one embodiment of the present disclosure. As shown in fig. 6, the apparatus includes:
an image acquisition module 602 configured to acquire an image to be detected;
the object detection module 604 is configured to input the image to be detected into an object detection model to obtain an object detection result corresponding to the image to be detected;
the object detection model is obtained by training based on a first sample set and model parameters of a reference model, the reference model is obtained by pre-training based on a second sample set, the sample images in the first sample set and the second sample set are located in different scenes, and the sample images comprise a target object.
Optionally, the apparatus further comprises:
the first model training module is configured to acquire a first sample image in a first sample set and a first object label corresponding to the first sample image;
obtaining model parameters in a reference model;
inputting the first sample image and the model parameters into an object detection model to be trained to obtain a first prediction result;
And adjusting model parameters of the object detection model to be trained based on the first prediction result and the first object label to obtain an object detection model.
Optionally, the object detection model to be trained includes a cross attention layer and a self attention layer;
optionally, the first model training module is further configured to:
inputting the first sample image into the self-attention layer to obtain a first image vector;
acquiring intra-domain parameters in the self-attention layer, inputting the first sample image, the intra-domain parameters and the model parameters into the cross-attention layer, and acquiring a second image vector;
and calculating a first target loss value based on the first image vector and the second image vector, and determining a first prediction result according to the first target loss value.
Optionally, the first model training module is further configured to:
calculating a first distillation loss value based on the first image vector and the second image vector, and calculating a first reference loss value based on the first image vector and the first object tag;
a first target loss value is determined based on the first distillation loss value and the first reference loss value.
Optionally, the apparatus further comprises:
a second model training module configured to:
acquiring a second sample image in a second sample set and a second object tag corresponding to the second sample image;
obtaining model parameters in the object detection model;
inputting the second sample image and the model parameters into a reference model to be trained to obtain a second prediction result;
and adjusting model parameters of the reference model to be trained based on the second prediction result and the second object label to obtain a reference model.
Optionally, the reference model to be trained includes a cross-attention layer and a self-attention layer;
optionally, the second model training module is configured to:
inputting the second sample image into the self-attention layer to obtain a third image vector;
acquiring intra-domain parameters in the self-attention layer, inputting the second sample image, the intra-domain parameters and the model parameters into the cross-attention layer, and acquiring a fourth image vector;
and calculating a second target loss value based on the third image vector and the fourth image vector, and determining a second prediction result according to the second target loss value.
Optionally, the second model training module is configured to:
calculating a second distillation loss value based on the third image vector and the fourth image vector, and calculating a second reference loss value based on the third image vector and the second object tag;
a second target loss value is determined based on the second distillation loss value and the second reference loss value.
Optionally, the first model training module is further configured to:
perform feature extraction on the first sample image based on a preset feature extraction model to obtain a first feature vector;
obtain a central feature vector corresponding to the second sample set, and calculate a vector distance between the first feature vector and the central feature vector;
adjust parameters of the preset feature extraction model based on the vector distance to obtain a target feature extraction model;
and obtain the first object label corresponding to the first sample image based on the target feature extraction model.
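A minimal sketch of this label-generation step is given below; the Euclidean distance, the fine-tuning objective, and the thresholding heuristic used to turn distances into first object labels are all assumptions, not part of the disclosure.

```python
# Sketch of adapting a preset feature extractor with the central feature vector
# of the second sample set and using it to derive first object labels.
import torch

def generate_first_object_labels(extractor, first_images, second_sample_features,
                                 lr=1e-4, steps=100):
    # Central feature vector corresponding to the second sample set.
    center_feature = second_sample_features.mean(dim=0)
    optimizer = torch.optim.SGD(extractor.parameters(), lr=lr)
    for _ in range(steps):
        first_feature = extractor(first_images)                # first feature vectors
        # Vector distance between the first feature vectors and the central feature.
        vector_distance = torch.norm(first_feature - center_feature, dim=-1).mean()
        optimizer.zero_grad()
        vector_distance.backward()  # adjust the extractor toward the target extractor
        optimizer.step()
    with torch.no_grad():
        # Derive first object labels from the adapted (target) feature extractor,
        # here by thresholding the distance to the centre (a heuristic, not from the source).
        distance = torch.norm(extractor(first_images) - center_feature, dim=-1)
        return (distance < distance.median()).long()
```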
Optionally, the apparatus further comprises:
a parameter adjustment module, configured to: receive feedback information, sent by a front end, corresponding to the object detection result;
and adjust model parameters of the object detection model based on the feedback information.
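How the front-end feedback is encoded is not specified; the sketch below assumes it arrives as corrected labels for the images concerned and applies one supervised corrective update.

```python
# Sketch of feedback-driven parameter adjustment (feedback format is assumed).
import torch.nn.functional as F

def adjust_with_feedback(model, optimizer, feedback_images, corrected_labels):
    """Apply one corrective update using images whose detection results
    were flagged by the front end, together with the corrected labels."""
    logits = model(feedback_images)
    loss = F.cross_entropy(logits, corrected_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```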
According to the object detection apparatus provided in this embodiment, an acquired image to be detected is input into the object detection model to obtain an object detection result corresponding to the image to be detected. The object detection model is trained based on the first sample set and on model parameters of the reference model, the reference model is pre-trained based on the second sample set, and the sample images in the two sample sets come from different scenes. The reference model trained on sample images of one scene thus assists the training of the object detection model for sample images of the other scene, and the object detection model continuously learns the characteristics of the sample images of that other scene. As a result, the object detection model improves object detection accuracy for images in the scene of the first sample set, adapts to object detection in different scenes, improves the accuracy and effect of object authenticity detection in the new scene, and improves the effect of object identification.
The above is a schematic solution of an object detection apparatus of the present embodiment. It should be noted that, the technical solution of the object detection apparatus and the technical solution of the object detection method belong to the same concept, and details of the technical solution of the object detection apparatus, which are not described in detail, can be referred to the description of the technical solution of the object detection method.
Fig. 7 illustrates a block diagram of a computing device 700 provided in accordance with one embodiment of the present description. The components of computing device 700 include, but are not limited to, memory 710 and processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes an access device 740, and the access device 740 enables the computing device 700 to communicate via one or more networks 760. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 740 may include one or more of any type of network interface, wired or wireless, for example a network interface card (NIC), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 7 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.
The processor 720 is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the object detection method described above.
The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the object detection method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the object detection method.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the above-described object detection method.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the object detection method belong to the same concept, and details of the technical solution of the storage medium, which are not described in detail, can be referred to the description of the technical solution of the object detection method.
An embodiment of the present specification also provides a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the above-described object detection method.
The above is an exemplary version of a computer program of the present embodiment. It should be noted that the technical solution of the computer program and the technical solution of the object detection method belong to the same concept, and details of the technical solution of the computer program, which are not described in detail, can be referred to the description of the technical solution of the object detection method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction concerned; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the embodiments are not limited by the order of the actions described, as some steps may be performed in another order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the embodiments described in the specification.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in explaining the present specification. The alternative embodiments are not described exhaustively, nor is the invention limited to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the teaching of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical application, thereby enabling others skilled in the art to understand and utilize the invention. This specification is to be limited only by the claims and their full scope and equivalents.

Claims (13)

1. An object detection method, comprising:
acquiring an image to be detected;
inputting the image to be detected into an object detection model to obtain an object detection result corresponding to the image to be detected;
the object detection model is obtained by training based on a first sample set and model parameters of a reference model, the reference model is pre-trained based on a second sample set, sample images in the first sample set and sample images in the second sample set come from different scenes, and the sample images comprise a target object.
2. The method of claim 1, further comprising, prior to inputting the image to be detected into an object detection model:
acquiring a first sample image in a first sample set and a first object tag corresponding to the first sample image;
obtaining model parameters in a reference model;
inputting the first sample image and the model parameters into an object detection model to be trained to obtain a first prediction result;
and adjusting model parameters of the object detection model to be trained based on the first prediction result and the first object tag to obtain the object detection model.
3. The method of claim 2, the object detection model to be trained comprising a cross-attention layer and a self-attention layer;
correspondingly, the inputting the first sample image and the model parameters into the object detection model to be trained to obtain a first prediction result comprises the following steps:
inputting the first sample image into the self-attention layer to obtain a first image vector;
acquiring intra-domain parameters in the self-attention layer, inputting the first sample image, the intra-domain parameters and the model parameters into the cross-attention layer, and acquiring a second image vector;
and calculating a first target loss value based on the first image vector and the second image vector, and determining the first prediction result according to the first target loss value.
4. The method of claim 3, the calculating a first target loss value based on the first image vector and the second image vector, comprising:
calculating a first distillation loss value based on the first image vector and the second image vector, and calculating a first reference loss value based on the first image vector and the first object tag;
a first target loss value is determined based on the first distillation loss value and the first reference loss value.
5. The method of any one of claims 1 to 4, wherein the pre-training of the reference model based on the second sample set comprises:
acquiring a second sample image in a second sample set and a second object tag corresponding to the second sample image;
obtaining model parameters in the object detection model;
inputting the second sample image and the model parameters into a reference model to be trained to obtain a second prediction result;
and adjusting model parameters of the reference model to be trained based on the second prediction result and the second object label to obtain a reference model.
6. The method of claim 5, the reference model to be trained comprising a cross-attention layer and a self-attention layer;
correspondingly, the inputting the second sample image and the model parameters into the reference model to be trained to obtain a second prediction result includes:
inputting the second sample image into the self-attention layer to obtain a third image vector;
acquiring intra-domain parameters in the self-attention layer, inputting the second sample image, the intra-domain parameters and the model parameters into the cross-attention layer, and acquiring a fourth image vector;
and calculating a second target loss value based on the third image vector and the fourth image vector, and determining a second prediction result according to the second target loss value.
7. The method of claim 6, the calculating a second target loss value based on the third image vector and the fourth image vector, comprising:
calculating a second distillation loss value based on the third image vector and the fourth image vector, and calculating a second reference loss value based on the third image vector and the second object tag;
a second target loss value is determined based on the second distillation loss value and the second reference loss value.
8. The method of claim 2, wherein obtaining the first object tag corresponding to the first sample image comprises:
performing feature extraction on the first sample image based on a preset feature extraction model to obtain a first feature vector;
obtaining a central feature vector corresponding to a second sample set, and calculating a vector distance between the first feature vector and the central feature vector;
based on the vector distance, adjusting parameters in the preset feature extraction model to obtain a target feature extraction model;
and obtaining a first object label corresponding to the first sample image based on the target feature extraction model.
9. The method according to claim 1, further comprising, after obtaining the object detection result corresponding to the image to be detected:
receiving feedback information corresponding to the object detection result sent by a front end;
and adjusting model parameters in the object detection model based on the feedback information.
10. An object detection method applied to a cloud-side device, comprising:
receiving an object detection request sent by a terminal side device, wherein the object detection request carries an image to be detected;
inputting the image to be detected into an object detection model to obtain an object detection result corresponding to the image to be detected;
the object detection model is obtained by training based on a first sample set and model parameters of a reference model, the reference model is pre-trained based on a second sample set, sample images in the first sample set and sample images in the second sample set come from different scenes, and the sample images comprise a target object;
and sending the object detection result to the front end, and displaying the object detection result.
11. An object detection model training method applied to a cloud-side device, comprising:
acquiring a first sample image in a first sample set and a first object tag corresponding to the first sample image;
obtaining model parameters in a reference model;
inputting the first sample image and the model parameters into an object detection model to be trained to obtain a first prediction result;
and adjusting model parameters of the object detection model to be trained based on the first prediction result and the first object tag to obtain an object detection model.
12. A computing device, comprising:
a memory and a processor;
wherein the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions which, when executed by the processor, implement the method of any one of claims 1 to 11.
13. A computer readable storage medium storing computer executable instructions which when executed by a processor implement the method of any one of claims 1 to 11.
CN202310282069.7A 2023-03-16 2023-03-16 Object detection method and object detection model training method Pending CN116168274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310282069.7A CN116168274A (en) 2023-03-16 2023-03-16 Object detection method and object detection model training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310282069.7A CN116168274A (en) 2023-03-16 2023-03-16 Object detection method and object detection model training method

Publications (1)

Publication Number Publication Date
CN116168274A true CN116168274A (en) 2023-05-26

Family

ID=86420159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310282069.7A Pending CN116168274A (en) 2023-03-16 2023-03-16 Object detection method and object detection model training method

Country Status (1)

Country Link
CN (1) CN116168274A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958766A (en) * 2023-07-04 2023-10-27 阿里巴巴(中国)有限公司 Image processing method
CN116958766B (en) * 2023-07-04 2024-05-14 阿里巴巴(中国)有限公司 Image processing method and computer readable storage medium
CN117831623A (en) * 2024-03-04 2024-04-05 阿里巴巴(中国)有限公司 Object detection method, object detection model training method, transcription factor binding site detection method, and target object processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination