CN116110000A - Sample data generation method, object detection method and related equipment - Google Patents

Sample data generation method, object detection method and related equipment

Info

Publication number
CN116110000A
Authority
CN
China
Prior art keywords
picture
target object
background
map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211636734.XA
Other languages
Chinese (zh)
Inventor
余正法
周祥明
章合群
傅凯
白家男
赵志伟
肖丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211636734.XA priority Critical patent/CN116110000A/en
Publication of CN116110000A publication Critical patent/CN116110000A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02WCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO WASTEWATER TREATMENT OR WASTE MANAGEMENT
    • Y02W30/00Technologies for solid waste management
    • Y02W30/10Waste collection, transportation, transfer or storage, e.g. segregated refuse collecting, electric or hybrid propulsion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The application discloses a sample data generation method. The method includes obtaining an extended object picture and a background picture, where the extended object picture is obtained by expanding a real object picture through a diffusion model and the real object picture is obtained by shooting a target object; adding the extended object picture to the background picture to obtain a map image; and performing harmony processing on the map image to obtain a target object sample, where the target object sample is used for training a detection model of the target object. The application also discloses an object detection method, an electronic device and a storage medium. By adding the extended object picture to the background picture to obtain a map image and performing harmony processing on the map image, a large amount of target object sample data is obtained, and the target object samples are used for training a detection model of the target object.

Description

Sample data generation method, object detection method and related equipment
Technical Field
The disclosed embodiments of the present application relate to the field of target detection technology, and more particularly, to a sample data generating method, an object detection method, and related devices.
Background
Training a detection model of an object requires a large amount of object sample data, which is generally obtained through large-scale collection. Large-scale collection, however, has certain drawbacks: the quantity and diversity of the sample data cannot be guaranteed, so existing detection models suffer from low accuracy and poor robustness.
Alternatively, a large amount of object sample data can be obtained by processing a small amount of sample data, for example by expanding it. At present, however, such expansion is mostly performed by matting and mapping, so the obtained data are limited and cannot simulate most of the data distributions found in real scenes.
A large amount of object sample data plays an important role in practical applications, such as the detection of packaging garbage bags. In recent years, classified, fixed-point garbage delivery has become the mainstream way of handling garbage: garbage classification reduces the environmental pollution caused by disorderly disposal and helps prevent the spread of disease. Nevertheless, garbage is still discarded at random; if the relevant personnel do not notice and clean it up in time, it produces odours and breeds bacteria, which harms the ecological environment. A detection model can be used to locate the target garbage and notify the relevant personnel to clean it up, but because the sample data used to train such models vary in quality, existing detection models have low accuracy and poor robustness, and a large amount of effective sample data is still needed to train the detection model.
Disclosure of Invention
According to embodiments of the present application, a sample data generation method, an object detection method and related equipment are provided, so as to obtain a large number of effective data samples and to train an object detection model using them.
The first aspect of the application discloses a sample data generating method, comprising: obtaining an extended object picture and a background picture, wherein the extended object picture is obtained by expanding a real object picture through a diffusion model, and the real object picture is obtained by shooting a target object; adding the expansion object picture into the background picture to obtain a map image; and carrying out harmony processing on the map image to obtain a target object sample, wherein the target object sample is used for training a detection model of the target object.
In some embodiments, obtaining the extended object picture includes: acquiring a first mask image corresponding to the target object from the real object picture; inputting the first mask image into a diffusion model to obtain at least one second mask image; and selecting the at least one second mask image as the expansion object picture.
In some embodiments, inputting the first mask image into a diffusion model to obtain at least one second mask image includes: adding noise into the first mask image by using the diffusion model to obtain a noise image, and denoising the noise image to obtain the at least one second mask image.
In some embodiments, the background picture is provided with at least one map area; the adding the extended object picture to the background picture to obtain a map image includes: and adding the expansion object picture into one of the mapping areas of the background picture to obtain the mapping image.
In some embodiments, the plurality of map areas in the background picture are distributed in areas with different depths of field; the adding the extended object picture to one of the map areas of the background picture to obtain the map image includes: selecting a region to be mapped from a plurality of mapping regions of the background picture; scaling the expansion object picture according to scaling parameters matched with the region to be mapped to obtain a scaled picture; and adding the scaled object picture into a region to be mapped of the background picture to obtain the mapped image.
In some embodiments, the harmonizing the map image to obtain a target object sample includes: determining a foreground part and a background part in the map image, wherein the foreground part is used for representing a target object, and the background part is used for representing a scene where the target object is located; carrying out harmonious operation on the foreground part and a harmonious foreground mask map to obtain a harmonious foreground part, and carrying out harmonious operation on the background part and the harmonious background mask map to obtain a harmonious background part, wherein the harmonious background mask map is obtained by inverting the harmonious foreground mask map; and generating the target object sample according to the harmonious foreground part and the harmonious background part.
In some embodiments, the harmonising the foreground portion with the harmonised foreground mask map to obtain a harmonised foreground portion, and the harmonising the background portion with the harmonised background mask map to obtain a harmonised background portion, includes: and inputting the map image and the foreground part into a preset generator model to obtain the harmonious foreground part and the harmonious background part.
The second aspect of the application discloses a method for detecting an object, which comprises the following steps: acquiring video stream data, wherein the video stream data comprises at least one target object; inputting the video stream data into a preset detection model to detect the at least one target object and detection information thereof; and filtering the at least one target object by using the detection information of the at least one target object; wherein the preset detection model is trained using a set of target object samples, at least one target object sample of the set being obtained by the method according to any one of the embodiments of the first aspect.
In some embodiments, the target object is a packaging garbage bag; the detection information of the at least one packaging garbage bag comprises identification information and position information of a first packaging garbage bag; and filtering the at least one target object by using the detection information of the at least one target object includes: judging, according to the identification information of the first packaging garbage bag, whether the first packaging garbage bag is detected in consecutive multi-frame data of the video stream data; in response to the first packaging garbage bag being detected in consecutive multi-frame data of the video stream data, judging whether the position information of the first packaging garbage bag changes; and, in response to a change in the position information of the first packaging garbage bag, filtering the first packaging garbage bag from the at least one packaging garbage bag.
In some embodiments, filtering the at least one target object by using the detection information of the at least one target object further includes: in response to the position information of the first packaging garbage bag not changing, determining that the first packaging garbage bag is in a static state, and acquiring an intersection-over-union ratio between the first packaging garbage bag and a non-packaging-garbage-bag object detected from the video stream data; and, in response to the intersection-over-union ratio being greater than a preset value, filtering the first packaging garbage bag from the at least one packaging garbage bag.
A third aspect of the present application discloses an electronic device, including a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory, to implement the sample data generating method described in the first aspect or the object detecting method described in the second aspect.
A fourth aspect of the present application discloses a non-transitory computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the sample data generating method described in the first aspect or the object detection method described in the second aspect.
The beneficial effects of this application are as follows: an extended object picture and a background picture are obtained, where the extended object picture is obtained by expanding a real object picture through a diffusion model and the real object picture is obtained by shooting a target object; the extended object picture is added to the background picture to obtain a map image; and harmony processing is performed on the map image, so as to obtain a large number of effective target object samples, which are then used to train a detection model of the target object.
Drawings
The application will be further described with reference to the accompanying drawings and embodiments, in which:
FIG. 1 is a flow chart of a sample data generation method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a mask image according to an embodiment of the present application;
FIG. 3 is a schematic diagram showing the effect of positive sample data according to an embodiment of the present application;
FIG. 4 is a schematic background view at camera angle of an embodiment of the present application;
FIG. 5 is a flow chart of a method of detecting an object according to an embodiment of the present application;
FIG. 6 is a logical schematic diagram of target object filtering according to one embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 8 is a schematic structural view of a nonvolatile computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The term "and/or" in this application is merely an association relation describing an associated object, and indicates that three relations may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C. Furthermore, the terms "first," "second," and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
In order to enable those skilled in the art to better understand the technical solutions of the present application, the technical solutions of the present application are described in further detail below with reference to the accompanying drawings and the detailed description.
Referring to fig. 1, fig. 1 is a flowchart of a sample data generating method according to an embodiment of the present application. The execution subject of the method can be an electronic device with a computing function, such as a microcomputer, a server, a mobile device such as a notebook computer, a tablet computer, and the like.
It should be noted that, if there are substantially the same results, the method of the present application is not limited to the flow sequence shown in fig. 1.
In some possible implementations, the method may be implemented by a processor invoking computer readable instructions stored in a memory, as shown in fig. 1, and may include the steps of:
s11: and acquiring an extended object picture and a background picture, wherein the extended object picture is obtained by expanding a real object picture through a diffusion model, and the real object picture is obtained by shooting a target object.
The real object picture is obtained by shooting a target object. For example, the target object may be an object X, such as a packaging garbage bag, and the real object picture, i.e. a picture containing the object X, may be obtained from shooting data such as video data captured by a surveillance camera. The real object picture is then expanded through the diffusion model to obtain an extended object picture; that is, the diffusion model is used to expand the real object picture into at least one extended object picture. For example, if the real object picture is a picture of the object X, the diffusion model expands it into new pictures of the object X. Obtaining an extended object picture and a background picture therefore means obtaining the extended object picture from the real object picture by means of the diffusion model, and obtaining the background picture according to the real object picture: the real object picture may be a frame of a video segment containing the object X, with the object X as the foreground and the environment in which the object X is located serving as the background picture.
In practical application, the extended object picture can also be obtained from a corresponding material library. For example, when the target object is a packaging garbage bag, a garbage material library can be pre-established, and pictures of packaging garbage bags can be obtained from it.
S12: and adding the expansion object picture into the background picture to obtain a map image.
The real object picture is expanded through the diffusion model to obtain extended object pictures. For example, the real object picture may be a picture A of the object X, which the diffusion model expands into pictures B of the object X. There may be at least one such extended object picture, e.g. picture B1, picture B2, picture B3, and so on. With the background picture of the object X being a picture C of the environment in which the object X is located, each extended object picture is added to the background picture: picture B1 is added to picture C to obtain map image D1, picture B2 is added to picture C to obtain map image D2, and so on, so that a plurality of map images can be obtained.
S13: and carrying out harmony processing on the map image to obtain a target object sample, wherein the target object sample is used for training a target object detection model.
Continuing the example, picture A of the object X is the real object picture, picture C of the environment in which the object X is located is the background picture, and pictures B1, B2, B3, etc. are the extended object pictures produced by the diffusion model. The extended object pictures are added to the background picture: picture B1 is added to picture C to obtain map image D1, picture B2 is added to picture C to obtain map image D2, and so on. Harmony processing is then performed on the map images D1, D2, D3, etc. by a preset means, so as to obtain target object samples. These target object samples can be data samples meeting the training requirements of the detection model, and can therefore be used to train the detection model of the target object.
In this embodiment, an extended object picture and a background picture are obtained, where the extended object picture is obtained by expanding a real object picture through a diffusion model and the real object picture is obtained by shooting a target object; the extended object picture is added to the background picture to obtain a map image; and harmony processing is performed on the map image. A large number of effective and diverse target object samples are thereby obtained and used to train a detection model of the target object, which helps improve the accuracy and robustness of the detection model.
In some embodiments, obtaining the extension object picture includes: acquiring a first mask image corresponding to a target object from a real object picture; inputting the first mask image into a diffusion model to obtain at least one second mask image; at least one second mask image is selected as an expansion object picture.
The first mask image corresponding to the target object is obtained from the real object picture. For example, if the real object picture is a picture of a packaging garbage bag, i.e. the target object is a packaging garbage bag, the first mask image corresponding to the target object may be a contour image of the packaging garbage bag, as shown in fig. 2, which is a schematic diagram of a mask image in an embodiment of the present application. The first mask image is input into the diffusion model to obtain at least one second mask image; for example, a plurality of positive sample data may be generated based on a diffusion model, i.e. a plurality of positive samples of packaged garbage are generated in a simulated scene, as shown in fig. 3, which is a schematic diagram of the effect of positive sample data in an embodiment of the present application, and mask images of the packaged garbage items in the simulated scene are then obtained as second mask images. At least one second mask image is selected as the extended object picture; for example, the mask images of the packaged garbage in the simulated scene are used as extended object pictures.
The first mask image corresponding to the target object and at least one second mask image obtained by inputting the first mask image into the diffusion model can be used for constructing a material library corresponding to the target object. For example, when the target object is a packed garbage bag, its corresponding first mask image and at least one second mask image may be used to construct a garbage material library of the packed garbage bag.
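For illustration only, the following Python sketch shows one way the first mask image could be cut out of a real object picture when an annotated bounding box is available. The use of OpenCV, the Otsu thresholding step and the function names are assumptions of this sketch and are not part of the original disclosure.

```python
import cv2
import numpy as np

def extract_first_mask(real_object_picture: np.ndarray, bbox) -> np.ndarray:
    """Crop the target object region and build a binary contour mask
    (the 'first mask image')."""
    x, y, w, h = bbox                              # annotated target object box (assumed)
    crop = real_object_picture[y:y + h, x:x + w]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding as a simple stand-in for a real segmentation step
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask

# Usage sketch: mask = extract_first_mask(cv2.imread("real_object.jpg"), (120, 80, 200, 160))
```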
In some embodiments, inputting the first mask image into the diffusion model to obtain at least one second mask image includes: adding noise into the first mask image by using the diffusion model to obtain a noise image, and denoising the noise image to obtain at least one second mask image.
The first mask image is input into the diffusion model, that is, the first mask image is input into the diffusion model for training to obtain at least one second mask image, for example, the first mask image of the picture of the target packed garbage area is input into the diffusion model for training, so that at least one second mask image corresponding to the picture of the target packed garbage bag can be obtained.
Specifically, adding noise to the first mask image using the diffusion model to obtain a noise image means accumulating Gaussian noise in the image, with a standard Gaussian distribution z_t ~ N(0, I); this is an important step in constructing the training samples (GT). Assuming the real picture x_0 ~ q(x_0), the forward process adds Gaussian noise to the picture in T accumulated steps to obtain x_1, x_2, ..., x_T, where the variances of the Gaussian noise added at each step are given by the hyperparameters \{\beta_t \in (0, 1)\}_{t=1}^{T}.
It should be noted that in this process the distribution q(x_t) is related to the distribution q(x_{t-1}), i.e. q(x_t) is obtained from q(x_{t-1}) by applying the Gaussian noise transition. The whole process can therefore be regarded as a Markov process, i.e. formula (1):

q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \quad q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I\big)    (1)

As t becomes larger, the distribution of the picture x_t gets closer and closer to pure noise, and when T → ∞, x_T becomes completely Gaussian noise. Note also that β_t in the above formula increases with t, i.e. β_1 < β_2 < ... < β_T. With the reparameterisation trick it can be shown that x_t at any step of the forward process can be expressed directly in terms of x_0 and the β_t, which provides the basis for modelling the reverse inference.
Denoising the noise image using the diffusion model means denoising the resulting data, i.e. reversing step by step from the T-th distribution q(x_T \mid x_{T-1}): starting from the completely standard Gaussian distribution x_T ~ N(0, I), the original picture distribution x_0 can be restored. Deep learning is used to construct a network that fits such a reverse distribution P_θ:

p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)    (2)

p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\big)    (3)

Formula (2) is the distribution P_θ over all T steps. By formula (3), x_0 can be found by stepping back gradually from x_t, i.e. pushing back from x_t to x_{t-1}, from x_{t-1} to x_{t-2}, and so on, denoising step by step.
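For illustration only, the following numerical sketch shows the forward noising of formula (1) and one reverse step following formulas (2) and (3). The linear β schedule, the ε-prediction parameterisation and the `model(x_t, t)` interface are common DDPM choices assumed here, not details taken from the original text.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # beta_1 < beta_2 < ... < beta_T
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, noise):
    """Forward process: sample x_t from q(x_t | x_0) using the reparameterisation
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def p_sample(model, x_t, t):
    """One reverse step x_t -> x_{t-1}; the network is assumed to predict the added noise."""
    eps = model(x_t, t)                      # learned noise estimate (assumed interface)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * np.random.randn(*x_t.shape)
```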
In some embodiments, the background picture is provided with at least one map area; adding the extended object picture to the background picture to obtain a map image, wherein the method comprises the following steps of: and adding the expansion object picture into one of the map areas of the background picture to obtain a map image.
Taking the background picture as picture C of the environment in which the object X is located, it is provided with at least one map area; that is, picture C is divided into areas such as area C1, area C2, area C3, and so on. The extended object picture is added to one of the map areas of the background picture to obtain a map image: for example, if the extended object picture of the object X is picture B, picture B is added to area C1 of background picture C to obtain map image D1, to area C2 to obtain map image D2, and to area C3 to obtain map image D3.
In some embodiments, there are multiple map areas in the background picture, and the multiple map areas in the background picture are distributed in areas with different depths of field; adding the extended object picture to one of the map areas of the background picture to obtain a map image, wherein the method comprises the following steps: selecting a region to be mapped from a plurality of mapping regions of the background picture; scaling the expansion object picture according to scaling parameters matched with the region to be mapped to obtain a scaled picture; and adding the scaled object picture into a region to be mapped of the background picture to obtain a mapped image.
Continuing with the background picture as picture C of the environment in which the object X is located, the plurality of map areas in the background picture are distributed over areas with different depths of field, i.e. the map areas of picture C lie at different distances from the camera lens, such as a far area, a middle area and a near area, where the far area is the area farthest from the camera. A region to be mapped is selected from the map areas of the background picture, and the extended object picture is scaled according to the scaling parameter matched with that region. Here, the matched scaling parameter means that, under the camera angle, the random scaling ratio applied to the extended object picture is larger for areas closer to the camera and smaller for areas farther away; for example, if a far map area is selected, the random scaling ratio of the extended object picture is smaller than for a nearer area. The extended object picture is processed with the corresponding scaling ratio to obtain the scaled picture, which is then added to the region to be mapped of the background picture to obtain the map image, as shown in fig. 4, a background schematic diagram under the camera angle according to an embodiment of the present application, in which areas with different depths of field correspond to different scaling parameters, e.g. a far area 41 with scale a3, a middle area 42 with scale a2 and a near area 43 with scale a1, where a1 > a2 > a3.
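For illustration only, the following Python sketch shows region-dependent scaling and pasting as described above. The concrete scale factors (mirroring a1 > a2 > a3), the random placement inside the chosen region and the function names are assumptions of this sketch.

```python
import random
import numpy as np
import cv2

# Region -> scale factor; larger for regions nearer to the camera (a1 > a2 > a3)
REGION_SCALES = {"near": 1.0, "middle": 0.6, "far": 0.3}

def paste_object(background: np.ndarray, obj: np.ndarray, region_boxes: dict) -> np.ndarray:
    """Pick a map area, scale the extended object picture to match its depth of field,
    and paste it at a random position inside that area."""
    name = random.choice(list(region_boxes))
    x0, y0, x1, y1 = region_boxes[name]                  # region as (x0, y0, x1, y1)
    s = REGION_SCALES[name] * random.uniform(0.8, 1.2)   # random scaling ratio
    obj_s = cv2.resize(obj, None, fx=s, fy=s)
    h, w = obj_s.shape[:2]
    px = random.randint(x0, max(x0, x1 - w))
    py = random.randint(y0, max(y0, y1 - h))
    out = background.copy()
    h = min(h, out.shape[0] - py)                        # clip to the background bounds
    w = min(w, out.shape[1] - px)
    out[py:py + h, px:px + w] = obj_s[:h, :w]
    return out
```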
In some embodiments, the harmony processing is performed on the map image to obtain a target object sample, including: determining a foreground part and a background part in the map image, wherein the foreground part is used for representing a target object, and the background part is used for representing a scene where the target object is located; carrying out harmonious operation on the foreground part and a harmonious foreground mask map to obtain a harmonious foreground part, and carrying out harmonious operation on the background part and the harmonious background mask map to obtain a harmonious background part, wherein the harmonious background mask map is obtained by inverting the harmonious foreground mask map; and generating a target object sample from the harmonious foreground part and the harmonious background part.
A map image is obtained by adding the extended object picture to the background picture, and the foreground part and the background part in the map image are determined, where the foreground part is used to represent the target object and the background part is used to represent the scene in which the target object is located; for example, the foreground part represents a packaging garbage bag and the background part represents the street environment in which it lies. That is, denoting the map image by I, the foreground part is I_f and the background part is I_b, with I_f representing the packaging garbage bag and I_b representing the scene in which it is located. A harmony operation is performed on the foreground part and a harmonious foreground mask map to obtain the harmonious foreground part, e.g. the foreground part I_f is harmonised with the harmonious foreground mask map M; and a harmony operation is performed on the background part and the harmonious background mask map to obtain the harmonious background part, e.g. the background part I_b is harmonised with the harmonious background mask map, where the harmonious background mask map is obtained by inverting the harmonious foreground mask map, i.e. it equals (1 − M). The target object sample, such as a corresponding packaged garbage sample, is then generated from the harmonious foreground part and the harmonious background part, giving the final composite image I_C, where the harmony process can be described as

I_C = I_f \odot M + I_b \odot (1 - M)

and the corresponding packaged garbage sample is then generated.
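For illustration only, a one-function sketch of the compositing that the harmony processing operates on, I_C = I_f ⊙ M + I_b ⊙ (1 − M); the array shapes and value ranges are assumptions of this sketch.

```python
import numpy as np

def composite(foreground: np.ndarray, background: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """I_C = I_f * M + I_b * (1 - M), with M in [0, 1], broadcast over colour channels."""
    m = mask[..., None].astype(np.float32)        # H x W -> H x W x 1
    return foreground * m + background * (1.0 - m)
```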
In some embodiments, harmonising the foreground portion with the harmonised foreground mask map to obtain a harmonised foreground portion, and harmonising the background portion with the harmonised background mask map to obtain a harmonised background portion, comprises: the map image and the foreground portion are input to a preset generator model to obtain a harmonised foreground portion and a harmonised background portion.
The foreground part I_f is harmonised with the harmonious foreground mask map M, and the background part I_b is harmonised with the harmonious background mask map, where the harmonious background mask map is the difference between 1 and the harmonious foreground mask map, i.e. (1 − M); the final composite image I_C is then obtained. The map image and the foreground part are input into a preset generator model to obtain the harmonious foreground part and the harmonious background part. The preset generator model may be defined as an image generator model G(I_C, M) whose optimisation function is

F = \lVert G(I_C, M) - I \rVert

that is, the map image and the mask image M of its foreground are input through the network. For each upsampling layer of the network, the mask image M is resized to a mask M^i whose size matches the feature map of the i-th upsampling, and a RAIN module acts on the feature map F^i output by the i-th upsampling layer. The RAIN module is based on the following principle:

\mathrm{RAIN}(F^i, M^i) = \gamma^i \cdot \mathrm{N}(F^i \odot M^i) + \beta^i

where the operator N is a normalisation operator and γ^i and β^i are the mean and variance of the background, so that the normalised statistics are transferred from the normalisation of the background features to the foreground features. Finally, the output feature map \hat{F}^i is expressed as:

\hat{F}^i_f = \sigma^i_b \cdot \frac{F^i_f - \mu^i_f}{\sigma^i_f} + \mu^i_b

where μ^i_f and σ^i_f are the mean and variance of the foreground channel in the i-th upsampling layer, and μ^i_b and σ^i_b are the mean and variance of the background channel at the i-th layer.
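For illustration only, a numpy sketch of this region-aware normalisation: the foreground region of a feature map is whitened with the foreground statistics and re-modulated with the background statistics of the same upsampling layer. The channel layout, the threshold on the mask and the epsilon value are assumptions of this sketch.

```python
import numpy as np

def rain(feature: np.ndarray, mask: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """feature: C x H x W feature map of the i-th upsampling layer;
    mask: H x W foreground mask resized to the feature-map size."""
    m = (mask > 0.5).astype(np.float32)[None]             # 1 x H x W
    fg_n, bg_n = m.sum() + eps, (1.0 - m).sum() + eps
    mu_f = (feature * m).sum(axis=(1, 2), keepdims=True) / fg_n
    mu_b = (feature * (1 - m)).sum(axis=(1, 2), keepdims=True) / bg_n
    var_f = ((feature - mu_f) ** 2 * m).sum(axis=(1, 2), keepdims=True) / fg_n
    var_b = ((feature - mu_b) ** 2 * (1 - m)).sum(axis=(1, 2), keepdims=True) / bg_n
    # Transfer the background statistics onto the normalised foreground region
    norm_fg = (feature - mu_f) / np.sqrt(var_f + eps) * np.sqrt(var_b + eps) + mu_b
    return feature * (1 - m) + norm_fg * m
```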
Referring to fig. 5, fig. 5 is a flow chart of a method for detecting an object according to an embodiment of the present application. The execution subject of the method can be an electronic device with a computing function, such as a microcomputer, a server, a mobile device such as a notebook computer, a tablet computer, and the like.
It should be noted that, if there are substantially the same results, the method of the present application is not limited to the flow sequence shown in fig. 5.
In some possible implementations, the method may be implemented by a processor invoking computer readable instructions stored in a memory, as shown in fig. 5, and may include the steps of:
s51: video stream data is acquired, wherein the video stream data comprises at least one target object.
The video stream data may be video data captured by a camera, for example a surveillance video image, and it contains at least one target object. For example, the target object may be a packaging garbage bag in the video stream data; such a bag may be one that has been discarded at random, but a detected bag may also be one carried on a motor vehicle or a non-motor vehicle, or held in a person's hand. That is, when at least one target object is identified in a surveillance video, it may be a randomly discarded garbage bag, or it may be a bag being carried by a vehicle or a person rather than discarded garbage.
S52: and inputting the video stream data into a preset detection model to detect at least one target object and detection information thereof.
Inputting the video stream data into a preset detection model means inputting the acquired video stream data into the preset detection model to detect at least one target object and its detection information, i.e. detecting and identifying at least one target object in the video stream data to obtain its detection information. For example, at least one packaging garbage bag is detected in the video stream data, and its identification information, position information, frame number information and the like are obtained.
S53: the at least one target object is filtered using the detection information of the at least one target object.
At least one target object, for example a packaging garbage bag, is detected from the video stream data, and the at least one target object is filtered using its detection information. That is, the motion trajectory of each target object can be analysed from detection information such as its identification information, position information and frame number information, so as to judge whether the detected target is really a target packaging garbage bag and to filter out detected targets that are not.
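For illustration only, the following Python sketch wires together the flow of S51 to S53. The detector call `model(frame)`, the fields of each detection and the `track_filter` callback are assumptions of this sketch, not the actual interfaces of the preset detection model.

```python
import cv2

def detect_stream(video_path: str, model, track_filter):
    """Read video stream data frame by frame, run the preset detection model (S52),
    and hand the detections to the filtering step (S53)."""
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        detections = model(frame)          # assumed: list of {"id", "bbox", "label", "score"}
        kept = track_filter(frame_idx, detections)
        yield frame_idx, kept
        frame_idx += 1
    cap.release()
```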
At least one target object and its detection information are detected in the video stream data through a preset detection model, where the preset detection model is trained with a target object sample set, and at least one target object sample in the set is obtained through the sample data generation method described above, which is not repeated here.
In some embodiments, the target object is a packaging garbage bag; the detection information of the at least one packaging garbage bag comprises identification information and position information of the first packaging garbage bag.
When the target object is a packaging garbage bag, the at least one target object and its detection information are at least one packaging garbage bag and its detection information. The detection information of the at least one packaging garbage bag includes identification information and position information of a first packaging garbage bag, i.e. the ID (target detection frame) and the position of the first packaging garbage bag, where the first packaging garbage bag may be one that has fallen outside the garbage can.
In this case, filtering the at least one target object using its detection information includes: judging, according to the identification information of the first packaging garbage bag, whether the first packaging garbage bag is detected in consecutive multi-frame data of the video stream data; in response to the first packaging garbage bag being detected in consecutive multi-frame data of the video stream data, judging whether its position information changes; and, in response to a change in the position information of the first packaging garbage bag, filtering the first packaging garbage bag from the at least one packaging garbage bag.
By detecting the video stream data, at least one packaging garbage bag and its detection information are obtained, and the detection information may include the identification information and position information of a first packaging garbage bag; that is, within the same video stream picture, the identification information and position information of a detected packaging garbage bag 1 can be recorded, i.e. the ID (identification frame number) and motion trajectory of packaging garbage bag 1 can be obtained. Filtering the at least one target object using this detection information means first judging, according to the identification information of the first packaging garbage bag, whether it is detected in consecutive multi-frame data of the video stream data: for example, the ID of packaging garbage bag 1 is locked and it is judged whether packaging garbage bag 1 is stably detected in T consecutive frames of the video stream data. In response to the first packaging garbage bag being detected in consecutive multi-frame data, it is judged whether its position information changes; for example, if packaging garbage bag 1 is stably detected in T consecutive frames, it is judged whether its position information, e.g. its coordinate information, has changed. In response to a change in the position information of the first packaging garbage bag, for example a change in the position of packaging garbage bag 1, it is determined that the bag is in a moving state, and the first packaging garbage bag is filtered out of the at least one packaging garbage bag, e.g. the ID of packaging garbage bag 1 and its other detection information are removed from the candidate packaging garbage bag queue.
In some embodiments, filtering the at least one target object using its detection information further includes: in response to the position information of the first packaging garbage bag not changing, determining that the first packaging garbage bag is in a static state and acquiring the intersection-over-union ratio between the first packaging garbage bag and a non-packaging-garbage-bag object detected from the video stream data; and, in response to the intersection-over-union ratio being greater than a preset value, filtering the first packaging garbage bag out of the at least one packaging garbage bag.
In response to the position information of the first packaging garbage bag not changing, it is determined that the first packaging garbage bag is in a static state; for example, if the position information of packaging garbage bag 2 does not change, packaging garbage bag 2 is determined to be static. The intersection-over-union ratio (IoU, Intersection over Union) between the first packaging garbage bag and non-packaging-garbage-bag objects detected from the video stream data is then acquired, i.e. in the static scene the intersection ratio A between a person and packaging garbage bag 2, the intersection ratio B between a motor vehicle and packaging garbage bag 2, or the intersection ratio C between a non-motor vehicle and packaging garbage bag 2 is calculated.
Further, in response to the intersection-over-union ratio being greater than a preset value, the first packaging garbage bag is filtered out of the at least one packaging garbage bag. For example, if the intersection ratio A between a person and packaging garbage bag 2 is greater than the preset value, the bag is judged to be carried by a person, and the ID of packaging garbage bag 2 and its other detection information are removed from the candidate packaging garbage bag queue; if the intersection ratio B between a motor vehicle and packaging garbage bag 2 is greater than the preset value, the bag is judged to be carried by a motor vehicle and is likewise removed; and if the intersection ratio C between a non-motor vehicle and packaging garbage bag 2 is greater than the preset value, the bag is judged to be carried by a non-motor vehicle, and its ID and other detection information are removed from the candidate packaging garbage bag queue.
For easy understanding, the above determination logic for filtering at least one object by using the detection information of at least one object is described in detail, as shown in fig. 6, and fig. 6 is a logic schematic diagram of object filtering according to an embodiment of the present application.
Step S61: video stream data is acquired.
Video stream data is acquired, wherein the video stream data comprises at least one target object representing at least one packaging garbage bag.
Step S62: and counting the detected target objects and the detection information thereof.
The detection identifies at least one target object to obtain the target object detection information, which may be, for example, identification information, position and frame number information of the detection target object, and the like.
Step S63: it is determined whether the target object is detected in successive multi-frame data in the video stream data.
If, as one outcome of the judgment in step S63, no target object is detected in consecutive multi-frame data of the video stream data, the method returns to step S62.
If, as the other outcome of step S63, the target object is detected in each of the consecutive multi-frame data of the video stream data, step S64 is executed.
Step S64: it is determined whether the position information of the target object is changed.
If the position information of the target object changes as a result of the determination in step S64, step S641 is performed.
Step S641: the target object is filtered from the at least one target object, i.e. the target object is filtered from the at least one target object, e.g. the ID of the target object 1 and other detection information of the target object 1 are removed from the candidate target object queue.
Step S65 is executed if the position information of the target object is not changed as a result of the determination in step S64.
Step S65: and judging whether the intersection ratio between the target object and the non-target object detected in the video stream data is larger than a preset value.
One of the judgment results of step S65, if the intersection ratio between the target object and the non-target object detected in the video stream data is greater than the preset value, step S651 is executed.
Step S651: the target object is filtered from the at least one target object, i.e. the target object is filtered from the at least one target object, e.g. the ID of the target object 2 and other detection information of the target object 2 are removed from the candidate target object queue.
Step S66 is executed if the intersection ratio between the target object and the non-target object detected in the video stream data is smaller than the predetermined value as a second judgment result in step S65.
Step S66: an alarm target object, for example, an alarm target object 3, i.e., a packaged garbage which notifies the relevant person to go to process meeting the condition is output.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 70 comprises a memory 71 and a processor 72 coupled to each other, the processor 72 being adapted to execute program instructions stored in the memory 71 to implement the steps of the sample data generating method embodiments described above, or to implement the steps of the object detection method described above. In one particular implementation scenario, the electronic device 70 may include, but is not limited to, a microcomputer or a server, which is not limited herein.
Specifically, the processor 72 is configured to control itself and the memory 71 to implement the steps of the sample data generating method embodiment described above, or to implement the steps of the object detecting method described above. The processor 72 may also be referred to as a CPU (Central Processing Unit ), and the processor 72 may be an integrated circuit chip with signal processing capabilities. The processor 72 may also be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a Field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. In addition, the processor 72 may be commonly implemented by an integrated circuit chip.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a non-volatile computer readable storage medium according to an embodiment of the present application. The non-transitory computer readable storage medium 80 is for storing a computer program 801, which when executed by a processor, for example, the processor 72 in the above-described fig. 7 embodiment, is for implementing the steps of the sample data generating method embodiment described above, or for implementing the steps of the object detection method described above.
The foregoing description of various embodiments is intended to highlight differences between the various embodiments, which may be the same or similar to each other by reference, and is not repeated herein for the sake of brevity.
In the several embodiments provided in this application, it should be understood that the disclosed methods and related devices may be implemented in other ways. For example, the above-described embodiments of related devices are merely illustrative; e.g., the division of modules or units is merely a logical functional division, and there may be other divisions in actual implementation, e.g., units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or in other forms.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Those skilled in the art will readily appreciate that many modifications and variations are possible in the device and method while maintaining the teachings of the present application. Accordingly, the above disclosure should be viewed as limited only by the scope of the appended claims.

Claims (12)

1. A sample data generation method, comprising:
obtaining an extended object picture and a background picture, wherein the extended object picture is obtained by expanding a real object picture through a diffusion model, and the real object picture is obtained by shooting a target object;
adding the expansion object picture into the background picture to obtain a map image;
and carrying out harmony processing on the map image to obtain a target object sample, wherein the target object sample is used for training a detection model of the target object.
2. The method of claim 1, wherein obtaining the extended object picture includes:
acquiring a first mask image corresponding to the target object from the real object picture;
inputting the first mask image into a diffusion model to obtain at least one second mask image;
and selecting the at least one second mask image as the expansion object picture.
3. The method of claim 2, wherein said inputting the first mask image into a diffusion model to obtain at least one second mask image comprises:
and adding noise into the first mask image by using the diffusion model to obtain a noise image, and denoising the noise rise image to obtain the at least one second mask image.
4. The method according to claim 1, wherein the background picture is provided with at least one map area;
the adding the extended object picture to the background picture to obtain a map image includes:
and adding the expansion object picture into one of the mapping areas of the background picture to obtain the mapping image.
5. The method of claim 4, wherein the background picture has a plurality of map areas, and wherein the plurality of map areas in the background picture are distributed in areas of different depths of field;
the adding the extended object picture to one of the map areas of the background picture to obtain the map image includes:
selecting a region to be mapped from a plurality of mapping regions of the background picture;
Scaling the expansion object picture according to scaling parameters matched with the region to be mapped to obtain a scaled picture;
and adding the scaled object picture into a region to be mapped of the background picture to obtain the mapped image.
6. The method of claim 1, wherein the harmonizing the map image to obtain a target object sample includes:
determining a foreground part and a background part in the map image, wherein the foreground part is used for representing a target object, and the background part is used for representing a scene where the target object is located;
carrying out harmonious operation on the foreground part and a harmonious foreground mask map to obtain a harmonious foreground part, and carrying out harmonious operation on the background part and the harmonious background mask map to obtain a harmonious background part, wherein the harmonious background mask map is obtained by reversing the harmonious foreground mask map;
and generating the target object sample according to the harmonious foreground part and the harmonious background part.
7. The method of claim 6, wherein the performing a harmonization operation on the foreground part and the foreground harmonization mask map to obtain the harmonized foreground part, and performing a harmonization operation on the background part and the background harmonization mask map to obtain the harmonized background part, comprises:
inputting the map image and the foreground part into a preset generator model to obtain the harmonized foreground part and the harmonized background part.
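The mask-guided harmonization of claims 6 and 7 can be illustrated roughly as below; TinyGenerator is a toy stand-in for the preset generator model, and the recomposition rule is one plausible reading of combining the harmonized foreground and background parts.

# Illustrative mask-guided harmonization; the network is a toy stand-in, not the disclosed generator.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, map_image: torch.Tensor, foreground_mask: torch.Tensor) -> torch.Tensor:
        # Condition on the map image plus the foreground harmonization mask map.
        return self.net(torch.cat([map_image, foreground_mask], dim=1))

def harmonize(map_image, foreground_mask, generator):
    background_mask = 1.0 - foreground_mask  # background mask map = inverted foreground mask map
    harmonized = generator(map_image, foreground_mask)
    # Combine the harmonized foreground part with the background part to form the target object sample.
    return harmonized * foreground_mask + map_image * background_mask

generator = TinyGenerator()
map_image = torch.rand(1, 3, 256, 256)
foreground_mask = (torch.rand(1, 1, 256, 256) > 0.5).float()
target_object_sample = harmonize(map_image, foreground_mask, generator)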
8. A method of detecting an object, comprising:
acquiring video stream data, wherein the video stream data comprises at least one target object;
inputting the video stream data into a preset detection model to detect the at least one target object and obtain detection information thereof;
filtering the at least one target object by using the detection information of the at least one target object;
wherein the preset detection model is trained with a set of target object samples, at least one target object sample of the set of target object samples being obtained by the method according to any one of claims 1-7.
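A rough sketch of the detection flow in claim 8 is given below; the detection model is passed in as an assumed callable, and the per-frame output format is invented for illustration.

# Illustrative per-frame detection over a video stream; "model" is an assumed detector callable.
import cv2

def run_detection(video_path: str, model):
    cap = cv2.VideoCapture(video_path)
    all_detections = []
    frame_index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        detections = model(frame)  # assumed to return (track id, box, score) tuples per target object
        all_detections.append((frame_index, detections))
        frame_index += 1
    cap.release()
    return all_detections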
9. The method of claim 8, wherein the target object is a packed garbage bag;
the detection information of the at least one packed garbage bag comprises identification information and position information of a first packed garbage bag;
the filtering the at least one target object by using the detection information of the at least one target object comprises:
judging, according to the identification information of the first packed garbage bag, whether the first packed garbage bag is detected in consecutive multi-frame data of the video stream data;
in response to detecting the first packed garbage bag in the consecutive multi-frame data of the video stream data, judging whether the position information of the first packed garbage bag has changed;
and filtering out the first packed garbage bag from the at least one packed garbage bag in response to a change in the position information of the first packed garbage bag.
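One way to read claim 9's filtering rule is sketched below: a packed garbage bag tracked across consecutive frames is retained only while its position stays put; the frame count, the movement threshold and the track data layout are assumptions, not disclosed values.

# Illustrative movement-based filtering; thresholds and the track data layout are invented.
def filter_moving_bags(tracks, min_frames=5, move_threshold=10.0):
    # tracks: {track_id: [(frame_index, (x, y, w, h)), ...]} for detected packed garbage bags.
    kept = []
    for track_id, history in tracks.items():
        if len(history) < min_frames:
            continue  # not yet detected in enough consecutive frames
        x0, y0 = history[0][1][0], history[0][1][1]
        x1, y1 = history[-1][1][0], history[-1][1][1]
        if abs(x1 - x0) + abs(y1 - y0) > move_threshold:
            continue  # position information changed: filter out this packed garbage bag
        kept.append(track_id)  # position unchanged: candidate static bag
    return kept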
10. The method of claim 9, wherein the filtering the at least one target object by using the detection information of the at least one target object further comprises:
in response to the position information of the first packed garbage bag not changing, determining that the first packed garbage bag is in a static state, and acquiring an intersection ratio between the first packed garbage bag and a non-packed garbage bag detected from the video stream data;
and in response to the intersection ratio being greater than a preset value, filtering out the first packed garbage bag from the at least one packed garbage bag.
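Reading the intersection ratio of claim 10 as intersection-over-union (IoU) is an assumption; under that reading the test can be sketched as follows, with the preset value of 0.5 and the box format also invented for the example.

# Illustrative IoU filter for a static packed garbage bag against non-packed garbage detections.
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def keep_static_bag(bag_box, non_packed_boxes, preset_value=0.5):
    # Filter out the bag when it overlaps any non-packed garbage detection by more than the preset value.
    return all(iou(bag_box, other) <= preset_value for other in non_packed_boxes)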
11. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the sample data generation method of any one of claims 1 to 7 or the object detection method of any one of claims 8 to 10.
12. A non-transitory computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the sample data generation method of any one of claims 1 to 7 or the object detection method of any one of claims 8 to 10.
CN202211636734.XA 2022-12-17 2022-12-17 Sample data generation method, object detection method and related equipment Pending CN116110000A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211636734.XA CN116110000A (en) 2022-12-17 2022-12-17 Sample data generation method, object detection method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211636734.XA CN116110000A (en) 2022-12-17 2022-12-17 Sample data generation method, object detection method and related equipment

Publications (1)

Publication Number Publication Date
CN116110000A true CN116110000A (en) 2023-05-12

Family

ID=86257215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211636734.XA Pending CN116110000A (en) 2022-12-17 2022-12-17 Sample data generation method, object detection method and related equipment

Country Status (1)

Country Link
CN (1) CN116110000A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117746066A (en) * 2024-02-20 2024-03-22 贵州博睿科讯科技发展有限公司 Diffusion model guided high-speed vehicle detection integrated learning method and device
CN117746066B (en) * 2024-02-20 2024-05-07 贵州博睿科讯科技发展有限公司 Diffusion model guided high-speed vehicle detection integrated learning method and device

Similar Documents

Publication Publication Date Title
US10261574B2 (en) Real-time detection system for parked vehicles
Liu et al. A video-based real-time adaptive vehicle-counting system for urban roads
Wang et al. Spatio-temporal texture modelling for real-time crowd anomaly detection
US9454819B1 (en) System and method for static and moving object detection
US10984266B2 (en) Vehicle lamp detection methods and apparatuses, methods and apparatuses for implementing intelligent driving, media and devices
Wang Real-time moving vehicle detection with cast shadow removal in video based on conditional random field
Lu A multiscale spatio-temporal background model for motion detection
US8995714B2 (en) Information creation device for estimating object position and information creation method and program for estimating object position
US10255673B2 (en) Apparatus and method for detecting object in image, and apparatus and method for computer-aided diagnosis
Manikandan et al. Video object extraction by using background subtraction techniques for sports applications
EP2951783B1 (en) Method and system for detecting moving objects
CN108875456B (en) Object detection method, object detection apparatus, and computer-readable storage medium
CN113168520A (en) Method of tracking objects in a scene
Hsu et al. Industrial smoke detection and visualization
WO2023123924A1 (en) Target recognition method and apparatus, and electronic device and storage medium
CN116110000A (en) Sample data generation method, object detection method and related equipment
Wu et al. Traffic object detections and its action analysis
Panda et al. A new Wronskian change detection model based codebook background subtraction for visual surveillance applications
CN111382606A (en) Tumble detection method, tumble detection device and electronic equipment
CN111797832B (en) Automatic generation method and system for image region of interest and image processing method
CN116311004B (en) Video moving target detection method based on sparse optical flow extraction
Pun et al. A real-time detector for parked vehicles based on hybrid background modeling
Gani et al. Traffic intensity monitoring using multiple object detection with traffic surveillance cameras
JP2024516642A (en) Behavior detection method, electronic device and computer-readable storage medium
CN114640807A (en) Video-based object counting method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination