WO2023066373A1 - Method, apparatus, device, and storage medium for determining sample images - Google Patents

Method, apparatus, device, and storage medium for determining sample images

Info

Publication number
WO2023066373A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
frame
false positive
target object
Prior art date
Application number
PCT/CN2022/126678
Other languages
English (en)
French (fr)
Inventor
郭阶添
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司
Publication of WO2023066373A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/25: Fusion techniques
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Definitions

  • The embodiments of the present application relate to the field of data processing, and in particular to a method, apparatus, device, and storage medium for determining sample images.
  • When a neural network model performs inference on images, it analyzes the target objects in those images while the model itself is continually updated. As the neural network model is updated, it may forget certain types of target objects it learned before, and when those types of target objects need to be analyzed again, analysis accuracy may drop. It is therefore necessary to re-determine sample images for target objects that are easily forgotten, so that the neural network model can be retrained later, improving its analysis accuracy.
  • Embodiments of the present application provide a method, apparatus, device, and storage medium for determining sample images, which can address the problem in the related art that sample images have low training value and therefore offer limited improvement to the performance of neural network models. The technical solutions are as follows:
  • In one aspect, a method for determining sample images is provided, the method comprising: performing missed/false detection analysis on a video stream to determine a first image and a missed/false detection result corresponding to the first image, where the first image is an image in the video stream that contains a missed/false detection object, and a missed/false detection object is a target object that was missed by the analysis or was misanalyzed; acquiring multiple foreground images and multiple background images from the video stream based on the missed/false detection result corresponding to the first image; fusing the multiple foreground images and the multiple background images to obtain multiple second images; and determining the first image and the multiple second images as sample images.
  • In another aspect, an apparatus for determining sample images is provided, the apparatus comprising:
  • a first determination module configured to perform missed/false detection analysis on a video stream to determine a first image and a missed/false detection result corresponding to the first image, where the first image is an image in the video stream that contains a missed/false detection object, and a missed/false detection object is a target object that was missed by the analysis or was misanalyzed;
  • an acquisition module configured to acquire multiple foreground images and multiple background images from the video stream based on the missed/false detection result corresponding to the first image;
  • a fusion module configured to fuse the multiple foreground images and the multiple background images to obtain multiple second images; and
  • a second determination module configured to determine the first image and the multiple second images as sample images.
  • In another aspect, a computer device is provided. The computer device is a camera or a server and includes a memory and a processor, where the memory is configured to store a computer program and the processor is configured to execute the computer program stored in the memory to implement the steps of the above method for determining sample images.
  • In another aspect, a computer-readable storage medium is provided. A computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of the above method for determining sample images are implemented.
  • In another aspect, a computer program product containing instructions is provided. When the instructions are run on a computer, the computer is caused to execute the steps of the above method for determining sample images.
  • In the embodiments of the present application, missed/false detection analysis is performed on the video stream to determine the first image and its corresponding missed/false detection result; multiple foreground images and background images are then acquired from the video stream and fused to obtain multiple second images, and the first image and the multiple second images are determined as sample images.
  • In this way, more sample images, and more valuable ones, can be generated.
  • Training the first neural network model with these sample images improves the training effect and thereby effectively improves the analysis performance of the first neural network model.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a sample image determination device provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a method for determining sample images provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an apparatus for determining sample images provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a camera provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • To analyze target objects, a first neural network model is usually trained with sample images, and the trained first neural network model is then used to analyze the target objects in the video stream captured by a camera. However, while the first neural network model is in use it is also updated continually, in stages. As the first neural network model is updated and the number of target object types it has learned grows, it may forget certain types of target objects it learned before, and when those types need to be analyzed again, analysis accuracy may suffer. In plain terms, continual updating of the first neural network model may cause the features of certain types of target objects to be forgotten, so that those target objects can no longer be analyzed, or are analyzed with low accuracy.
  • For example, suppose the first neural network model is used to recognize the features of a target object across the four seasons. After the model has analyzed the target object's features in spring, summer, autumn, and winter, it may need to analyze the target object in spring again; but because the model has kept updating with the target object's features across all four seasons, its features in spring may have been forgotten, so analyzing the target object in spring yields low accuracy.
  • While the target object is analyzed through the first neural network model, sample images can also be determined and stored, according to the method provided in the embodiments of the present application, based on the video stream captured by the camera; that is, sample images are determined and stored while the first neural network model analyzes the target object online. Then, when the target object needs to be analyzed again, the first neural network model can be retrained based on the sample images, improving its analysis accuracy.
  • FIG. 1 is a schematic diagram of an implementation environment according to an exemplary embodiment.
  • The implementation environment includes at least one camera 101 (represented schematically in FIG. 1 by a single camera) and a server 102.
  • The camera 101 can communicate with the server 102 over a connection that may be wired or wireless, which is not limited in the embodiments of the present application.
  • the camera 101 is used to shoot video streams, and transmit the captured video streams to the server 102 .
  • a first neural network model is deployed in the server 102, and the target object in the video stream transmitted by the camera 101 is analyzed through the first neural network model. Meanwhile, the server 102 may also determine and store sample images based on the video stream transmitted by the camera 101 .
  • In some embodiments, the server 102 may not be included in the implementation environment.
  • In that case, the camera 101 captures a video stream, analyzes the target objects in the video stream according to the method provided in the embodiments of the present application, and then determines and stores sample images.
  • That is, either the camera 101 or the server 102 may process the video stream to determine the sample images.
  • For ease of description, the camera or the server may be referred to as a sample image determination device.
  • The sample image determination device may include a missed/false detection analysis module, a spatio-temporal mining module, a storage module, and a sample generation module.
  • The missed/false detection analysis module is used to determine, during target object analysis, an image from the video stream that contains a missed/false detection object as the first image, and at the same time determine the missed/false detection result corresponding to the first image; the spatio-temporal mining module is used to acquire multiple foreground images and multiple background images from the video stream based on the missed/false detection result corresponding to the first image;
  • the storage module is used to store the first image and the acquired foreground and background images;
  • and the sample generation module is used to fuse the stored foreground images and background images to obtain multiple second images, and then determine the first image and the multiple second images as sample images.
  • In some embodiments, the storage module may not store the multiple foreground images and multiple background images; instead, after the foreground and background images are mined, they are fused to obtain multiple second images, and the storage module stores the multiple second images.
  • Alternatively, the storage module can store the multiple foreground images and multiple background images while also storing the multiple second images.
  • In this way, other images may additionally be generated based on the multiple foreground images and the multiple background images for training the first neural network model.
  • the camera 101 may be any device with a camera function, for example, a smart phone, a digital camera, a pan-tilt monitoring device, and the like.
  • the server 102 may be one server, or a server cluster composed of multiple servers, or a cloud computing service center.
  • FIG. 3 is a flowchart of a method for determining sample images provided by an embodiment of the present application; the method is applied to a server. Referring to FIG. 3, the method includes the following steps:
  • Step 301: Perform missed/false detection analysis on the video stream to determine a first image and the missed/false detection result corresponding to the first image.
  • The first image is an image in the video stream that contains a missed/false detection object, and a missed/false detection object is a target object that was missed by the analysis or was misanalyzed.
  • In some embodiments, multiple first neural network models and a second neural network model can be used to analyze any frame of the video stream for missed/false detections, so as to determine whether that frame contains a missed/false detection object; the second neural network model is a model capable of analyzing all target objects in any frame. If the frame contains a missed/false detection object, the frame is used as the first image, and the missed/false detection result corresponding to the first image is determined.
  • the foregoing video stream may be a video stream captured by any one of the at least one camera.
  • Since the video stream contains multiple frames, each frame can be checked for missed/false detection objects according to the above method to decide whether to use it as a first image.
  • In this way, multiple first images, and the missed/false detection result corresponding to each first image, can be obtained.
  • For ease of description, one first image is taken as an example below.
  • The missed/false detection result may include the position information of at least one missed/false detection object in the first image, and the acquisition time of the first image.
  • The missed/false detection result may also include other information, which is not limited in the embodiments of the present application.
  • Since a missed/false detection object can be a target object missed by the analysis or a misanalyzed target object, the at least one missed/false detection object may include missed target objects, misanalyzed target objects, or both.
  • Each of the multiple first neural network models can analyze the target object; the models may have different structures or the same structure, and each first neural network model is a model trained on a data set.
  • When the structures of the multiple first neural network models are the same, they can be trained on different data sets; when their structures are different, they can be trained on the same data set or on different data sets.
  • In some embodiments, the second neural network model is trained on a variety of public data sets.
  • The second neural network model is a model that can analyze all target objects in any frame, i.e., it can recognize essentially all target objects; for example, the second neural network model is an open-set recognition model.
  • Based on this, any frame in the video stream is analyzed for missed/false detections, so as to determine whether that frame contains a missed/false detection object.
  • The implementation process includes: determining, through each of the multiple first neural network models, a first analysis result corresponding to the frame, to obtain multiple first analysis results, where a first analysis result includes the position information of at least one target object in the frame and the first label of each target object.
  • A second analysis result corresponding to the frame is determined through the second neural network model, where the second analysis result includes the position information of at least one target object in the frame and the second label of each target object. Based on the position information and first labels of the target objects included in the multiple first analysis results, and the position information and second labels of the target objects included in the second analysis result, it is determined whether the frame contains a missed/false detection object.
  • Since each of the multiple first neural network models determines its first analysis result for the frame in the same way, one first neural network model is taken as an example: the frame is used as the input of the first neural network model to obtain the position information of at least one target object output by the model, together with the probability that each target object belongs to each of multiple labels. For each target object, the maximum of those probabilities is found, and the label corresponding to that maximum probability is used as the first label of the target object.
  • In this way, the position information of the at least one target object in the frame and the first label of each target object are obtained, i.e., the first analysis result corresponding to the frame.
  • Repeating this for each first neural network model yields the multiple first analysis results.
  • The implementation process of determining the second analysis result through the second neural network model is analogous: the frame is used as the input of the second neural network model to obtain the position information of at least one target object output by the model and the probability that each target object belongs to each of multiple labels; for each target object, the label with the maximum probability is used as its second label. This yields the position information of the at least one target object in the frame and the second label of each target object, i.e., the second analysis result corresponding to the frame. A minimal sketch of this label-selection step follows below.
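  • As an illustration only, the per-model step described above can be sketched as follows in Python; the `model(frame)` interface returning boxes and per-label probabilities is an assumed placeholder, not an interface defined by the patent.

```python
import numpy as np

def analysis_result(model, frame):
    """Run one detection model on a frame and return, for each detected
    target object, its position information plus the label with the
    highest probability (the 'first label' for a first neural network
    model, the 'second label' for the second neural network model).
    Assumed interface: model(frame) -> (boxes (N, 4), label_probs (N, L))."""
    boxes, label_probs = model(frame)
    labels = np.argmax(label_probs, axis=1)  # max-probability label per object
    return [(tuple(box), int(label)) for box, label in zip(boxes, labels)]

# Hypothetical usage: one result per first model, plus the second model's result.
# first_results = [analysis_result(m, frame) for m in first_models]
# second_result = analysis_result(second_model, frame)
```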
  • The implementation process of determining whether the frame contains a missed/false detection object may be as follows: if the first labels of the target objects included in the multiple first analysis results are the same as the second labels of the target objects included in the second analysis result, and the intersection-over-union between the position information of each target object included in the first analysis results and the corresponding position information of that target object included in the second analysis result is greater than a preset ratio threshold, it is determined that the frame contains no missed/false detection object; otherwise, it is determined that the frame contains a missed/false detection object.
  • An intersection-over-union greater than the preset ratio threshold between two pieces of position information indicates that the position information of the target object included in a first analysis result coincides with the position information of that target object included in the second analysis result.
  • The preset ratio threshold can be set according to actual needs, for example 90%, 85%, 80%, or 70%.
  • Alternatively, if the position information and first labels of the target objects included in the multiple first analysis results are the same across those results, and identical to the position information and second labels of the target objects included in the second analysis result, it is determined that the frame contains no missed/false detection object; otherwise, it is determined that the frame contains a missed/false detection object.
  • It should be noted that, for any two first analysis results, the at least one target object they include may or may not be the same, and the first labels they assign to the same target object may or may not be the same.
  • Likewise, after the second analysis result is determined through the second neural network model, the at least one target object it includes may or may not be the same as the at least one target object included in any first analysis result, and the labels assigned to the same target object may or may not be the same.
  • Because the second neural network model can recognize all target objects in any frame, comparing the multiple first analysis results with the second analysis result makes it possible to determine whether the frame includes a missed/false detection object, as sketched below.
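  • A minimal comparison sketch, under the simplifying assumption that objects are matched by identical labels plus an intersection-over-union above the preset ratio threshold:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def has_detection_error(first_results, second_result, ratio_thresh=0.7):
    """Return True if the frame contains a missed/false detection object.
    Every object in the second (open-set) analysis result must be matched,
    in every first analysis result, by an object with the same label and
    an IoU above the preset ratio threshold; this is a simplified reading
    of the patent's comparison step."""
    for box2, label2 in second_result:
        for result in first_results:
            if not any(label1 == label2 and iou(box1, box2) > ratio_thresh
                       for box1, label1 in result):
                return True  # missed or misanalyzed target object
    return False
```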
  • Specifically, for any target object included in the second analysis result, it can be determined whether that target object was missed by the analysis: if the multiple first analysis results include the position information and the first label of the target object, the target object is not a missed target object; otherwise, the target object is a target object missed by the analysis.
  • In practice, many frames in the video stream may contain missed/false detection objects.
  • In that case, a score can be determined for each such frame based on the missed/false detection objects it contains, and some of the frames can then be selected as first images according to the scores.
  • For example, the score of each frame may be determined based on the number and importance of its missed/false detection objects, and frames whose score exceeds a score threshold may be selected as first images. It can be understood that the score may also be determined in other ways, and the first images may also be selected in other ways. A small scoring sketch follows below.
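  • The following sketch illustrates one such scoring scheme; the per-label importance table and the threshold value are illustrative assumptions, not values given by the patent:

```python
def select_first_images(frames_with_errors, score_thresh=2.0, importance=None):
    """frames_with_errors: list of (frame, error_objects) pairs, where each
    error object is a ((x1, y1, x2, y2), label) tuple. Score each frame by
    the number of its missed/false detection objects, weighted by an
    optional per-label importance table, and keep frames whose score
    exceeds the threshold as first images."""
    importance = importance or {}
    selected = []
    for frame, errors in frames_with_errors:
        score = sum(importance.get(label, 1.0) for _, label in errors)
        if score > score_thresh:
            selected.append(frame)
    return selected
```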
  • Step 302: Acquire multiple foreground images and multiple background images from the video stream based on the missed/false detection result corresponding to the first image.
  • As noted above, the missed/false detection result includes the position information of at least one missed/false detection object in the first image and the acquisition time of the first image.
  • In some embodiments, at least one spatial range may be determined based on the position information of the at least one missed/false detection object, the at least one spatial range corresponding to the at least one missed/false detection object.
  • A time range is determined based on the acquisition time of the first image, and multiple foreground images and multiple background images are acquired from the video stream based on the time range and the at least one spatial range.
  • For the time range, a first time and a second time may be determined, where the first time precedes the acquisition time of the first image by a first duration and the second time follows it by a second duration.
  • The range between the first time and the second time is then taken as the time range corresponding to the first image.
  • The first duration and the second duration can be set in advance and adjusted to suit different requirements; they may or may not be equal.
  • For the spatial range, the geometric center of a missed/false detection object can be determined, and then a circular area that is centered on that geometric center and encloses the missed/false detection object is determined from the first image; the circular area is taken as one spatial range.
  • Alternatively, an area enclosing the at least one missed/false detection object may be determined from the first image and used as the spatial range corresponding to the first image.
  • That is, when there are multiple missed/false detection objects, their common geometric center may be determined, and a circular area centered on it and enclosing all the missed/false detection objects is determined from the first image and used as the spatial range corresponding to the first image. Sketches of both ranges follow below.
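  • Both ranges can be sketched as follows; the default durations and the enclosing-circle margin are assumed values for illustration:

```python
import math

def time_range(acquisition_time, first_duration=30.0, second_duration=30.0):
    """Time range around the first image's acquisition time, from
    first_duration before it to second_duration after it (seconds here);
    the two durations need not be equal."""
    return (acquisition_time - first_duration,
            acquisition_time + second_duration)

def spatial_range(box, margin=1.2):
    """Circular spatial range for one missed/false detection object:
    centered on the geometric center of its (x1, y1, x2, y2) bounding
    box, with a radius just large enough to enclose the box (margin is
    a small assumed safety factor)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    radius = margin * math.hypot(x2 - x1, y2 - y1) / 2.0
    return (cx, cy, radius)
```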
  • In some embodiments, the implementation process of acquiring multiple foreground images and multiple background images from the video stream includes: acquiring, from the video stream, foreground image regions whose acquisition time is within the time range and whose position is within the at least one spatial range, to obtain multiple image regions; acquiring, from the video stream, images whose acquisition time is within the time range and that contain no foreground, to obtain multiple third images; clustering the multiple image regions to obtain multiple first clustering results, and clustering the multiple third images to obtain multiple second clustering results; and selecting multiple image regions from the multiple first clustering results as the multiple foreground images, and multiple third images from the multiple second clustering results as the multiple background images.
  • Here, clustering refers to grouping by similarity of image features: after clustering, image features within the same cluster should be as similar as possible, and image features across different clusters as different as possible. Thus, after the multiple image regions are clustered, the same first clustering result contains image regions with similar image features, while different first clustering results contain image regions with dissimilar features; likewise, after the multiple third images are clustered, the same second clustering result contains third images with similar image features, and different second clustering results contain third images with dissimilar features.
  • Since the first neural network model processes every frame in the video stream, it extracts image features of each image while analyzing the target objects.
  • The image features used for clustering may be the features output by the first neural network model, or features re-extracted separately, which is not limited in the embodiments of the present application.
  • Because the same first clustering result contains image regions with similar image features and different first clustering results contain image regions with dissimilar features, one or more image regions can be selected from each first clustering result to form the multiple foreground images.
  • Similarly, because the same second clustering result contains third images with similar image features and different second clustering results contain dissimilar ones, one or more third images may be selected from each second clustering result to form the multiple background images, ensuring the richness and diversity of the sample images; a clustering-and-selection sketch follows below.
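  • A minimal clustering-and-selection sketch, using scikit-learn's KMeans as one possible clustering algorithm (the patent does not name a specific one):

```python
import numpy as np
from sklearn.cluster import KMeans

def diverse_selection(features, items, n_clusters=5, per_cluster=1):
    """Cluster items (foreground image regions, or candidate background
    frames) by their image features and keep a few representatives per
    cluster, so the selection spans dissimilar clusters and stays diverse.
    features: (N, D) array, e.g. features reused from the first neural
    network model; items: the N corresponding images."""
    feats = np.asarray(features, dtype=np.float32)
    k = min(n_clusters, len(items))
    kmeans = KMeans(n_clusters=k, n_init=10).fit(feats)
    selected = []
    for c in range(k):
        members = [i for i in range(len(items)) if kmeans.labels_[i] == c]
        # representatives: the members closest to the cluster center
        members.sort(key=lambda i: float(
            np.linalg.norm(feats[i] - kmeans.cluster_centers_[c])))
        selected.extend(items[i] for i in members[:per_cluster])
    return selected
```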
  • In other words, a time range can be determined according to the above method, and at least one spatial range can be determined based on the positions of the at least one missed/false detection object.
  • Through the time range and the at least one spatial range, the foreground images containing target objects whose features may have been forgotten can be determined.
  • Multiple background images can also be determined based on the time range and the at least one spatial range, so that more sample images can be obtained after the subsequent image fusion, which fully ensures the richness of the sample images and makes the determined sample images more valuable.
  • Step 303: Fuse the multiple foreground images and the multiple background images to obtain multiple second images.
  • In some embodiments, the acquisition time and image position corresponding to each of the multiple foreground images, and the acquisition time corresponding to each of the multiple background images, are determined.
  • Semantic segmentation is performed on each of the multiple background images to determine the semantic segmentation information corresponding to each background image.
  • The multiple foreground images and the multiple background images are then fused based on the acquisition times and image positions corresponding to the foreground images and the acquisition times and semantic segmentation information corresponding to the background images.
  • A foreground image is a partial image region of one frame in the video stream, so after a foreground image is determined, the acquisition time of the frame in which it appears, and its position within that frame, can be determined; the acquisition time of that frame is taken as the acquisition time corresponding to the foreground image, and the position of the foreground image within the frame is taken as its image position.
  • A background image is a whole frame of the video stream, so after a background image is determined, the acquisition time of that frame can be determined and taken as the acquisition time corresponding to the background image.
  • Since a background image includes many pixels, the pixels can be segmented by the semantics they express using a semantic segmentation algorithm, yielding the semantic segmentation information of the background image; this information includes the semantic labels of the different regions in the background image, for example "sky" and "grass".
  • In some embodiments, the implementation process of fusing the multiple foreground images and the multiple background images includes: for any foreground image, selecting, from the multiple background images, at least one background image whose acquisition time falls within the same time period as the acquisition time of the foreground image; determining a target background image from the at least one background image based on the image position corresponding to the foreground image and the semantic segmentation information of the at least one background image; and fusing the foreground image with the target background image so that the position of the foreground image in the target background image is the image position corresponding to the foreground image.
  • Here, the same time period refers to the same period of time in the same season, for example daytime in spring, evening in spring, daytime in winter, or night in winter.
  • A background image may include semantically segmented areas such as sky, grass, and rivers, and a foreground image may not be suitable for some of them; for example, a person should not be placed in the sky. Therefore, the semantic segmentation information of the areas in which the object in the foreground image is allowed to appear may be determined, giving at least one piece of target semantic segmentation information. Then, from the at least one background image, a background image is selected whose semantic segmentation information, at the image position corresponding to the foreground image, matches any target semantic segmentation information, and the selected background image is used as the target background image.
  • In some embodiments, the semantic segmentation areas in which each category of object is allowed to appear can be stored in advance.
  • In that case, the category of the object in the foreground image can be determined, and the semantic segmentation information of the areas in which that object is allowed to appear can then be looked up, as sketched below.
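  • A minimal selection sketch; `same_period`, the `time` and `segmentation` attributes, and the `allowed_regions` table are assumed helpers for illustration, not APIs defined by the patent:

```python
def pick_target_background(fg_time, fg_position, fg_category,
                           backgrounds, allowed_regions, same_period):
    """Select a target background for one foreground image: the two
    acquisition times must fall in the same time period (e.g. both
    'spring daytime'), and the semantic label of the background at the
    foreground's image position must be one where the foreground object's
    category may appear (a person on 'grass', never in 'sky').
    allowed_regions: category -> set of permitted semantic labels;
    bg.segmentation: per-pixel semantic label map; bg.time: timestamp."""
    x, y = fg_position
    for bg in backgrounds:
        if not same_period(bg.time, fg_time):
            continue
        if bg.segmentation[y, x] in allowed_regions.get(fg_category, set()):
            return bg
    return None  # no compatible background in this time period
```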
  • In some embodiments, when the foreground image is fused with the target background image, the parts of the object in the foreground image may also be segmented and the fusion performed per part, so that some parts of the object can be hidden behind something in the target background image, producing a more valuable image.
  • For example, suppose a person is the foreground image and a scene containing a fence is the target background image: Gaussian transparency can be applied to the person's legs and feet in the foreground image before it is fused with the background, so that in the fused image the person's leg and foot region appears hidden behind the fence, as sketched below.
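  • A simple blending sketch for this occlusion effect; the linear alpha ramp stands in for the Gaussian transparency described above, and the 60% fade start is an assumed value:

```python
import numpy as np

def fuse_with_occlusion(fg, bg, top_left, fade_from=0.6):
    """Paste a foreground cut-out (H, W, 3, uint8) onto the background at
    top_left = (y, x), fading its lower part to transparent so that,
    for example, a person's legs and feet appear hidden behind a fence
    in the background. Modifies and returns bg."""
    h, w = fg.shape[:2]
    alpha = np.ones((h, w), dtype=np.float32)
    start = int(h * fade_from)  # begin fading partway down the object
    alpha[start:] = np.linspace(1.0, 0.0, h - start)[:, None]
    y, x = top_left
    region = bg[y:y + h, x:x + w].astype(np.float32)
    blended = (alpha[..., None] * fg.astype(np.float32)
               + (1.0 - alpha[..., None]) * region)
    bg[y:y + h, x:x + w] = blended.astype(np.uint8)
    return bg
```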
  • Step 304: Determine the first image and the multiple second images as sample images.
  • In some embodiments, video streams captured by multiple cameras may also be processed to determine sample images.
  • The implementation process for multiple cameras is similar to the process above for one video stream; it is worth noting, however, that when generating a second image, a foreground image must be fused with a background image from the video stream captured by the same camera.
  • In the embodiments of the present application, the first image can be determined while the target objects in the video stream are being analyzed. Because the first image contains missed/false detection objects, the first neural network model's analysis of the target objects in the first image was inaccurate; training the first neural network model again with the first image is therefore more targeted and more valuable, improving training efficiency and training effect. Moreover, beyond the first image, more valuable foreground and background images can be obtained and fused to generate second images, increasing the number of generated sample images.
  • Because the time range and the at least one spatial range govern which foreground and background images are mined, the generated second images are richer in variety and more valuable, which further improves the training effect of the first neural network model and thereby effectively improves its analysis performance.
  • FIG. 4 is a schematic structural diagram of an apparatus for determining sample images provided by an embodiment of the present application.
  • The apparatus for determining sample images can be implemented by software, hardware, or a combination of the two as part or all of a computer device.
  • The computer device can be the camera or the server shown in FIG. 1.
  • the apparatus includes: a first determination module 401 , an acquisition module 402 , a fusion module 403 and a second determination module 404 .
  • The first determination module 401 is configured to perform missed/false detection analysis on the video stream to determine the first image and the missed/false detection result corresponding to the first image.
  • The first image is an image in the video stream that contains a missed/false detection object.
  • A missed/false detection object is a target object missed by the analysis or a misanalyzed target object;
  • an acquisition module 402 configured to acquire multiple foreground images and multiple background images from the video stream based on the missed/false detection result corresponding to the first image;
  • a fusion module 403 configured to fuse multiple foreground images and multiple background images to obtain multiple second images
  • the second determining module 404 is configured to determine the first image and multiple second images as sample images.
  • In some embodiments, the first determination module 401 includes:
  • a first determination submodule configured to analyze any frame of the video stream through the multiple first neural network models and the second neural network model, so as to determine whether the frame contains a missed/false detection object, where the second neural network model is a model capable of analyzing all target objects in any frame;
  • a second determination submodule configured to, if the frame contains a missed/false detection object, take the frame as the first image and determine the missed/false detection result corresponding to the first image.
  • In some embodiments, the first determination submodule includes:
  • a first determination unit configured to determine, through the multiple first neural network models respectively, the first analysis results corresponding to the frame to obtain multiple first analysis results, where a first analysis result includes the position information of at least one target object in the frame and the first label of each target object;
  • a second determination unit configured to determine, through the second neural network model, the second analysis result corresponding to the frame, where the second analysis result includes the position information of at least one target object in the frame and the second label of each target object;
  • a third determination unit configured to determine whether the frame contains a missed/false detection object based on the position information and first labels of the target objects included in the first analysis results and the position information and second labels of the target objects included in the second analysis result.
  • In some embodiments, the third determination unit is specifically configured to:
  • if the first labels of the target objects included in the multiple first analysis results are the same as the second labels of the target objects included in the second analysis result, and the intersection-over-union between the position information of each target object included in the first analysis results and the corresponding position information included in the second analysis result is greater than the preset ratio threshold, determine that the frame contains no missed/false detection object; otherwise, determine that the frame contains a missed/false detection object.
  • In some embodiments, the third determination unit is specifically configured to: if the position information and first labels of the target objects included in the multiple first analysis results are the same across those results and identical to the position information and second labels of the target objects included in the second analysis result, determine that the frame contains no missed/false detection object; otherwise, determine that the frame contains a missed/false detection object.
  • In some embodiments, the missed/false detection result includes the position information of at least one missed/false detection object in the first image and the acquisition time of the first image;
  • the acquisition module 402 includes:
  • a third determination submodule configured to determine at least one spatial range based on the position information of the at least one missed/false detection object, the at least one spatial range corresponding to the at least one missed/false detection object;
  • a fourth determination submodule configured to determine a time range based on the acquisition time of the first image; and
  • an acquisition submodule configured to acquire multiple foreground images and multiple background images from the video stream based on the time range and the at least one spatial range.
  • In some embodiments, the acquisition submodule is specifically configured to:
  • acquire, from the video stream, foreground image regions whose acquisition time is within the time range and whose position is within the at least one spatial range, to obtain multiple image regions, and acquire images whose acquisition time is within the time range and that contain no foreground, to obtain multiple third images;
  • cluster the multiple image regions to obtain multiple first clustering results, and cluster the multiple third images to obtain multiple second clustering results; and
  • select multiple image regions from the multiple first clustering results as the multiple foreground images, and multiple third images from the multiple second clustering results as the multiple background images.
  • In some embodiments, the fusion module 403 is specifically configured to:
  • determine the acquisition time and image position corresponding to each of the multiple foreground images, and the acquisition time corresponding to each of the multiple background images;
  • perform semantic segmentation on the multiple background images to determine the semantic segmentation information corresponding to each background image; and
  • fuse the multiple foreground images and the multiple background images based on the acquisition times and image positions corresponding to the foreground images and the acquisition times and semantic segmentation information corresponding to the background images.
  • In the embodiments of the present application, the first image can be determined while the target objects in the video stream are being analyzed. Because the first image contains missed/false detection objects, the first neural network model's analysis of the target objects in the first image was inaccurate; training the first neural network model again with the first image is therefore more targeted and more valuable, improving training efficiency and training effect. Moreover, beyond the first image, more valuable foreground and background images can be obtained and fused to generate second images, increasing the number of generated sample images.
  • It should be noted that when the apparatus for determining sample images provided by the above embodiment determines sample images, the division into the above functional modules is only an example; in practical applications, the above functions can be allocated to different functional modules as needed, i.e., the internal structure of the apparatus can be divided into different functional modules to complete all or part of the functions described above.
  • The apparatus for determining sample images provided by the above embodiment belongs to the same concept as the embodiments of the method for determining sample images; its specific implementation process is detailed in the method embodiments and is not repeated here.
  • FIG. 5 is a structural block diagram of a terminal 500 provided by an embodiment of the present application.
  • the terminal 500 may be used as a video camera.
  • The terminal 500 can be a portable mobile terminal, for example a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • the terminal 500 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
  • the terminal 500 includes: a processor 501 and a memory 502 .
  • The processor 501 may include one or more processing cores, for example a 4-core processor or an 8-core processor.
  • The processor 501 can be implemented in at least one hardware form among DSP (digital signal processing), FPGA (field-programmable gate array), and PLA (programmable logic array).
  • The processor 501 may also include a main processor and a coprocessor: the main processor, also known as the CPU (central processing unit), is the processor for processing data in the awake state; the coprocessor is a low-power processor for processing data in the standby state.
  • In some embodiments, the processor 501 may be integrated with a GPU (graphics processing unit), which is responsible for rendering and drawing the content to be displayed on the display screen.
  • In some embodiments, the processor 501 may also include an AI (artificial intelligence) processor for handling computing operations related to machine learning.
  • Memory 502 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 502 may also include high-speed random access memory and non-volatile memory, for example: one or more magnetic disk storage devices, flash memory storage devices.
  • The non-transitory computer-readable storage medium in the memory 502 is used to store at least one instruction, and the at least one instruction is executed by the processor 501 to implement the method for determining sample images provided by the method embodiments of this application.
  • the terminal 500 may optionally further include: a peripheral device interface 503 and at least one peripheral device.
  • the processor 501, the memory 502, and the peripheral device interface 503 may be connected through buses or signal lines.
  • Each peripheral device can be connected to the peripheral device interface 503 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 504 , a touch display 505 (corresponding to the display 505 in FIG. 5 ), a camera assembly 506 , an audio circuit 507 , a positioning assembly 508 and a power supply 509 .
  • the peripheral device interface 503 may be used to connect at least one peripheral device related to I/O (Input/Output, input/output) to the processor 501 and the memory 502 .
  • In some embodiments, the processor 501, memory 502, and peripheral device interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of them can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 504 is used to receive and transmit RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 504 communicates with the communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 504 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like.
  • the radio frequency circuit 504 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: World Wide Web, Metropolitan Area Network, Intranet, various generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area network and/or WiFi (Wireless Fidelity, Wireless Fidelity) network.
  • the radio frequency circuit 504 may also include circuits related to NFC (Near Field Communication, short-range wireless communication), which is not limited in this embodiment of the present application.
  • the display screen 505 is used to display a UI (User Interface, user interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • the display screen 505 also has the ability to collect touch signals on or above the surface of the display screen 505 .
  • the touch signal can be input to the processor 501 as a control signal for processing.
  • the display screen 505 can also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • There may be one display screen 505, set on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, arranged on different surfaces of the terminal 500 or in a folding design; in still other embodiments, the display screen 505 may be a flexible display screen arranged on a curved or folded surface of the terminal 500. The display screen 505 can even be set to a non-rectangular irregular shape, i.e., a special-shaped screen.
  • the display screen 505 can be made of LCD (Liquid Crystal Display, liquid crystal display), OLED (Organic Light-Emitting Diode, organic light-emitting diode) and other materials.
  • the camera assembly 506 is used to capture images or videos.
  • the camera component 506 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, or a telephoto camera, so that, for example, the main camera and the depth-of-field camera can be fused to realize the background blur function.
  • camera assembly 506 may also include a flash.
  • the flash can be a single-color temperature flash or a dual-color temperature flash. Dual color temperature flash refers to the combination of warm light flash and cold light flash, which can be used for light compensation under different color temperatures.
  • Audio circuitry 507 may include a microphone and speakers.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals and input them to the processor 501 for processing, or input them to the radio frequency circuit 504 to realize voice communication.
  • the microphone can also be an array microphone or an omnidirectional collection microphone.
  • the speaker is used to convert the electrical signal from the processor 501 or the radio frequency circuit 504 into sound waves.
  • the loudspeaker can be a conventional membrane loudspeaker or a piezoelectric ceramic loudspeaker.
  • audio circuitry 507 may also include a headphone jack.
  • the positioning component 508 is used to locate the current geographic location of the terminal 500, so as to realize navigation or LBS (Location Based Service, location-based service).
  • The positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • the power supply 509 is used to supply power to various components in the terminal 500 .
  • Power source 509 may be AC, DC, disposable or rechargeable batteries.
  • the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery.
  • a wired rechargeable battery is a battery charged through a wired line
  • a wireless rechargeable battery is a battery charged through a wireless coil.
  • the rechargeable battery can also be used to support fast charging technology.
  • the terminal 500 further includes one or more sensors 510 .
  • the one or more sensors 510 include, but are not limited to: an acceleration sensor 511 , a gyro sensor 512 , a pressure sensor 513 , a fingerprint sensor 514 , an optical sensor 515 and a proximity sensor 516 .
  • the acceleration sensor 511 can detect the acceleration on the three coordinate axes of the coordinate system established by the terminal 500 .
  • the acceleration sensor 511 can be used to detect the components of the gravitational acceleration on the three coordinate axes.
  • the processor 501 may control the touch display screen 505 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 511 .
  • the acceleration sensor 511 can also be used for collecting game or user's motion data.
  • the gyro sensor 512 can detect the body direction and rotation angle of the terminal 500 , and the gyro sensor 512 can cooperate with the acceleration sensor 511 to collect 3D actions of the user on the terminal 500 .
  • the processor 501 can realize the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
  • the pressure sensor 513 may be disposed on a side frame of the terminal 500 and/or a lower layer of the touch screen 505 .
  • the processor 501 performs left and right hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 513 .
  • the processor 501 controls the operable controls on the UI interface according to the user's pressure operation on the touch screen 505.
  • the operable controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.
  • The fingerprint sensor 514 is used to collect the user's fingerprint, and the processor 501, or the fingerprint sensor 514 itself, identifies the user according to the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 501 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings.
  • the fingerprint sensor 514 may be provided on the front, back or side of the terminal 500 . When the terminal 500 is provided with a physical button or a manufacturer's logo, the fingerprint sensor 514 can be integrated with the physical button or the manufacturer's Logo.
  • the optical sensor 515 is used to collect ambient light intensity.
  • the processor 501 can control the display brightness of the touch screen 505 according to the ambient light intensity collected by the optical sensor 515 . Specifically, when the ambient light intensity is high, the display brightness of the touch screen 505 is increased; when the ambient light intensity is low, the display brightness of the touch screen 505 is decreased.
  • the processor 501 may also dynamically adjust shooting parameters of the camera assembly 506 according to the ambient light intensity collected by the optical sensor 515 .
  • the proximity sensor 516 also called a distance sensor, is usually arranged on the front panel of the terminal 500 .
  • the proximity sensor 516 is used to collect the distance between the user and the front of the terminal 500 .
  • When the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually decreases, the processor 501 controls the touch display 505 to switch from the bright-screen state to the off-screen state; when the proximity sensor 516 detects that the distance gradually increases, the processor 501 controls the touch display 505 to switch from the off-screen state to the bright-screen state.
  • Those skilled in the art can understand that the structure shown in FIG. 5 does not constitute a limitation on the terminal 500, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
  • FIG. 6 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server 600 includes a central processing unit (CPU) 601, a system memory 604 including a random access memory (RAM) 602 and a read only memory (ROM) 603, and a system bus 605 connecting the system memory 604 and the central processing unit 601.
  • Server 600 also includes a basic input/output system (I/O system) 606 that facilitates the transfer of information between the various components within the computer, and a mass storage device 607 for storing operating system 613, application programs 614, and other program modules 615 .
  • the basic input/output system 606 includes a display 608 for displaying information and input devices 609 such as a mouse and a keyboard for user input of information. Both the display 608 and the input device 609 are connected to the central processing unit 601 through the input and output controller 610 connected to the system bus 605 .
  • the basic input/output system 606 may also include an input output controller 610 for receiving and processing input from a number of other devices such as a keyboard, mouse, or electronic stylus. Similarly, input output controller 610 also provides output to a display screen, printer, or other type of output device.
  • Mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605.
  • Mass storage device 607 and its associated computer-readable media provide non-volatile storage for the server 600. That is, mass storage device 607 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
  • Computer-readable media may comprise computer storage media and communication media.
  • Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices.
  • the server 600 may also be run by a remote computer connected to a network such as the Internet. That is to say, the server 600 can be connected to the network 612 through the network interface unit 611 connected to the system bus 605, or the network interface unit 611 can be used to connect to other types of networks or remote computer systems (not shown).
  • the above-mentioned memory also includes one or more programs, which are stored in the memory and configured to be executed by the CPU.
  • a computer-readable storage medium is also provided; the storage medium stores a computer program which, when executed by a processor, implements the steps of the method for determining sample images in the above embodiments.
  • the computer readable storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • the computer-readable storage medium mentioned in the embodiment of the present application may be a non-volatile storage medium, in other words, may be a non-transitory storage medium.
  • a computer program product containing instructions, which, when run on a computer, causes the computer to execute the steps of the above-mentioned method for determining a sample image.

Abstract

A method, apparatus, device, and storage medium for determining sample images. The method includes: performing missed/false report analysis on a video stream to determine a first image and a missed/false report result corresponding to the first image, the first image being an image in the video stream that contains a missed/false report object, and the missed/false report object being a target object missed by analysis or a target object analyzed incorrectly; acquiring, based on the missed/false report result corresponding to the first image, multiple foreground images and multiple background images from the video stream; fusing the multiple foreground images and the multiple background images to obtain multiple second images; and determining the first image and the multiple second images as sample images. Training the first neural network model with these sample images in the embodiments of the present application improves the training effect of the first neural network model and thus effectively improves its analysis performance.

Description

Method, apparatus, device, and storage medium for determining sample images
This application claims priority to Chinese Patent Application No. 202111235744.8, entitled "Method, apparatus, device, and storage medium for determining sample images" and filed with the China National Intellectual Property Administration on October 22, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of data processing, and in particular to a method, apparatus, device, and storage medium for determining sample images.
Background
When a neural network model performs inference on images, it analyzes the target objects in the images while also being continuously updated. As the neural network model is updated, it may forget certain types of target objects it has previously learned, and when these types of target objects need to be analyzed again, the analysis accuracy may be low. It is therefore necessary to re-determine sample images for target objects that are easily forgotten, so that the neural network model can be retrained later and its analysis accuracy improved.
The related art proposes a method for determining sample images in which a large number of images are acquired and a neural network model is used to screen out, as sample images, images with high uncertainty; the images are also partitioned according to the characteristics of their feature distribution, and some images are then selected as sample images according to those characteristics. However, sample images determined by uncertainty and feature-distribution characteristics have low training value, and a large number of sample images are required for the neural network model to reach a preset level of performance.
How to obtain more valuable sample images, so that the performance of a neural network model can be improved quickly with a smaller number of valuable sample images, has become a pressing technical problem.
Summary
The embodiments of the present application provide a method, apparatus, device, and storage medium for determining sample images, which can solve the problem in the related art that sample images have low training value and are of limited use in improving the performance of a neural network model. The technical solutions are as follows:
In one aspect, a method for determining sample images is provided, the method including:
performing missed/false report analysis on a video stream to determine a first image and a missed/false report result corresponding to the first image, where the first image is an image in the video stream that contains a missed/false report object, and the missed/false report object is a target object missed by analysis or a target object analyzed incorrectly;
acquiring multiple foreground images and multiple background images from the video stream based on the missed/false report result corresponding to the first image;
fusing the multiple foreground images and the multiple background images to obtain multiple second images; and
determining the first image and the multiple second images as sample images.
In another aspect, an apparatus for determining sample images is provided, the apparatus including:
a first determining module, configured to perform missed/false report analysis on a video stream to determine a first image and a missed/false report result corresponding to the first image, where the first image is an image in the video stream that contains a missed/false report object, and the missed/false report object is a target object missed by analysis or a target object analyzed incorrectly;
an acquiring module, configured to acquire multiple foreground images and multiple background images from the video stream based on the missed/false report result corresponding to the first image;
a fusion module, configured to fuse the multiple foreground images and the multiple background images to obtain multiple second images; and
a second determining module, configured to determine the first image and the multiple second images as sample images.
In another aspect, a computer device is provided. The computer device is a camera or a server and includes a memory and a processor, the memory being configured to store a computer program and the processor being configured to execute the computer program stored in the memory so as to implement the steps of the above method for determining sample images.
In another aspect, a computer-readable storage medium is provided, the storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for determining sample images.
In another aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the steps of the above method for determining sample images.
The technical solutions provided by the embodiments of the present application can bring at least the following beneficial effects:
In the embodiments of the present application, missed/false report analysis is performed on a video stream to determine a first image and a missed/false report result corresponding to the first image. Based on the missed/false report result corresponding to the first image, multiple foreground images and multiple background images are acquired from the video stream and then fused to obtain multiple second images, and the first image and the multiple second images are determined as sample images. The method for determining sample images of the embodiments of the present application can generate more, and more valuable, sample images; training the first neural network model with these sample images improves its training effect and thus effectively improves its analysis performance.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application and of the prior art more clearly, the following briefly introduces the drawings required in the embodiments and the prior art. Evidently, the drawings in the following description are only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a sample image determining device provided by an embodiment of the present application;
FIG. 3 is a flowchart of a method for determining sample images provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an apparatus for determining sample images provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a camera provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a server provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Before the method for determining sample images provided by the embodiments of the present application is explained in detail, the application scenario and implementation environment of the embodiments are introduced.
To analyze a target object, a first neural network model usually needs to be trained with sample images, and the trained first neural network model is then used to analyze target objects in a video stream captured by a camera. However, while the first neural network model is being used to analyze target objects, it is also updated in stages. As the first neural network model is updated and the number of target object types it has learned grows, it may forget certain types of target objects it has previously learned, and when these types of target objects need to be analyzed again, the analysis accuracy may be low. Put simply, because the first neural network model is continuously updated, it may forget the features of certain types of target objects, so that it either fails to detect these target objects or analyzes them with low accuracy.
In one example, the first neural network model is used to recognize the features of a target object across the four seasons. After the first neural network model has analyzed the features of the target object in spring, summer, autumn, and winter, if the target object needs to be analyzed again in spring, the model, having been updated as the target object's features changed over the four seasons, may have forgotten the target object's spring features, so that analysis in spring yields low accuracy.
While target objects are being analyzed by the first neural network model, sample images can also be determined and stored based on the video stream captured by the camera according to the method provided by the embodiments of the present application; that is, sample images can be determined and stored while the first neural network model analyzes target objects online. When the target objects need to be analyzed again, the first neural network model can first be retrained on these sample images, thereby improving its analysis accuracy.
Referring to FIG. 1, FIG. 1 is a schematic diagram of an implementation environment according to an exemplary embodiment. The implementation environment includes at least one camera 101 (represented schematically in FIG. 1 by a single camera) and a server 102. The camera 101 can be communicatively connected to the server 102 by a wired or wireless connection, which is not limited in the embodiments of the present application.
The camera 101 is configured to capture a video stream and transmit the captured video stream to the server 102.
A first neural network model is deployed in the server 102 and is used to analyze the target objects in the video stream transmitted by the camera 101. The server 102 can also determine and store sample images based on the video stream transmitted by the camera 101.
It can be understood that the above is an exemplary implementation environment; in other implementation environments, the server 102 may be omitted. In that case, the camera 101 can capture the video stream, analyze the target objects in the video stream according to the method provided by the embodiments of the present application, and then determine and store sample images.
That is, in the embodiments of the present application, sample images can be determined by processing the video stream either with the camera 101 or with the server 102. For ease of description, the camera or the server may be referred to as a sample image determining device.
Illustratively, referring to FIG. 2, the sample image determining device may include a missed/false report analysis module, a spatio-temporal mining module, a storage module, and a sample generation module. The missed/false report analysis module is configured to determine, from the video stream during target object analysis, an image containing a missed/false report object as the first image, and to determine the missed/false report result corresponding to the first image; the spatio-temporal mining module is configured to acquire multiple foreground images and multiple background images from the video stream based on the missed/false report result corresponding to the first image; the storage module is configured to store the first image and the acquired foreground and background images; and the sample generation module is configured to fuse the stored foreground and background images to obtain multiple second images, and to determine the first image and the multiple second images as sample images.
It should be noted that the storage module may also not store the foreground and background images; instead, after the foreground and background images are mined, the sample generation module may first fuse them to obtain the multiple second images, and the storage module may then store the second images.
Of course, the storage module may also store the foreground and background images while storing the multiple second images. In that case, additional images can be generated from the foreground and background images for training the first neural network model.
The camera 101 may be any device with an image-capturing function, for example a smartphone, a digital camera, a pan-tilt surveillance device, and so on. The server 102 may be a single server, a server cluster composed of multiple servers, or a cloud computing service center.
A person skilled in the art should understand that the above camera 101 and server 102 are only examples; other existing or future cameras or servers that are applicable to the embodiments of the present application are also included within the protection scope of the embodiments of the present application and are incorporated herein by reference.
The method for determining sample images provided by the embodiments of the present application is explained in detail below.
FIG. 3 is a flowchart of a method for determining sample images provided by an embodiment of the present application. The method is applied to a server. Referring to FIG. 3, the method includes the following steps:
Step 301: Perform missed/false report analysis on a video stream to determine a first image and a missed/false report result corresponding to the first image, where the first image is an image in the video stream that contains a missed/false report object, and the missed/false report object is a target object missed by analysis or a target object analyzed incorrectly.
In some embodiments, missed/false report analysis can be performed on any frame of the video stream through multiple first neural network models and a second neural network model to determine whether a missed/false report object exists in that frame, the second neural network model being a model capable of analyzing all target objects in the frame. If a missed/false report object exists in the frame, the frame is taken as the first image, and the missed/false report result corresponding to the first image is determined.
It should be noted that the above video stream may be a video stream captured by any one of the at least one camera.
Since the video stream may contain multiple frames, each frame can be checked by the above method to determine whether it contains a missed/false report object and hence whether it should be taken as a first image. After every frame of the video stream has been processed in this way, multiple first images and the missed/false report result corresponding to each first image can be obtained. The embodiments of the present application are described using a single first image as an example.
The missed/false report result may include the position information of at least one missed/false report object in the first image and the capture time of the first image; of course, the missed/false report result may also include other information, which is not limited in the embodiments of the present application. In addition, as described above, a missed/false report object may be a target object missed by analysis or a target object analyzed incorrectly, so the at least one missed/false report object may include missed target objects or incorrectly analyzed target objects; when the first image contains multiple missed/false report objects, it may include both.
It should be noted that the multiple first neural network models are all capable of analyzing target objects; they may have different structures or the same structure, and each first neural network model has been trained on a data set. When the multiple first neural network models have the same structure, they may be trained on different data sets; when they have different structures, they may be trained on the same data set or on different data sets. The second neural network model is trained on multiple public data sets; it is a model capable of analyzing all target objects in any frame and can essentially recognize all target objects, for example an open-set recognition model.
Performing missed/false report analysis on any frame of the video stream through the multiple first neural network models and the second neural network model to determine whether the frame contains a missed/false report object is implemented as follows: determining, through the multiple first neural network models respectively, a first analysis result corresponding to the frame so as to obtain multiple first analysis results, each first analysis result including the position information of at least one target object in the frame and a first label of each target object; determining, through the second neural network model, a second analysis result corresponding to the frame, the second analysis result including the position information of at least one target object in the frame and a second label of each target object; and determining whether the frame contains a missed/false report object based on the position information and first labels of the target objects included in the multiple first analysis results and the position information and second labels of the target objects included in the second analysis result.
Since each of the multiple first neural network models determines the first analysis result corresponding to the frame in the same way, one of the first neural network models is taken as an example. The frame is used as the input of the first neural network model, which outputs the position information of at least one target object and the probability of each target object belonging to each of multiple labels. For each target object, the maximum probability is determined among the probabilities of the target object belonging to the multiple labels, and the label corresponding to that maximum probability is taken as the first label of the target object. The position information of the at least one target object in the frame and the first label of each target object, i.e. the first analysis result corresponding to the frame, are thus obtained. After the frame has been processed by each of the multiple first neural network models, multiple first analysis results are obtained.
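For illustration only (this sketch is not part of the original disclosure; the array layout and function name are assumptions), the per-model step above — keeping, for each detected target object, the label with the maximum probability as its first label — could be expressed as:

```python
import numpy as np

def first_analysis_result(boxes, label_probs, label_names):
    """boxes: (N, 4) array of object positions; label_probs: (N, L) array of
    per-label probabilities; label_names: list of the L label strings."""
    results = []
    for box, probs in zip(boxes, label_probs):
        best = int(np.argmax(probs))  # the label with the maximum probability
        results.append({"position": [float(v) for v in box],
                        "first_label": label_names[best],
                        "probability": float(probs[best])})
    return results
```

Running this once per first neural network model on the same frame yields the multiple first analysis results.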
Similarly, determining the second analysis result corresponding to the frame through the second neural network model is implemented as follows: the frame is used as the input of the second neural network model, which outputs the position information of at least one target object and the probability of each target object belonging to each of multiple labels. For each target object, the maximum probability is determined among those probabilities, and the corresponding label is taken as the second label of the target object. The position information of the at least one target object in the frame and the second label of each target object, i.e. the second analysis result corresponding to the frame, are thus obtained.
Determining whether the frame contains a missed/false report object based on the position information and first labels included in the multiple first analysis results and the position information and second labels included in the second analysis result may be implemented as follows: if the first labels of the target objects included in the multiple first analysis results and the second labels of the target objects included in the second analysis result are all identical, and the intersection-over-union between any two pieces of position information among the position information of the target objects included in the multiple first analysis results and the position information of the target objects included in the second analysis result is greater than a preset ratio threshold, it is determined that the frame contains no missed/false report object; otherwise, it is determined that the frame contains a missed/false report object. Here, the intersection-over-union condition means that the position information of the target objects in the first analysis results overlaps by a certain proportion with the position information of the target objects in the second analysis result. The preset ratio threshold can be set as needed, for example 90%, 85%, 80%, or 70%.
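As an illustrative sketch of this consistency check (not part of the original disclosure; it assumes each analysis result lists the same objects in the same order, which in practice would require a matching step):

```python
from itertools import combinations

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def frame_has_no_missed_or_false_report(results, iou_threshold=0.8):
    """results: the first analysis results plus the second analysis result,
    each a list of (box, label) pairs."""
    if any(len(r) != len(results[0]) for r in results):
        return False  # some model missed an object or reported an extra one
    for r1, r2 in combinations(results, 2):
        for (box1, label1), (box2, label2) in zip(r1, r2):
            if label1 != label2 or iou(box1, box2) <= iou_threshold:
                return False  # label mismatch or insufficient overlap
    return True
```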
In a possible implementation, if the position information and first labels of the target objects included in the multiple first analysis results are all identical and are identical to the position information and second labels of the target objects included in the second analysis result, it is determined that the frame contains no missed/false report object; otherwise, it is determined that the frame contains a missed/false report object.
After the multiple first analysis results corresponding to the frame are determined through the multiple first neural network models, for any two of these first analysis results, the at least one target object they include may be the same or different, and the first labels of the same target object may be the same or different. Likewise, after the second analysis result corresponding to the frame is determined through the second neural network model, the at least one target object it includes may be the same as or different from that of any first analysis result, and the labels of the same target object may be the same or different.
Based on the above description, the second neural network model can recognize all target objects in the frame, so whether the frame contains a missed/false report object can be determined by comparing the multiple first analysis results with the second analysis result. In one example, for any target object among all target objects included in the multiple first analysis results, whether the target object is a missed target object is determined by checking whether all of the multiple first analysis results include the target object's position information and first label: if they all do, the target object is not a missed target object; otherwise, it is a missed target object.
When the target object is not a missed target object, it can further be determined whether the position information and first labels of the target object included in the multiple first analysis results are all identical and whether they are identical to the position information and second label of the target object in the second analysis result. If the position information and first labels of the target object in the multiple first analysis results are all identical and are respectively identical to the position information and second label of the target object in the second analysis result, the target object is not an incorrectly analyzed target object; otherwise, it is an incorrectly analyzed target object.
In a possible implementation, many images in the video stream may contain missed/false report objects. In that case, a score can be determined for each of these images based on the missed/false report objects it contains, and some of the images are then selected as first images according to their scores.
For example, the score of each image can be determined based on the number and importance of its missed/false report objects, and images whose score is higher than a score threshold are selected as first images. It can be understood that the score of each image can also be determined in other ways, and the first images can also be selected in other ways.
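A minimal sketch of such score-based selection (not part of the original disclosure; the additive scoring rule and the importance weights are assumptions):

```python
def image_score(report_labels, importance, count_weight=1.0):
    """report_labels: labels of the missed/false report objects in one image;
    importance: assumed mapping from label to an importance weight."""
    return count_weight * len(report_labels) + sum(
        importance.get(label, 0.0) for label in report_labels)

def select_first_images(candidates, importance, score_threshold):
    """candidates: mapping from frame id to its missed/false report labels;
    keep the frames whose score exceeds the threshold."""
    return [frame_id for frame_id, labels in candidates.items()
            if image_score(labels, importance) > score_threshold]
```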
Step 302: Acquire multiple foreground images and multiple background images from the video stream based on the missed/false report result corresponding to the first image.
In some embodiments, the missed/false report result includes the position information of at least one missed/false report object in the first image and the capture time of the first image. In that case, at least one spatial range can be determined based on the position information of the at least one missed/false report object, the at least one spatial range corresponding one-to-one to the at least one missed/false report object; a time range is determined based on the capture time of the first image; and the multiple foreground images and multiple background images are acquired from the video stream based on the time range and the at least one spatial range.
As an example, a first time and a second time can be determined, where the first time precedes the capture time of the first image by a first duration and the second time follows the capture time of the first image by a second duration. The time range between the first time and the second time is determined as the time range corresponding to the first image.
The first duration and the second duration can be set in advance and adjusted according to different requirements; they may be equal or unequal.
For any missed/false report object in the first image, the geometric center of the missed/false report object can be determined based on its position information, and a circular region centered on this geometric center and containing the missed/false report object is then determined from the first image and taken as a spatial range. Alternatively, a region containing the at least one missed/false report object can be determined from the first image and taken as the spatial range corresponding to the first image.
As an example, the geometric center of the missed/false report objects in the first image can be determined based on their positions, and a circular region centered on this geometric center and containing these missed/false report objects is then determined from the first image and taken as the spatial range corresponding to the first image.
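For illustration (not part of the original disclosure; the durations, margin factor, and box format are assumptions), the time range and a circular spatial range could be derived as:

```python
def time_range(capture_time, first_duration=30.0, second_duration=30.0):
    """Range around the first image's capture time, in seconds; the two
    durations are tunable and need not be equal."""
    return capture_time - first_duration, capture_time + second_duration

def circular_spatial_range(box, margin=1.1):
    """Circle centred on the geometric centre of a box (x1, y1, x2, y2) and
    just large enough (times a small margin) to contain it."""
    x1, y1, x2, y2 = box
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    radius = margin * (((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5) / 2.0
    return center, radius
```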
Acquiring the multiple foreground images and multiple background images from the video stream based on the time range and the at least one spatial range is implemented as follows: acquiring, from the video stream, image regions whose capture time falls within the time range, whose position falls within the at least one spatial range, and which are foreground, so as to obtain multiple image regions; acquiring, from the video stream, images whose capture time falls within the time range and which contain no foreground, so as to obtain multiple third images; clustering the multiple image regions to obtain multiple first clustering results, and clustering the multiple third images to obtain multiple second clustering results; and selecting multiple image regions from the multiple first clustering results as the multiple foreground images, and selecting multiple third images from the multiple second clustering results as the multiple background images.
That is, image regions whose capture time falls within the time range, whose position falls within any one of the at least one spatial range, and which are foreground are acquired and clustered to determine the multiple foreground images; and images whose capture time falls within the time range, whose position falls within any one of the at least one spatial range, and which contain no foreground are acquired to obtain multiple third images, which are then clustered to determine the multiple background images.
It should be noted that clustering is a process of classification according to the similarity of image features: after clustering, the similarity of image features within the same clustering result is as large as possible, and the difference of image features between different clustering results is also as large as possible. Therefore, after the multiple image regions are clustered, the same first clustering result includes image regions with similar image features, and different first clustering results include image regions with dissimilar image features. Likewise, after the multiple third images are clustered, the same second clustering result includes third images with similar image features, and different second clustering results include third images with dissimilar image features.
Since the first neural network model processes every frame of the video stream, it extracts the image features of each image when analyzing target objects. For clustering, the image features may be the features output by the first neural network model or features extracted anew, which is not limited in the embodiments of the present application.
After the multiple image regions are clustered, the same first clustering result includes image regions with similar image features and different first clustering results include image regions with dissimilar image features, so to ensure the richness and diversity of the sample images, one or more image regions can be selected from each first clustering result as the multiple foreground images. Likewise, after the multiple third images are clustered, one or more third images can be selected from each second clustering result as the multiple background images.
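A minimal sketch of this diversity-preserving selection (not part of the original disclosure; k-means is only one possible clustering algorithm, and the feature vectors, cluster count, and per-cluster quota are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def select_diverse(features, items, n_clusters=5, per_cluster=1):
    """Cluster items by the similarity of their feature vectors and keep a
    few per cluster, so the selected images cover dissimilar appearance types."""
    assignments = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        np.asarray(features))
    selected = []
    for cluster in range(n_clusters):
        indices = np.flatnonzero(assignments == cluster)[:per_cluster]
        selected.extend(items[i] for i in indices)
    return selected
```

The same routine can be applied once to the image regions (yielding the foreground images) and once to the third images (yielding the background images).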
The presence of a missed/false report object in the first image indicates that the multiple first neural network models have forgotten the features of the target object in the first image. In a video stream, the content of consecutive frames may be similar, so other images may also contain features forgotten by the multiple first neural network models. Therefore, a time range can be determined based on the capture time of the first image according to the above method, and at least one spatial range can be determined based on the positions of the at least one missed/false report object. The time range and the at least one spatial range then make it possible to determine foreground images containing target objects whose features may have been forgotten. Meanwhile, to facilitate subsequent image fusion, multiple background images can also be determined based on the time range and the at least one spatial range, so that more sample images can be obtained after fusion, fully ensuring the richness of the sample images and making the determined sample images more valuable.
Step 303: Fuse the multiple foreground images and the multiple background images to obtain multiple second images.
In some embodiments, the capture time of the frame in which each of the multiple foreground images is located in the video stream and the position of each foreground image within that frame can be determined, so as to obtain the capture time and image position corresponding to each foreground image. Semantic segmentation is performed on each of the multiple background images to determine the semantic segmentation information corresponding to each background image. The capture time of the frame in which each background image is located in the video stream is determined so as to obtain the capture time corresponding to each background image. The multiple foreground images and the multiple background images are then fused based on the capture times and image positions corresponding to the foreground images and the capture times and semantic segmentation information corresponding to the background images.
Since a foreground image is a partial image region of a frame in the video stream, after a foreground image is determined, the capture time of the frame in which it is located and its position within that frame can be determined; the capture time is then taken as the capture time corresponding to the foreground image, and the position within the frame as the image position corresponding to the foreground image. Likewise, a background image is a frame of the video stream, so after a background image is determined, the capture time of the frame in which it is located can be determined and taken as the capture time corresponding to the background image.
It should be noted that, for any one of the multiple background images, the background image includes multiple pixels, and a semantic segmentation algorithm can segment each pixel of the background image according to the semantics it expresses, yielding the semantic segmentation information of the background image. The semantic segmentation information includes the semantic information of different regions in the background image, for example semantic information such as sky and grass.
Fusing the multiple foreground images and the multiple background images based on the capture times and image positions corresponding to the foreground images and the capture times and semantic segmentation information corresponding to the background images is implemented as follows: for any one of the multiple foreground images, at least one background image whose capture time lies in the same time period as the capture time of the foreground image is selected from the multiple background images; a target background image is determined from the at least one background image based on the image position corresponding to the foreground image and the semantic segmentation information of the at least one background image; and the foreground image is fused with the target background image such that the position of the foreground image in the target background image is the image position corresponding to the foreground image.
Here, the same time period refers to the same time of day in the same season, for example daytime in spring, nighttime in spring, daytime in winter, nighttime in winter, and so on.
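As an illustrative bucketing of capture times into such periods (not part of the original disclosure; the month-to-season table and the 06:00–18:00 daytime window are assumptions):

```python
from datetime import datetime

SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def time_period(timestamp):
    """Bucket a capture time (Unix seconds) into (season, day/night); two
    images fall in the same time period when these tuples are equal."""
    t = datetime.fromtimestamp(timestamp)
    return SEASONS[t.month], "day" if 6 <= t.hour < 18 else "night"
```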
Since a background image may include semantic segmentation regions such as sky, grass, and rivers, and the foreground image may not be suitable for certain semantic segmentation regions (for example, a person should not be located in the sky), the semantic segmentation information of the semantic segmentation regions in which the object included in the foreground image is allowed to appear can be determined, yielding at least one piece of target semantic segmentation information. Then, from the at least one background image, a background image is selected whose semantic segmentation information at the image position corresponding to the foreground image matches any one of the target semantic segmentation information, and the selected background image is taken as the target background image.
The semantic segmentation information of the regions in which objects of different categories are allowed to appear can be stored in advance; in that case, the category of the object included in the foreground image can be determined, and the semantic segmentation information of the regions in which that object is allowed to appear can then be obtained.
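For illustration (not part of the original disclosure), selecting the target background image could then reduce to checking the dominant semantic class at the foreground's image position; the per-pixel label map is assumed to come from any off-the-shelf segmentation model:

```python
import numpy as np

def pick_target_background(image_position, candidates, allowed_semantics):
    """image_position: (x1, y1, x2, y2) where the foreground is to be placed;
    candidates: list of (background, label_map) pairs, label_map being a
    per-pixel semantic label array; allowed_semantics: the labels in which
    the foreground object may appear (e.g. {"grass", "road"} for a person)."""
    x1, y1, x2, y2 = image_position
    for background, label_map in candidates:
        region = label_map[y1:y2, x1:x2]
        labels, counts = np.unique(region, return_counts=True)
        if labels[np.argmax(counts)] in allowed_semantics:
            return background  # dominant class at this position is permissible
    return None
```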
In a possible implementation, when the foreground image is fused with the target background image, the parts of the object in the foreground image can also be segmented, and the foreground image is fused with the target background image based on the segmented parts, so that certain parts of the object can be hidden behind an object in the target background image, thereby generating a more valuable image. For example, if a person is selected as the foreground image and a fence as the background image, Gaussian transparency can be applied to the leg and foot regions of the person in the foreground image before the foreground image is fused with the fence; in the fused image the person's legs and feet then appear hidden behind the fence.
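A minimal sketch of such part-aware fusion (not part of the original disclosure; it reduces the "Gaussian transparency" idea to a per-pixel alpha blend, with the alpha map assumed to be lowered over the parts to be hidden):

```python
import numpy as np

def fuse(foreground, background, alpha, top_left):
    """Alpha-blend a foreground patch onto a background frame at top_left.
    alpha is a per-pixel map in [0, 1]; lowering it over, say, the leg and
    foot pixels makes those parts fade out as if hidden behind a fence."""
    h, w = foreground.shape[:2]
    y, x = top_left
    roi = background[y:y + h, x:x + w].astype(np.float32)
    a = alpha[..., None]  # shape (h, w, 1), broadcast over colour channels
    blended = a * foreground.astype(np.float32) + (1.0 - a) * roi
    background[y:y + h, x:x + w] = blended.astype(background.dtype)
    return background
```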
Step 304: Determine the first image and the multiple second images as sample images.
It should be noted that the above describes the process of determining sample images using the video stream captured by one camera as an example. In practical applications, sample images can also be determined by processing video streams captured by multiple cameras. The implementation is similar to the above, but it is worth noting that, when generating second images, foreground images and background images from the video stream captured by the same camera need to be fused.
In the embodiments of the present application, the first image can also be determined while the target objects in the video stream are being analyzed. Since the first image contains a missed/false report object, the first neural network model's analysis of the target objects in the first image is inaccurate; retraining the first neural network model with the first image is therefore more targeted and more valuable, improving training efficiency and training effect. In addition, on the basis of generating the first image, the embodiments of the present application can also acquire more valuable foreground and background images and fuse them to generate second images, increasing the number of generated sample images. Moreover, by clustering multiple image regions to select different types of foreground images and clustering multiple third images to select different types of background images, the generated second images are richer in type and more valuable, further improving the training effect of the first neural network model and thus effectively improving its analysis performance.
FIG. 4 is a schematic structural diagram of an apparatus for determining sample images provided by an embodiment of the present application. The apparatus can be implemented as part or all of a computer device by software, hardware, or a combination of both, and the computer device may be the camera or the server shown in FIG. 1. Referring to FIG. 4, the apparatus includes a first determining module 401, an acquiring module 402, a fusion module 403, and a second determining module 404.
The first determining module 401 is configured to perform missed/false report analysis on a video stream to determine a first image and a missed/false report result corresponding to the first image, where the first image is an image in the video stream that contains a missed/false report object, and the missed/false report object is a target object missed by analysis or a target object analyzed incorrectly;
the acquiring module 402 is configured to acquire multiple foreground images and multiple background images from the video stream based on the missed/false report result corresponding to the first image;
the fusion module 403 is configured to fuse the multiple foreground images and the multiple background images to obtain multiple second images;
the second determining module 404 is configured to determine the first image and the multiple second images as sample images.
In a possible implementation, the first determining module 401 includes:
a first determining submodule, configured to perform missed/false report analysis on any frame of the video stream through multiple first neural network models and a second neural network model to determine whether the frame contains a missed/false report object, the second neural network model being a model capable of analyzing all target objects in the frame; and
a second determining submodule, configured to, if the frame contains a missed/false report object, take the frame as the first image and determine the missed/false report result corresponding to the first image.
In a possible implementation, the first determining submodule includes:
a first determining unit, configured to determine, through the multiple first neural network models respectively, a first analysis result corresponding to the frame so as to obtain multiple first analysis results, each first analysis result including the position information of at least one target object in the frame and a first label of each target object;
a second determining unit, configured to determine, through the second neural network model, a second analysis result corresponding to the frame, the second analysis result including the position information of at least one target object in the frame and a second label of each target object; and
a third determining unit, configured to determine whether the frame contains a missed/false report object based on the position information and first labels of the target objects included in the multiple first analysis results and the position information and second labels of the target objects included in the second analysis result.
In a possible implementation, the third determining unit is specifically configured to:
if the first labels of the target objects included in the multiple first analysis results and the second labels of the target objects included in the second analysis result are all identical, and the intersection-over-union between any two pieces of position information among the position information of the target objects included in the multiple first analysis results and the position information of the target objects included in the second analysis result is greater than a preset ratio threshold, determine that the frame contains no missed/false report object; otherwise, determine that the frame contains a missed/false report object.
In a possible implementation, the third determining unit is specifically configured to:
if the position information and first labels of the target objects included in the multiple first analysis results are all identical and are identical to the position information and second labels of the target objects included in the second analysis result, determine that the frame contains no missed/false report object; otherwise, determine that the frame contains a missed/false report object.
In a possible implementation, the missed/false report result includes the position information of at least one missed/false report object in the first image and the capture time of the first image;
the acquiring module 402 includes:
a third determining submodule, configured to determine at least one spatial range based on the position information of the at least one missed/false report object, the at least one spatial range corresponding one-to-one to the at least one missed/false report object;
a fourth determining submodule, configured to determine a time range based on the capture time of the first image; and
an acquiring submodule, configured to acquire the multiple foreground images and the multiple background images from the video stream based on the time range and the at least one spatial range.
In a possible implementation, the acquiring submodule is specifically configured to:
acquire, from the video stream, image regions whose capture time falls within the time range, whose position falls within the at least one spatial range, and which are foreground, so as to obtain multiple image regions;
acquire, from the video stream, images whose capture time falls within the time range and which contain no foreground, so as to obtain multiple third images;
cluster the multiple image regions to obtain multiple first clustering results, and cluster the multiple third images to obtain multiple second clustering results; and
select multiple image regions from the multiple first clustering results as the multiple foreground images, and select multiple third images from the multiple second clustering results as the multiple background images.
In a possible implementation, the fusion module 403 is specifically configured to:
determine the capture time of the frame in which each of the multiple foreground images is located in the video stream and the position of each foreground image within that frame, so as to obtain the capture time and image position corresponding to each foreground image;
perform semantic segmentation on each of the multiple background images to determine the semantic segmentation information corresponding to each background image;
determine the capture time of the frame in which each of the multiple background images is located in the video stream, so as to obtain the capture time corresponding to each background image; and
fuse the multiple foreground images and the multiple background images based on the capture times and image positions corresponding to the foreground images and the capture times and semantic segmentation information corresponding to the background images.
In the embodiments of the present application, the first image can also be determined while the target objects in the video stream are being analyzed. Since the first image contains a missed/false report object, the first neural network model's analysis of the target objects in the first image is inaccurate; retraining the first neural network model with the first image is therefore more targeted and more valuable, improving training efficiency and training effect. In addition, on the basis of generating the first image, the embodiments of the present application can also acquire more valuable foreground and background images and fuse them to generate second images, increasing the number of generated sample images. Moreover, by clustering multiple image regions to select different types of foreground images and clustering multiple third images to select different types of background images, the generated second images are richer in type, further improving the training effect of the first neural network model and thus effectively improving its analysis performance.
It should be noted that when the apparatus for determining sample images provided by the above embodiment determines sample images, the division into the above functional modules is only used as an example; in practical applications, the above functions can be assigned to different functional modules as needed, i.e. the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for determining sample images provided by the above embodiment belongs to the same concept as the method embodiment for determining sample images; its specific implementation process is detailed in the method embodiment and is not repeated here.
FIG. 5 is a structural block diagram of a terminal 500 provided by an embodiment of the present application. In the embodiments of the present application, the terminal 500 can be used as a camera. The terminal 500 may be a portable mobile terminal, for example a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 500 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.
Generally, the terminal 500 includes a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 501 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit), while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 502 may include one or more computer-readable storage media, which may be non-transitory. The memory 502 may also include high-speed random access memory and non-volatile memory, for example one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 502 is used to store at least one instruction, which is executed by the processor 501 to implement the method for determining sample images provided by the method embodiments of the present application.
In some embodiments, the terminal 500 may optionally further include a peripheral device interface 503 and at least one peripheral device. The processor 501, the memory 502, and the peripheral device interface 503 may be connected by buses or signal lines, and each peripheral device may be connected to the peripheral device interface 503 by a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 504, a touch display screen 505 (corresponding to the display screen 505 in FIG. 5), a camera assembly 506, an audio circuit 507, a positioning assembly 508, and a power supply 509.
The peripheral device interface 503 can be used to connect at least one I/O (Input/Output) related peripheral device to the processor 501 and the memory 502. In some embodiments, the processor 501, the memory 502, and the peripheral device interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral device interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 504 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 504 communicates with communication networks and other communication devices through electromagnetic signals, converting electrical signals into electromagnetic signals for transmission or converting received electromagnetic signals into electrical signals. In a possible implementation, the radio frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 504 can communicate with other terminals through at least one wireless communication protocol, including but not limited to the World Wide Web, metropolitan area networks, intranets, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 504 may also include circuits related to NFC (Near Field Communication), which is not limited in the embodiments of the present application.
The display screen 505 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, it also has the ability to collect touch signals on or above its surface; such a touch signal can be input to the processor 501 as a control signal for processing. In this case, the display screen 505 can also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 505 provided on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505 provided on different surfaces of the terminal 500 or in a folding design; in still other embodiments, the display screen 505 may be a flexible display screen provided on a curved or folded surface of the terminal 500. The display screen 505 may even be arranged in a non-rectangular irregular shape, i.e. a special-shaped screen. The display screen 505 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 506 is used to capture images or video. In a possible implementation, the camera assembly 506 includes a front camera and a rear camera. Generally, the front camera is provided on the front panel of the terminal and the rear camera on the back. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused for a background-blur function, the main camera and the wide-angle camera can be fused for panoramic shooting and VR (Virtual Reality) shooting, or other fused shooting functions are realized. In some embodiments, the camera assembly 506 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash; a dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 507 may include a microphone and a speaker. The microphone collects the sound waves of the user and the environment and converts them into electrical signals that are input to the processor 501 for processing or to the radio frequency circuit 504 for voice communication. For stereo capture or noise reduction, there may be multiple microphones provided at different parts of the terminal 500; the microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves; it may be a traditional film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 507 may also include a headphone jack.
The positioning assembly 508 is used to locate the current geographic position of the terminal 500 for navigation or LBS (Location Based Service). The positioning assembly 508 may be a positioning assembly based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of Russia.
The power supply 509 is used to supply power to the components in the terminal 500. The power supply 509 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 509 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery charged through a wired line or a wireless rechargeable battery charged through a wireless coil; the rechargeable battery may also support fast-charging technology.
In some embodiments, the terminal 500 further includes one or more sensors 510, including but not limited to an acceleration sensor 511, a gyroscope sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515, and a proximity sensor 516.
The acceleration sensor 511 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 500. For example, the acceleration sensor 511 can detect the components of gravitational acceleration on the three coordinate axes. The processor 501 can control the touch display screen 505 to display the user interface in landscape or portrait view according to the gravitational acceleration signal collected by the acceleration sensor 511. The acceleration sensor 511 can also be used to collect motion data for games or of the user.
The gyroscope sensor 512 can detect the body direction and rotation angle of the terminal 500 and can cooperate with the acceleration sensor 511 to collect the user's 3D actions on the terminal 500. Based on the data collected by the gyroscope sensor 512, the processor 501 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 513 may be provided on the side frame of the terminal 500 and/or the lower layer of the touch display screen 505. When the pressure sensor 513 is provided on the side frame of the terminal 500, it can detect the user's grip signal on the terminal 500, and the processor 501 performs left/right hand recognition or quick operations according to the grip signal collected by the pressure sensor 513. When the pressure sensor 513 is provided on the lower layer of the touch display screen 505, the processor 501 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 505. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 514 is used to collect the user's fingerprint; the processor 501 identifies the user according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the user according to the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 501 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and so on. The fingerprint sensor 514 may be provided on the front, back, or side of the terminal 500; when the terminal 500 is provided with a physical button or a manufacturer's logo, the fingerprint sensor 514 can be integrated with the physical button or the manufacturer's logo.
The optical sensor 515 is used to collect ambient light intensity. In one embodiment, the processor 501 can control the display brightness of the touch display screen 505 according to the ambient light intensity collected by the optical sensor 515: when the ambient light intensity is high, the display brightness of the touch display screen 505 is increased; when the ambient light intensity is low, it is decreased. In another embodiment, the processor 501 can also dynamically adjust the shooting parameters of the camera assembly 506 according to the ambient light intensity collected by the optical sensor 515.
The proximity sensor 516, also called a distance sensor, is usually provided on the front panel of the terminal 500 and is used to collect the distance between the user and the front of the terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually decreases, the processor 501 controls the touch display screen 505 to switch from the bright-screen state to the off-screen state; when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually increases, the processor 501 controls the touch display screen 505 to switch from the off-screen state to the bright-screen state.
A person skilled in the art can understand that the structure shown in FIG. 5 does not constitute a limitation on the terminal 500, which may include more or fewer components than shown in the figure, combine certain components, or adopt a different component arrangement.
FIG. 6 is a schematic structural diagram of a server provided by an embodiment of the present application. The server 600 includes a central processing unit (CPU) 601, a system memory 604 including a random access memory (RAM) 602 and a read-only memory (ROM) 603, and a system bus 605 connecting the system memory 604 and the central processing unit 601. The server 600 also includes a basic input/output system (I/O system) 606 that helps transfer information between the various components within the computer, and a mass storage device 607 for storing an operating system 613, application programs 614, and other program modules 615.
The basic input/output system 606 includes a display 608 for displaying information and input devices 609, such as a mouse and a keyboard, for the user to input information. Both the display 608 and the input devices 609 are connected to the central processing unit 601 through an input/output controller 610 connected to the system bus 605. The basic input/output system 606 may also include the input/output controller 610 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 610 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605. The mass storage device 607 and its associated computer-readable media provide non-volatile storage for the server 600; that is, the mass storage device 607 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, a person skilled in the art knows that computer storage media are not limited to the above. The system memory 604 and the mass storage device 607 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 600 may also be run by a remote computer connected to a network such as the Internet; that is, the server 600 may be connected to the network 612 through a network interface unit 611 connected to the system bus 605, or the network interface unit 611 may be used to connect to other types of networks or remote computer systems (not shown).
The above memory also includes one or more programs, which are stored in the memory and configured to be executed by the CPU.
In some embodiments, a computer-readable storage medium is also provided, the storage medium storing a computer program which, when executed by a processor, implements the steps of the method for determining sample images in the above embodiments. For example, the computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It is worth noting that the computer-readable storage medium mentioned in the embodiments of the present application may be a non-volatile storage medium, in other words a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product, which includes one or more computer instructions that may be stored in the above computer-readable storage medium.
That is, in some embodiments, a computer program product containing instructions is also provided which, when run on a computer, causes the computer to execute the steps of the above method for determining sample images.
It should be understood that "at least one" mentioned herein refers to one or more, and "multiple" refers to two or more. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean that A exists alone, A and B exist simultaneously, or B exists alone. In addition, to describe the technical solutions of the embodiments of the present application clearly, words such as "first" and "second" are used to distinguish identical or similar items with essentially the same functions and effects. A person skilled in the art can understand that words such as "first" and "second" do not limit quantity or execution order, and such words do not necessarily denote difference.
The above are only preferred embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the protection scope of the present application.

Claims (18)

  1. A method for determining sample images, the method comprising:
    performing missed/false report analysis on a video stream to determine a first image and a missed/false report result corresponding to the first image, wherein the first image is an image in the video stream that contains a missed/false report object, and the missed/false report object is a target object missed by analysis or a target object analyzed incorrectly;
    acquiring multiple foreground images and multiple background images from the video stream based on the missed/false report result corresponding to the first image;
    fusing the multiple foreground images and the multiple background images to obtain multiple second images; and
    determining the first image and the multiple second images as sample images.
  2. The method according to claim 1, wherein the performing missed/false report analysis on a video stream to determine a first image and a missed/false report result corresponding to the first image comprises:
    performing missed/false report analysis on any frame of the video stream through multiple first neural network models and a second neural network model to determine whether the frame contains a missed/false report object, the second neural network model being a model capable of analyzing all target objects in the frame; and
    if the frame contains a missed/false report object, taking the frame as the first image and determining the missed/false report result corresponding to the first image.
  3. The method according to claim 2, wherein the performing missed/false report analysis on any frame of the video stream through multiple first neural network models and a second neural network model to determine whether the frame contains a missed/false report object comprises:
    determining, through the multiple first neural network models respectively, a first analysis result corresponding to the frame so as to obtain multiple first analysis results, each first analysis result comprising position information of at least one target object in the frame and a first label of each target object;
    determining, through the second neural network model, a second analysis result corresponding to the frame, the second analysis result comprising position information of at least one target object in the frame and a second label of each target object; and
    determining whether the frame contains a missed/false report object based on the position information and first labels of the target objects comprised in the multiple first analysis results and the position information and second labels of the target objects comprised in the second analysis result.
  4. The method according to claim 3, wherein the determining whether the frame contains a missed/false report object based on the position information and first labels of the target objects comprised in the multiple first analysis results and the position information and second labels of the target objects comprised in the second analysis result comprises:
    if the first labels of the target objects comprised in the multiple first analysis results and the second labels of the target objects comprised in the second analysis result are all identical, and the intersection-over-union between any two pieces of position information among the position information of the target objects comprised in the multiple first analysis results and the position information of the target objects comprised in the second analysis result is greater than a preset ratio threshold, determining that the frame contains no missed/false report object; otherwise, determining that the frame contains a missed/false report object.
  5. The method according to claim 4, wherein the determining that the frame contains no missed/false report object if the first labels of the target objects comprised in the multiple first analysis results and the second labels of the target objects comprised in the second analysis result are all identical and the intersection-over-union between any two pieces of position information among the position information of the target objects comprised in the multiple first analysis results and the position information of the target objects comprised in the second analysis result is greater than the preset ratio threshold, and otherwise determining that the frame contains a missed/false report object, comprises:
    if the position information and first labels of the target objects comprised in the multiple first analysis results are all identical and are identical to the position information and second labels of the target objects comprised in the second analysis result, determining that the frame contains no missed/false report object; otherwise, determining that the frame contains a missed/false report object.
  6. The method according to claim 1, wherein the missed/false report result comprises position information of at least one missed/false report object in the first image and a capture time of the first image;
    the acquiring multiple foreground images and multiple background images from the video stream based on the missed/false report result corresponding to the first image comprises:
    determining at least one spatial range based on the position information of the at least one missed/false report object, the at least one spatial range corresponding one-to-one to the at least one missed/false report object;
    determining a time range based on the capture time of the first image; and
    acquiring the multiple foreground images and the multiple background images from the video stream based on the time range and the at least one spatial range.
  7. The method according to claim 6, wherein the acquiring the multiple foreground images and the multiple background images from the video stream based on the time range and the at least one spatial range comprises:
    acquiring, from the video stream, image regions whose capture time falls within the time range, whose position falls within the at least one spatial range, and which are foreground, so as to obtain multiple image regions;
    acquiring, from the video stream, images whose capture time falls within the time range and which contain no foreground, so as to obtain multiple third images;
    clustering the multiple image regions to obtain multiple first clustering results, and clustering the multiple third images to obtain multiple second clustering results; and
    selecting multiple image regions from the multiple first clustering results as the multiple foreground images, and selecting multiple third images from the multiple second clustering results as the multiple background images.
  8. The method according to claim 1, wherein the fusing the multiple foreground images and the multiple background images comprises:
    determining a capture time of the frame in which each of the multiple foreground images is located in the video stream and a position of each of the multiple foreground images within that frame, so as to obtain the capture time and image position corresponding to each of the multiple foreground images;
    performing semantic segmentation on each of the multiple background images to determine the semantic segmentation information corresponding to each of the multiple background images;
    determining a capture time of the frame in which each of the multiple background images is located in the video stream, so as to obtain the capture time corresponding to each of the multiple background images; and
    fusing the multiple foreground images and the multiple background images based on the capture times and image positions corresponding to the multiple foreground images and the capture times and semantic segmentation information corresponding to the multiple background images.
  9. An apparatus for determining sample images, the apparatus comprising:
    a first determining module, configured to perform missed/false report analysis on a video stream to determine a first image and a missed/false report result corresponding to the first image, wherein the first image is an image in the video stream that contains a missed/false report object, and the missed/false report object is a target object missed by analysis or a target object analyzed incorrectly;
    an acquiring module, configured to acquire multiple foreground images and multiple background images from the video stream based on the missed/false report result corresponding to the first image;
    a fusion module, configured to fuse the multiple foreground images and the multiple background images to obtain multiple second images; and
    a second determining module, configured to determine the first image and the multiple second images as sample images.
  10. The apparatus according to claim 9, wherein the first determining module comprises:
    a first determining submodule, configured to perform missed/false report analysis on any frame of the video stream through multiple first neural network models and a second neural network model to determine whether the frame contains a missed/false report object, the second neural network model being a model capable of analyzing all target objects in the frame; and
    a second determining submodule, configured to, if the frame contains a missed/false report object, take the frame as the first image and determine the missed/false report result corresponding to the first image.
  11. The apparatus according to claim 10, wherein the first determining submodule comprises:
    a first determining unit, configured to determine, through the multiple first neural network models respectively, a first analysis result corresponding to the frame so as to obtain multiple first analysis results, each first analysis result comprising position information of at least one target object in the frame and a first label of each target object;
    a second determining unit, configured to determine, through the second neural network model, a second analysis result corresponding to the frame, the second analysis result comprising position information of at least one target object in the frame and a second label of each target object; and
    a third determining unit, configured to determine whether the frame contains a missed/false report object based on the position information and first labels of the target objects comprised in the multiple first analysis results and the position information and second labels of the target objects comprised in the second analysis result.
  12. The apparatus according to claim 11, wherein the third determining unit is specifically configured to:
    if the first labels of the target objects comprised in the multiple first analysis results and the second labels of the target objects comprised in the second analysis result are all identical, and the intersection-over-union between any two pieces of position information among the position information of the target objects comprised in the multiple first analysis results and the position information of the target objects comprised in the second analysis result is greater than a preset ratio threshold, determine that the frame contains no missed/false report object; otherwise, determine that the frame contains a missed/false report object.
  13. The apparatus according to claim 12, wherein the third determining unit is specifically configured to:
    if the position information and first labels of the target objects comprised in the multiple first analysis results are all identical and are identical to the position information and second labels of the target objects comprised in the second analysis result, determine that the frame contains no missed/false report object; otherwise, determine that the frame contains a missed/false report object.
  14. The apparatus according to claim 9, wherein the missed/false report result comprises position information of at least one missed/false report object in the first image and a capture time of the first image;
    the acquiring module comprises:
    a third determining submodule, configured to determine at least one spatial range based on the position information of the at least one missed/false report object, the at least one spatial range corresponding one-to-one to the at least one missed/false report object;
    a fourth determining submodule, configured to determine a time range based on the capture time of the first image; and
    an acquiring submodule, configured to acquire the multiple foreground images and the multiple background images from the video stream based on the time range and the at least one spatial range.
  15. The apparatus according to claim 14, wherein the acquiring submodule is specifically configured to:
    acquire, from the video stream, image regions whose capture time falls within the time range, whose position falls within the at least one spatial range, and which are foreground, so as to obtain multiple image regions;
    acquire, from the video stream, images whose capture time falls within the time range and which contain no foreground, so as to obtain multiple third images;
    cluster the multiple image regions to obtain multiple first clustering results, and cluster the multiple third images to obtain multiple second clustering results; and
    select multiple image regions from the multiple first clustering results as the multiple foreground images, and select multiple third images from the multiple second clustering results as the multiple background images.
  16. The apparatus according to claim 9, wherein the fusion module is specifically configured to:
    determine a capture time of the frame in which each of the multiple foreground images is located in the video stream and a position of each of the multiple foreground images within that frame, so as to obtain the capture time and image position corresponding to each of the multiple foreground images;
    perform semantic segmentation on each of the multiple background images to determine the semantic segmentation information corresponding to each of the multiple background images;
    determine a capture time of the frame in which each of the multiple background images is located in the video stream, so as to obtain the capture time corresponding to each of the multiple background images; and
    fuse the multiple foreground images and the multiple background images based on the capture times and image positions corresponding to the multiple foreground images and the capture times and semantic segmentation information corresponding to the multiple background images.
  17. A computer device, comprising a memory and a processor, the memory being configured to store a computer program and the processor being configured to execute the computer program stored in the memory so as to implement the steps of the method according to any one of claims 1 to 8.
  18. A computer-readable storage medium, the storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
PCT/CN2022/126678 2021-10-22 2022-10-21 Method, apparatus, device, and storage medium for determining sample images WO2023066373A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111235744.8A 2021-10-22 2021-10-22 Method, apparatus, device, and storage medium for determining sample images
CN202111235744.8 2021-10-22

Publications (1)

Publication Number Publication Date
WO2023066373A1 true WO2023066373A1 (zh) 2023-04-27

Family

ID=79283859

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126678 WO2023066373A1 (zh) 2021-10-22 2022-10-21 Method, apparatus, device, and storage medium for determining sample images

Country Status (2)

Country Link
CN (1) CN113936240A (zh)
WO (1) WO2023066373A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936240A (zh) 2021-10-22 2022-01-14 Hangzhou Hikvision Digital Technology Co., Ltd. Method, apparatus, device, and storage medium for determining sample images

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7079151B1 (en) * 2002-02-08 2006-07-18 Adobe Systems Incorporated Compositing graphical objects
CN103324953A * 2013-05-29 2013-09-25 Shenzhen Zhimeida Technology Co., Ltd. Multi-target detection and tracking method for video surveillance
CN108154518A * 2017-12-11 2018-06-12 Guangzhou Huaduo Network Technology Co., Ltd. Image processing method and apparatus, storage medium, and electronic device
CN108460414A * 2018-02-27 2018-08-28 Beijing Sankuai Online Technology Co., Ltd. Method and apparatus for generating training sample images, and electronic device
CN111563468A * 2020-05-13 2020-08-21 University of Electronic Science and Technology of China Driver abnormal behavior detection method based on neural network attention
CN113936240A * 2021-10-22 2022-01-14 Hangzhou Hikvision Digital Technology Co., Ltd. Method, apparatus, device, and storage medium for determining sample images

Also Published As

Publication number Publication date
CN113936240A (zh) 2022-01-14

Similar Documents

Publication Publication Date Title
WO2022127919A1 (zh) Surface defect detection method, apparatus, and system, storage medium, and program product
WO2020048308A1 (zh) Multimedia resource classification method and apparatus, computer device, and storage medium
CN110807361B (zh) Human body recognition method and apparatus, computer device, and storage medium
CN110097576B (zh) Method for determining motion information of image feature points, task execution method, and device
CN108924737B (zh) Positioning method, apparatus, and device, and computer-readable storage medium
WO2020224222A1 (zh) Target group detection method and apparatus, computer device, and storage medium
CN108288032B (zh) Action feature acquisition method and apparatus, and storage medium
CN111127509B (zh) Target tracking method and apparatus, and computer-readable storage medium
CN110839128B (zh) Photographing behavior detection method and apparatus, and storage medium
WO2020249025A1 (zh) Identity information determining method and apparatus, and storage medium
CN110290426B (zh) Method, apparatus, and device for displaying resources, and storage medium
CN111104980B (zh) Method, apparatus, and device for determining classification results, and storage medium
CN110933468A (zh) Playback method and apparatus, electronic device, and medium
WO2020211607A1 (zh) Video generation method and apparatus, electronic device, and medium
CN113395542A (zh) Artificial-intelligence-based video generation method and apparatus, computer device, and medium
CN111178343A (zh) Artificial-intelligence-based multimedia resource detection method, apparatus, device, and medium
CN111027490A (zh) Face attribute recognition method and apparatus, and storage medium
WO2023066373A1 (zh) Method, apparatus, device, and storage medium for determining sample images
CN111192072A (zh) User grouping method and apparatus, and storage medium
CN111353513B (zh) Target crowd screening method, apparatus, terminal, and storage medium
CN111753813A (zh) Image processing method, apparatus, device, and storage medium
CN110853124A (zh) Method, apparatus, electronic device, and medium for generating GIF dynamic images
CN113613028B (zh) Live-streaming data processing method, apparatus, terminal, server, and storage medium
CN112001442B (zh) Feature detection method and apparatus, computer device, and storage medium
CN113763932A (zh) Speech processing method and apparatus, computer device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22882978

Country of ref document: EP

Kind code of ref document: A1