WO2020156361A1 - Training sample obtaining method and apparatus, electronic device and storage medium - Google Patents
Training sample obtaining method and apparatus, electronic device and storage medium
- Publication number
- WO2020156361A1 (PCT/CN2020/073396)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- frame
- scene
- feature information
- target
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Definitions
- the present invention relates to the field of machine learning technology, and in particular to a method, device, electronic device, and computer-readable storage medium for obtaining training samples.
- the purpose of the present invention is to provide a method, a device, an electronic device and a computer-readable storage medium for obtaining training samples to solve the problems of low efficiency and high cost of obtaining image training samples in the prior art.
- the present invention provides a method for obtaining training samples, including:
- obtaining a scene segment in a video;
- selecting a video frame containing a target object in the scene segment as an initial frame, and marking the target area where the target object is located in the initial frame;
- extracting the feature information of the target area marked in the initial frame;
- taking the initial frame as a reference, performing a feature search on the forward and/or backward video frames in the scene segment, determining the area in each searched frame whose feature information matches that of the target area, and automatically marking the area determined in each searched frame;
- extracting the image of each marked video frame in the scene segment as a training sample.
- obtaining the scene segment in the video includes:
- if the video is a single-scene video, using the video as a scene segment;
- if the video is a multi-scene video, using scene switching detection technology to divide the video into multiple scene segments.
- the scene switching detection technology includes: a pixel domain-based detection algorithm and/or a compressed domain-based detection algorithm.
- before extracting the feature information of the target area marked in the initial frame, the method further includes:
- Image preprocessing is performed on the initial frame to make the feature information of the target region in the initial frame more obvious.
- the feature information of the target area includes one or more of color features, texture features, and shape features.
- the step of performing a feature search on the forward and/or backward video frames in the scene segment includes:
- performing the feature search on the forward and/or backward video frames in the scene segment using a mean shift algorithm, a Kalman filter algorithm, or a particle filter algorithm.
- the method further includes:
- if a searched frame contains no area whose feature information matches that of the target area, acquiring target feature information, determining the area in the searched frame whose feature information matches the target feature information, and automatically marking that area;
- wherein the target feature information is: the feature information of the marked areas in a preset number of frames adjacent to the searched frame.
- the present invention also provides a training sample obtaining device, including:
- an obtaining module, configured to obtain a scene segment in the video;
- a first labeling module, configured to select a video frame containing a target object in the scene segment as an initial frame, and to label the target area where the target object is located in the initial frame;
- a first extraction module, configured to extract feature information of the target area marked in the initial frame;
- a second labeling module, configured to perform a feature search on the forward and/or backward video frames in the scene segment with the initial frame as a reference, determine the area in each searched frame whose feature information matches that of the target area, and automatically mark the area determined in each searched frame;
- a second extraction module, configured to extract the images of the marked video frames in the scene segment as training samples.
- the obtaining module obtains the scene segment in the video by:
- if the video is a single-scene video, using the video as a scene segment;
- if the video is a multi-scene video, using scene switching detection technology to divide the video into multiple scene segments.
- the scene switching detection technology includes: a pixel domain-based detection algorithm and/or a compressed domain-based detection algorithm.
- the device further includes:
- a preprocessing module, configured to perform image preprocessing on the initial frame before the first extraction module extracts the feature information of the marked target area, so that the feature information of the target area in the initial frame is more distinct.
- the feature information of the target area includes one or more of color features, texture features, and shape features.
- the second extraction module performs the feature search on the forward and/or backward video frames in the scene segment by:
- using a mean shift algorithm, a Kalman filter algorithm, or a particle filter algorithm to perform the feature search on the forward and/or backward video frames in the scene segment.
- the second extraction module is further configured to: if a searched frame contains no area whose feature information matches that of the target area, acquire target feature information, determine the area in the searched frame whose feature information matches the target feature information, and automatically mark that area;
- wherein the target feature information is: the feature information of the marked areas in a preset number of frames adjacent to the searched frame.
- the present invention also provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus; wherein,
- the memory is used to store computer programs
- the processor is configured to implement the training sample obtaining method described in any one of the above when executing the computer program stored in the memory.
- the present invention also provides a computer-readable storage medium having a computer program stored in the computer-readable storage medium, and when the computer program is executed by a processor, the training sample obtaining method described in any one of the above is implemented.
- the solution provided by the present invention first annotates the initial frame in a scene segment of the video, and then uses target tracking technology to automatically annotate the other video frames in the entire scene segment, thereby obtaining a large number of annotated images to serve as training samples for a subsequent target recognition model.
- in the prior art, manual annotation is performed on a large number of individually acquired pictures, so the cost of image acquisition and annotation is relatively high.
- with the present invention, one only needs to shoot a video, so acquiring annotation material is much more convenient; a large number of automatically marked samples can then be collected from the video, which reduces the cost of sample labeling and improves the efficiency of the labeling process.
- FIG. 1 is a schematic flowchart of a method for obtaining training samples according to an embodiment of the present invention;
- FIG. 2 is a schematic structural diagram of a training sample obtaining apparatus according to an embodiment of the present invention;
- FIG. 3 is a structural block diagram of an electronic device provided by an embodiment of the present invention.
- the embodiments of the present invention provide a method, device, electronic device, and computer-readable storage medium for obtaining training samples.
- the training sample obtaining method of the embodiment of the present invention can be applied to the training sample obtaining device of the embodiment of the present invention, and the training sample obtaining device can be configured on an electronic device.
- the electronic device may be a personal computer, a mobile terminal, etc.
- the mobile terminal may be a hardware device such as a mobile phone or a tablet computer running any of various operating systems.
- FIG. 1 is a schematic flowchart of a method for obtaining training samples according to an embodiment of the present invention; please refer to FIG. 1.
- a method for obtaining training samples may include the following steps:
- S101: Obtain a scene segment in the video. A video is generally composed of one or more scene segments, and each scene segment is composed of multiple video frames.
- the video on which the present invention is based can be a single-scene video or a multi-scene video. If the video is a single-scene video, it contains only one scene segment, so the video itself can be used directly as the obtained scene segment and the subsequent processing steps are executed on it.
- if the video is a multi-scene video, scene switching detection technology can be used to divide the video into multiple scene segments. After the division, the subsequent processing steps may be applied to just one of the scene segments to obtain the images of its marked video frames as training samples, or they may be applied uniformly to every scene segment, which further increases the number of training samples obtained.
- Scene switching detection technology refers to finding the frames, and the positions of those frames, at which scene switches occur in a video.
- the obtained frame positions can be used for fast and accurate video editing or further processing, and the sequence of obtained frames can serve as a rough description of the entire video content.
- traditional video scene switching detection methods generally rely on hand-crafted features, such as computing the color-histogram similarity of adjacent frames, directly computing the frame difference, or using the degree of change of the high-frequency subband coefficients of each frame in the video scene as a feature.
- computing the high-frequency subband coefficients requires algorithms such as the three-dimensional wavelet transform.
- these techniques compute a feature value and compare it with a threshold; if the value is greater (or smaller) than the threshold, the frame is determined to be a switching frame.
- there are also adaptive-threshold algorithms built on the above techniques, such as a video scene change detection method based on adaptive thresholds, but the sliding-window size and the preset thresholds in such methods still need to be set manually.
- in the embodiment of the present invention, the scene switching detection technology can adopt a pixel-domain-based detection algorithm or a compressed-domain-based detection algorithm, and corresponding scene switching thresholds can be set for different scenes, which improves the speed and accuracy of scene switching detection.
- for the pixel-domain-based or compressed-domain-based detection algorithms, reference can be made to the prior art, and details are not repeated here.
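- for illustration only, a minimal pixel-domain scene-cut detector of the kind described above can be sketched as follows, assuming an OpenCV environment; the Bhattacharyya-distance threshold of 0.5 and the histogram bin counts are illustrative choices, not values specified by the present disclosure.

```python
# A minimal sketch of a pixel-domain scene-cut detector (assumes OpenCV).
import cv2

def split_into_scene_segments(video_path, threshold=0.5):
    """Split a video into scene segments by comparing HSV histograms of
    consecutive frames; a large histogram distance marks a scene cut."""
    cap = cv2.VideoCapture(video_path)
    segments, current, prev_hist = [], [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Bhattacharyya distance in [0, 1]; near 1 means very dissimilar.
            d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if d > threshold:  # scene switch detected at this frame
                segments.append(current)
                current = []
        current.append(frame)
        prev_hist = hist
    if current:
        segments.append(current)
    cap.release()
    return segments
```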
- S102: Select a video frame containing a target object in the scene segment as an initial frame, and mark the target area where the target object is located in the initial frame.
- the target object may be an object of interest.
- the scene segment can be examined frame by frame, and a video frame containing the target object can be selected as the initial frame for labeling.
- generally, the first frame in which the target object appears can be selected as the initial frame; if the features of the target object are not distinct in that frame, a subsequent video frame in which the target object's features are more distinct can be chosen instead. The requirements for this step are not strict: any reasonably good video frame can serve as the initial frame.
- the purpose of this step is to mark the target area where the target object is located so that the feature information of the target area can be extracted, allowing the subsequent processing to automatically mark, via feature search, the matching areas in the preceding or following video frames.
- before the feature information is extracted, image preprocessing such as image denoising and contrast enhancement can be performed on the initial frame to make the feature information of the target area more distinct.
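- as an illustration of such preprocessing, the following sketch applies denoising and contrast enhancement with OpenCV; the specific choices (non-local-means denoising, CLAHE on the luminance channel) are assumptions for the example, not steps mandated by the disclosure.

```python
# A sketch of denoising + contrast enhancement preprocessing (assumes OpenCV).
import cv2

def preprocess_frame(frame):
    # Remove sensor noise while preserving edges.
    denoised = cv2.fastNlMeansDenoisingColored(frame, None, 10, 10, 7, 21)
    # Enhance contrast on the L channel only, so colors are not distorted.
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```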
- S103: Extract the feature information of the target area marked in the initial frame. The feature information of the target area may include one or more of color features, texture features, and shape features.
- the color feature is a global feature that describes the surface properties of the scene corresponding to an image or image area. Color features are generally defined on individual pixels, so every pixel belonging to the image or image area contributes to them. The color histogram is the most commonly used way to express color features: it simply describes the global distribution of colors in an image, that is, the proportion of each color in the entire image. It is especially suited to images that are difficult to segment automatically and to images where the spatial position of objects need not be considered; it is unaffected by image rotation and translation, and, after normalization, also unaffected by changes in image scale. The most commonly used color spaces are the RGB color space and the HSV color space. Color-histogram feature matching methods include: the histogram intersection method, the distance method, the center-distance method, the reference color table method, and the cumulative color histogram method.
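- a minimal sketch of the histogram intersection method named above, assuming OpenCV; the HSV bin counts are illustrative choices.

```python
# Color-histogram features and histogram-intersection matching (assumes OpenCV).
import cv2

def color_histogram(region_bgr, bins=(50, 60)):
    hsv = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, list(bins), [0, 180, 0, 256])
    cv2.normalize(hist, hist, 1.0, 0.0, cv2.NORM_L1)  # make bins sum to 1
    return hist

def histogram_intersection(h1, h2):
    # 1.0 for identical normalized histograms, toward 0.0 for disjoint ones.
    return cv2.compareHist(h1, h2, cv2.HISTCMP_INTERSECT)
```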
- the texture feature is also a global feature; it likewise describes the surface properties of the scene corresponding to an image or image area.
- since texture is only a characteristic of an object's surface and cannot fully reflect the object's essential attributes, high-level image content cannot be obtained from texture features alone.
- unlike color features, texture features are not pixel-based: they require statistical computation over a region containing multiple pixels. In pattern matching, this regional character is a significant advantage, as matching does not fail because of local deviations.
- texture features often have rotation invariance and strong robustness to noise.
- methods for extracting texture features include statistical methods, geometric methods, model-based methods, and signal-processing methods.
- the typical representative of the statistical methods is a texture feature analysis method called the gray-level co-occurrence matrix.
- based on studies of the various statistical features of the co-occurrence matrix, Gotlieb and Kreyszig obtained, through experiments, four key gray-level co-occurrence matrix features: energy, inertia, entropy, and correlation.
- another typical statistical method extracts texture features from the image's autocorrelation function (that is, the image's energy spectrum function): characteristic parameters such as texture coarseness and directionality are computed from the energy spectrum function.
- the geometric methods are texture feature analysis methods based on the theory of texture primitives (basic texture elements).
- texture primitive theory holds that a complex texture can be composed of a number of simple texture primitives arranged in some regular form.
- typical geometric methods include the Voronoi checkerboard feature method and the structural method.
- the model-based methods build on a structural model of the image and use the model's parameters as texture features.
- typical methods are random-field model methods, such as the Markov random field (MRF) model method and the Gibbs random field model method.
- the extraction and matching of texture features mainly involve: the gray-level co-occurrence matrix, Tamura texture features, the autoregressive texture model, the wavelet transform, etc.
- the feature extraction and matching of gray-level co-occurrence matrix mainly rely on four parameters: energy, inertia, entropy and correlation.
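- as a concrete illustration of those four parameters, a NumPy sketch follows; the quantization to 16 gray levels and the single horizontal pixel offset are illustrative assumptions.

```python
# Gray-level co-occurrence matrix features: energy, inertia (contrast),
# entropy, and correlation, for one horizontal offset (assumes NumPy).
import numpy as np

def glcm_features(gray, levels=16):
    # Quantize to a few gray levels to keep the co-occurrence matrix dense.
    q = (gray.astype(np.float64) / 256.0 * levels).astype(np.int64)
    # Count co-occurring pairs (i, j) one pixel step to the right.
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1.0)
    p = glcm / glcm.sum()
    i, j = np.indices((levels, levels))
    energy = np.sum(p ** 2)
    inertia = np.sum(p * (i - j) ** 2)  # also called contrast
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    correlation = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    return energy, inertia, entropy, correlation
```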
- Tamura texture features, based on psychological studies of human visual perception of texture, comprise six attributes: coarseness, contrast, directionality, line-likeness, regularity, and roughness.
- the autoregressive texture model (simultaneous auto-regressive, SAR) is an application instance of the Markov random field (MRF) model.
- an advantage of shape features is that retrieval methods based on them can make effective use of the target of interest in the image.
- there are two types of representation methods for shape features: contour features and regional features.
- the contour features of an image concern the outer boundary of an object, while the regional features relate to the entire shape area.
- Boundary feature method: this method obtains the shape parameters of the image by describing boundary features.
- the Hough transform method for detecting parallel lines and the boundary direction histogram method are classic examples.
- the Hough transform uses the global characteristics of an image to connect edge pixels into a closed region boundary; its basic idea is point-line duality. The boundary direction histogram method first differentiates the image to obtain its edges, then builds a histogram of edge magnitude and direction, typically by constructing an image gray-gradient direction matrix.
- the basic idea of the Fourier shape descriptor method is to use the Fourier transform of the object boundary as the shape description, exploiting the closedness and periodicity of the region boundary to reduce a two-dimensional problem to a one-dimensional one.
- Three shape expressions are derived from boundary points, which are curvature function, centroid distance, and complex coordinate function.
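- a minimal sketch of the centroid-distance form of the Fourier shape descriptor follows, assuming NumPy; the number of retained coefficients is an illustrative choice.

```python
# Fourier shape descriptor from the centroid-distance signal (assumes NumPy).
import numpy as np

def fourier_shape_descriptor(boundary_xy, n_coeffs=16):
    pts = np.asarray(boundary_xy, dtype=np.float64)  # (N, 2) closed boundary
    centroid = pts.mean(axis=0)
    r = np.linalg.norm(pts - centroid, axis=1)       # 1-D centroid distance
    spectrum = np.abs(np.fft.fft(r))
    # Dividing by the DC term makes the descriptor scale-invariant; keeping
    # magnitudes only makes it invariant to the boundary starting point.
    return spectrum[1:n_coeffs + 1] / spectrum[0]
```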
- the geometric parameter method is a simpler regional-feature description method used for shape expression and matching.
- for example, the shape factor method uses quantitative measures of shape (such as moments, area, and perimeter) as descriptors.
- in QBIC, a content-based image retrieval system, geometric parameters such as roundness, eccentricity, principal-axis direction, and algebraic invariant moments are used for image retrieval based on shape features.
- the shape invariant moment method uses the moments of the area occupied by the target as shape description parameters.
- representation and matching of shape features also include methods such as the finite element method (FEM), the turning function, and the wavelet descriptor.
- there is also a method based on wavelets and relative moments: it first uses wavelet-transform modulus maxima to obtain multi-scale edge images, then computes seven invariant moments at each scale and converts them into ten relative moments; the relative moments over all scales are used as the image feature vector, handling regions and both closed and unclosed structures uniformly.
- S104: Using the initial frame as a reference, perform a feature search on the forward and/or backward video frames in the scene segment, determine the area in each searched frame whose feature information matches the feature information of the target area, and automatically mark the area determined in each searched frame.
- each searched video frame can also be preprocessed (for example, by image denoising and contrast enhancement) to make the feature information of the matching area in each searched frame more distinct.
- the mean shift algorithm is a non-parametric method based on density gradient ascent; it finds the target position through iterative computation to achieve target tracking.
- the so-called tracking is to find the position of the target in the next frame from its known position in the current image frame.
- the significant advantages of the mean shift algorithm are its small computational cost and its simplicity, which make it well suited to real-time tracking; experiments using a kernel histogram to model the target distribution have shown that the mean shift algorithm has good real-time characteristics.
- Mean shift has a wide range of applications in clustering, image smoothing, segmentation and tracking.
- the mean shift algorithm locks onto a local maximum of the probability function iteratively. For example, given a rectangular window framing part of an image, the principle is to find the center of gravity (the weighted average) of the data points within the predefined window, move the center of the window to that center of gravity, and repeat the process until the window's center of gravity converges to a stable point. The quality of the iteration result therefore depends on the input probability map (over the predefined window) and on the window's initial position.
- the complete tracking procedure of the mean shift algorithm is: set the initial tracking target, that is, frame the target to be tracked; compute the histogram of the hue (H) channel of the target's HSV image; normalize this histogram; back-project the histogram into each newly obtained frame; and perform the mean shift to update the tracking position.
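- a minimal sketch of this tracking loop, assuming OpenCV's cv2.meanShift; the termination criteria and histogram settings are illustrative choices.

```python
# Mean-shift tracking loop: hue histogram -> back-projection -> meanShift.
import cv2

def track_with_meanshift(frames, init_window):
    """frames: list of BGR images; init_window: (x, y, w, h) of the marked
    target area in the first frame. Yields an updated window per frame."""
    x, y, w, h = init_window
    roi = frames[0][y:y + h, x:x + w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    # Histogram of the H (hue) channel of the target, then normalized.
    roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    window = init_window
    for frame in frames[1:]:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Back-project the target histogram into the new frame.
        back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        _, window = cv2.meanShift(back_proj, window, term)
        yield window
```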
- the Kalman filter overcomes the Wiener filter's shortcomings of requiring an infinite amount of past data and struggling to guarantee real-time performance. The filtered result can never exactly equal the true result, only approximate it; the Kalman filter takes the minimum mean-square error as its criterion and introduces a state-space model for recursive estimation. Kalman filters are often used in navigation, radar, surveillance, and other fields involving target tracking. The basic procedure is: adopt a state-space model of the signal and noise, proceed recursively in the order "prediction - measurement - correction", use the information from the previous moment to estimate the state variables at the current moment, and use the actual observation to correct the model from the previous moment.
- a typical application of the Kalman filter is predicting the target's state at the next moment from a limited set of observations containing the target position and noise.
- target tracking is the process of selecting, from the multiple foreground blocks detected in the current frame, the one corresponding to the established target, thereby obtaining the target's trajectory.
- Kalman-filter target tracking uses the Kalman filter to predict the change of the target's position and center, and then locates the target precisely through multi-feature matching.
- tracking a target with the Kalman filter is divided into four main steps: first, compute feature points such as the target center, SIFT features, and the color histogram from the target detection result; second, set a prediction region around the Kalman-predicted position in the next frame and match the eligible candidate targets in this region one by one; third, define similarity functions over the SIFT features, color histogram, target center, and other features, and select the best-matching target; fourth, tune the Kalman filter parameters according to the target state (such as normal tracking, tracking loss, merging and splitting, and targets entering or leaving the scene).
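- a sketch of the constant-velocity Kalman filter commonly used for this predict-measure-correct loop, via OpenCV's cv2.KalmanFilter; the state layout (x, y, vx, vy) for the target center and the noise levels are illustrative assumptions.

```python
# Constant-velocity Kalman filter for tracking a target center (assumes OpenCV).
import cv2
import numpy as np

def make_center_tracker():
    kf = cv2.KalmanFilter(4, 2)  # 4 state dims (x, y, vx, vy), 2 measured
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1
    return kf

# Usage, in the "prediction - measurement - correction" order described above:
# kf = make_center_tracker()
# predicted = kf.predict()                       # predicted (x, y, vx, vy)
# kf.correct(np.array([[cx], [cy]], np.float32)) # correct with matched center
```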
- particle filtering is a non-parametric Monte Carlo simulation method for realizing recursive Bayesian filtering. It is applicable to any nonlinear system that can be described by a state-space model, and its accuracy can approach that of the optimal estimate.
- particle filters are simple and easy to implement; they provide an effective solution for analyzing nonlinear dynamic systems, and are widely used in target tracking, signal processing, and automatic control.
- the core idea of the particle filter algorithm is to approximate the posterior probability density function by a weighted sum of a series of random samples, replacing integration with summation. The algorithm derives from the Monte Carlo idea: the frequency of an event is used as a stand-in for its probability.
- Prediction stage: the particle filter first generates a large number of samples, called particles, according to the state-transition function; the weighted sum of these particles approximates the posterior probability density.
- Correction stage: as observations arrive in sequence, an importance weight is computed for each particle; this weight represents the probability of obtaining the observation when the predicted pose takes the value of the i-th particle. All particles are evaluated in this way, and the particles more likely to produce the observation receive higher weights.
- Resampling stage: the sampled particles are redistributed according to their weight ratios. Since the number of particles approximating the continuous distribution is finite, this step is very important. In the next round of filtering, the resampled particle set is fed into the state-transition equation to obtain new predicted particles.
- Map estimation: for each sampled particle, the corresponding map estimate is computed from the sampled trajectory and the observations.
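- a minimal sketch of one predict/weight/resample cycle for a 2-D target position, assuming NumPy; the random-walk motion noise and the observe() likelihood function stand in for whatever feature-match score is actually used.

```python
# One particle-filter step: predict, weight, estimate, resample (assumes NumPy).
import numpy as np

def particle_filter_step(particles, weights, observe, motion_std=5.0):
    """particles: (N, 2) positions; observe(particles) -> per-particle
    likelihoods. Returns resampled particles, reset weights, and an estimate."""
    n = len(particles)
    # Prediction: propagate each particle through the motion model.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Correction: weight particles by how well they explain the observation.
    weights = weights * observe(particles)
    weights /= weights.sum()
    # Estimate: the weighted mean approximates the posterior expectation.
    estimate = np.average(particles, axis=0, weights=weights)
    # Resampling: draw particles in proportion to their weights.
    idx = np.random.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n), estimate
```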
- if a searched frame contains no area whose feature information matches that of the target area, target feature information is acquired, and the area in the searched frame whose feature information matches the target feature information is determined and automatically marked.
- if a searched frame does not match the features extracted from the initial frame, the feature change of the target object in the current frame (that is, the searched frame) has exceeded the threshold and cannot be matched. In that case, a frame that successfully matched the initial frame's features can be selected from the previous frame or previous few frames of the current frame, and feature matching and automatic labeling can be performed on the current frame again using the feature information of the marked area in the selected frame. If the feature information of the marked areas in the previous few frames still cannot be matched to the current frame, a frame that successfully matched the initial frame's features can instead be selected from the next frame or next few frames of the current frame, and the current frame can again be feature-matched and automatically labeled from it.
- if the current frame is the last frame of the current scene segment, video frames of the next scene segment can be used for feature matching; if the current frame is the first frame of the current scene segment, feature matching can be performed in the previous scene segment.
- if a matching feature is still not found, the median of the feature-point coordinates of the frames before and after the current frame can be used as the feature-point coordinates of the current frame, and the marked area in the current frame can then be adjusted and labeled manually.
- if several consecutive frames cannot be matched, the feature-point coordinates of the middle frame of these consecutive frames can be estimated first, and then the coordinates of the frames between the already-estimated frames and their neighbors can be estimated in turn from the medians of the surrounding frames until all frames are estimated; the areas in these consecutive frames are then adjusted and labeled manually. Alternatively, after the middle frame's feature-point coordinates are estimated, its area can be manually adjusted and labeled, the feature information of the newly labeled area extracted, and the preceding and following frames then matched and labeled automatically from it.
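- a sketch of the coordinate fallback just described: for each frame that failed to match, take the element-wise median of the boxes in the nearest successfully labeled frames on either side. The (x, y, w, h) box format is an illustrative assumption.

```python
# Fill unmatched frames with the median of neighboring boxes (assumes NumPy).
import numpy as np

def interpolate_missing_boxes(boxes):
    """boxes: list of (x, y, w, h) tuples, or None for frames that failed
    to match. Returns a list with the gaps filled where neighbors exist."""
    out = list(boxes)
    for i, b in enumerate(out):
        if b is not None:
            continue
        # Nearest labeled frame before and after the current frame.
        prev_b = next((out[j] for j in range(i - 1, -1, -1)
                       if out[j] is not None), None)
        next_b = next((out[j] for j in range(i + 1, len(out))
                       if out[j] is not None), None)
        candidates = [c for c in (prev_b, next_b) if c is not None]
        if candidates:
            out[i] = tuple(np.median(np.array(candidates), axis=0))
    return out
```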
- S105: Extract the image of each marked video frame as a training sample. Since a scene segment contains a large number of video frames, a large number of labeled image training samples can be obtained from each scene segment.
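- a sketch of this final extraction step, saving each labeled frame and its box in a simple "image + text label" layout; the file layout and label format are illustrative assumptions, not part of the disclosure.

```python
# Export labeled frames as training samples (assumes OpenCV).
import os
import cv2

def export_training_samples(frames, boxes, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    for i, (frame, box) in enumerate(zip(frames, boxes)):
        if box is None:  # frame could not be labeled; skip it
            continue
        x, y, w, h = map(int, box)
        cv2.imwrite(os.path.join(out_dir, f"sample_{i:06d}.jpg"), frame)
        with open(os.path.join(out_dir, f"sample_{i:06d}.txt"), "w") as f:
            f.write(f"{x} {y} {w} {h}\n")  # one target box per sample
```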
- in summary, the solution provided by the present invention first annotates the initial frame in a scene segment of the video, and then uses target tracking technology to automatically annotate the other video frames in the entire scene segment, thereby obtaining a large number of annotated images to serve as training samples for a subsequently built target recognition model.
- in the prior art, manual annotation is performed on a large number of individually acquired pictures, and the cost of image acquisition and annotation is relatively high.
- with the present invention, one only needs to shoot a video, so acquiring annotation material is much more convenient; a large number of automatically marked samples can then be collected from the video, which reduces the cost of sample labeling and improves the efficiency of the labeling process.
- the present invention also provides a device for obtaining training samples.
- as shown in FIG. 2, the device includes:
- the obtaining module 201, configured to obtain a scene segment in the video;
- the first labeling module 202, configured to select a video frame containing a target object in the scene segment as an initial frame, and to label the target area where the target object is located in the initial frame;
- the first extraction module 203, configured to extract feature information of the target area marked in the initial frame;
- the second labeling module 204, configured to perform a feature search on the forward and/or backward video frames in the scene segment with the initial frame as a reference, determine the area in each searched frame whose feature information matches that of the target area, and automatically mark the area determined in each searched frame;
- the second extraction module 205, configured to extract the images of the marked video frames in the scene segment as training samples.
- the obtaining module 201 is specifically configured to:
- if the video is a single-scene video, use the video as a scene segment;
- if the video is a multi-scene video, use scene switching detection technology to divide the video into multiple scene segments.
- the scene switching detection technology includes: a pixel-domain-based detection algorithm and/or a compressed-domain-based detection algorithm.
- the device further includes:
- a preprocessing module, configured to perform image preprocessing on the initial frame before the first extraction module 203 extracts the feature information of the marked target area, so that the feature information of the target area in the initial frame is more distinct.
- the feature information of the target area includes one or more of color features, texture features, and shape features.
- the second extraction module 205 performs the feature search on the forward and/or backward video frames in the scene segment specifically by:
- using a mean shift algorithm, a Kalman filter algorithm, or a particle filter algorithm to perform the feature search on the forward and/or backward video frames in the scene segment.
- the second extraction module 205 is further configured to: if a searched frame contains no area whose feature information matches that of the target area, acquire target feature information, determine the area in the searched frame whose feature information matches the target feature information, and automatically mark that area;
- wherein the target feature information is: the feature information of the marked areas in a preset number of frames adjacent to the searched frame.
- the present invention also provides an electronic device, as shown in FIG. 3, including a processor 301, a communication interface 302, a memory 303, and a communication bus 304, where the processor 301, the communication interface 302, and the memory 303 communicate with one another through the communication bus 304;
- the memory 303 is configured to store a computer program;
- the processor 301 is configured to implement the following steps when executing the program stored in the memory 303: obtaining a scene segment in a video; selecting a video frame containing a target object in the scene segment as an initial frame, and marking the target area where the target object is located; extracting the feature information of the marked target area; taking the initial frame as a reference, performing a feature search on the forward and/or backward video frames in the scene segment, determining the area in each searched frame whose feature information matches that of the target area, and automatically marking the determined areas; and extracting the image of each marked video frame in the scene segment as a training sample.
- the communication bus mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus.
- the communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
- the communication interface is used for communication between the aforementioned electronic device and other devices.
- the memory may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage.
- the memory may also be at least one storage device located far away from the foregoing processor.
- the foregoing processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the present invention also provides a computer-readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the method steps of the above-mentioned training sample obtaining method are realized.
Abstract
Description
Claims (16)
- A method for obtaining training samples, comprising: obtaining a scene segment in a video; selecting a video frame containing a target object in the scene segment as an initial frame, and marking the target area where the target object is located in the initial frame; extracting feature information of the target area marked in the initial frame; taking the initial frame as a reference, performing a feature search on the forward and/or backward video frames in the scene segment, determining the area in each searched frame whose feature information matches the feature information of the target area, and automatically marking the area determined in each searched frame; and extracting the image of each marked video frame in the scene segment as a training sample.
- The method for obtaining training samples according to claim 1, wherein obtaining the scene segment in the video comprises: if the video is a single-scene video, using the video as a scene segment; if the video is a multi-scene video, using scene switching detection technology to divide the video into multiple scene segments.
- The method for obtaining training samples according to claim 2, wherein the scene switching detection technology comprises: a pixel-domain-based detection algorithm and/or a compressed-domain-based detection algorithm.
- The method for obtaining training samples according to claim 1, wherein before extracting the feature information of the target area marked in the initial frame, the method further comprises: performing image preprocessing on the initial frame to make the feature information of the target area in the initial frame more distinct.
- The method for obtaining training samples according to claim 1, wherein the feature information of the target area comprises one or more of: color features, texture features, and shape features.
- The method for obtaining training samples according to claim 1, wherein the step of performing a feature search on the forward and/or backward video frames in the scene segment comprises: performing the feature search on the forward and/or backward video frames in the scene segment using a mean shift algorithm, a Kalman filter algorithm, or a particle filter algorithm.
- The method for obtaining training samples according to claim 1, further comprising: if a searched frame contains no area whose feature information matches the feature information of the target area, acquiring target feature information, determining the area in the searched frame whose feature information matches the target feature information, and automatically marking the area determined in the searched frame; wherein the target feature information is the feature information of the marked areas in a preset number of frames adjacent to the searched frame.
- A device for obtaining training samples, comprising: an obtaining module, configured to obtain a scene segment in a video; a first labeling module, configured to select a video frame containing a target object in the scene segment as an initial frame, and to label the target area where the target object is located in the initial frame; a first extraction module, configured to extract feature information of the target area marked in the initial frame; a second labeling module, configured to perform a feature search on the forward and/or backward video frames in the scene segment with the initial frame as a reference, determine the area in each searched frame whose feature information matches the feature information of the target area, and automatically mark the area determined in each searched frame; and a second extraction module, configured to extract the image of each marked video frame in the scene segment as a training sample.
- The device for obtaining training samples according to claim 8, wherein the obtaining module obtains the scene segment in the video by: if the video is a single-scene video, using the video as a scene segment; if the video is a multi-scene video, using scene switching detection technology to divide the video into multiple scene segments.
- The device for obtaining training samples according to claim 9, wherein the scene switching detection technology comprises: a pixel-domain-based detection algorithm and/or a compressed-domain-based detection algorithm.
- The device for obtaining training samples according to claim 8, further comprising: a preprocessing module, configured to perform image preprocessing on the initial frame before the first extraction module extracts the feature information of the target area marked in the initial frame, so that the feature information of the target area in the initial frame is more distinct.
- The device for obtaining training samples according to claim 8, wherein the feature information of the target area comprises one or more of: color features, texture features, and shape features.
- The device for obtaining training samples according to claim 8, wherein the second extraction module performs the feature search on the forward and/or backward video frames in the scene segment by: using a mean shift algorithm, a Kalman filter algorithm, or a particle filter algorithm to perform the feature search on the forward and/or backward video frames in the scene segment.
- The device for obtaining training samples according to claim 8, wherein the second extraction module is further configured to: if a searched frame contains no area whose feature information matches the feature information of the target area, acquire target feature information, determine the area in the searched frame whose feature information matches the target feature information, and automatically mark the area determined in the searched frame; wherein the target feature information is the feature information of the marked areas in a preset number of frames adjacent to the searched frame.
- An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method according to any one of claims 1-7 when executing the computer program stored in the memory.
- A computer-readable storage medium storing a computer program, wherein the method according to any one of claims 1-7 is implemented when the computer program is executed.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910107568.6A CN109753975B (en) | 2019-02-02 | 2019-02-02 | Training sample obtaining method and device, electronic equipment and storage medium |
CN201910107568.6 | 2019-02-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020156361A1 true WO2020156361A1 (en) | 2020-08-06 |
Family
ID=66407340
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/073396 WO2020156361A1 (en) | 2019-02-02 | 2020-01-21 | Training sample obtaining method and apparatus, electronic device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109753975B (en) |
WO (1) | WO2020156361A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112233171A (en) * | 2020-09-03 | 2021-01-15 | 上海眼控科技股份有限公司 | Target labeling quality inspection method and device, computer equipment and storage medium |
CN112257659A (en) * | 2020-11-11 | 2021-01-22 | 四川云从天府人工智能科技有限公司 | Detection tracking method, apparatus and medium |
CN112801940A (en) * | 2020-12-31 | 2021-05-14 | 深圳市联影高端医疗装备创新研究院 | Model evaluation method, device, equipment and medium |
CN113254703A (en) * | 2021-05-12 | 2021-08-13 | 北京百度网讯科技有限公司 | Video matching method, video processing device, electronic equipment and medium |
CN114347030A (en) * | 2022-01-13 | 2022-04-15 | 中通服创立信息科技有限责任公司 | Robot vision following method and vision following robot |
CN115620210A (en) * | 2022-11-29 | 2023-01-17 | 广东祥利科技有限公司 | Method and system for determining performance of electronic wire based on image processing |
CN115499666B (en) * | 2022-11-18 | 2023-03-24 | 腾讯科技(深圳)有限公司 | Video compression method, video decompression method, video compression device, video decompression device, and storage medium |
CN117237418A (en) * | 2023-11-15 | 2023-12-15 | 成都航空职业技术学院 | Moving object detection method and system based on deep learning |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109753975B (en) * | 2019-02-02 | 2021-03-09 | 杭州睿琪软件有限公司 | Training sample obtaining method and device, electronic equipment and storage medium |
CN110503074B (en) * | 2019-08-29 | 2022-04-15 | 腾讯科技(深圳)有限公司 | Information labeling method, device and equipment of video frame and storage medium |
CN110796041B (en) * | 2019-10-16 | 2023-08-18 | Oppo广东移动通信有限公司 | Principal identification method and apparatus, electronic device, and computer-readable storage medium |
CN110796098B (en) * | 2019-10-31 | 2021-07-27 | 广州市网星信息技术有限公司 | Method, device, equipment and storage medium for training and auditing content auditing model |
CN110826509A (en) * | 2019-11-12 | 2020-02-21 | 云南农业大学 | Grassland fence information extraction system and method based on high-resolution remote sensing image |
CN111191708A (en) * | 2019-12-25 | 2020-05-22 | 浙江省北大信息技术高等研究院 | Automatic sample key point marking method, device and system |
CN111428589B (en) * | 2020-03-11 | 2023-05-30 | 新华智云科技有限公司 | Gradual transition identification method and system |
CN111497847B (en) * | 2020-04-23 | 2021-11-16 | 江苏黑麦数据科技有限公司 | Vehicle control method and device |
CN112307908B (en) * | 2020-10-15 | 2022-07-26 | 武汉科技大学城市学院 | Video semantic extraction method and device |
CN112784750B (en) * | 2021-01-22 | 2022-08-09 | 清华大学 | Fast video object segmentation method and device based on pixel and region feature matching |
CN113225461A (en) * | 2021-02-04 | 2021-08-06 | 江西方兴科技有限公司 | System and method for detecting video monitoring scene switching |
CN115482426A (en) * | 2021-06-16 | 2022-12-16 | 华为云计算技术有限公司 | Video annotation method, device, computing equipment and computer-readable storage medium |
CN113378958A (en) * | 2021-06-24 | 2021-09-10 | 北京百度网讯科技有限公司 | Automatic labeling method, device, equipment, storage medium and computer program product |
CN113610030A (en) * | 2021-08-13 | 2021-11-05 | 北京地平线信息技术有限公司 | Behavior recognition method and behavior recognition device |
CN113762286A (en) * | 2021-09-16 | 2021-12-07 | 平安国际智慧城市科技股份有限公司 | Data model training method, device, equipment and medium |
CN114697702B (en) * | 2022-03-23 | 2024-01-30 | 咪咕文化科技有限公司 | Audio and video marking method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100202660A1 (en) * | 2005-12-29 | 2010-08-12 | Industrial Technology Research Institute | Object tracking systems and methods |
CN107886105A (en) * | 2016-09-30 | 2018-04-06 | 法乐第(北京)网络科技有限公司 | A kind of annotation equipment of image |
CN107886104A (en) * | 2016-09-30 | 2018-04-06 | 法乐第(北京)网络科技有限公司 | A kind of mask method of image |
CN109753975A (en) * | 2019-02-02 | 2019-05-14 | 杭州睿琪软件有限公司 | Training sample obtaining method and device, electronic equipment and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103218603B (en) * | 2013-04-03 | 2016-06-01 | 哈尔滨工业大学深圳研究生院 | A kind of face automatic marking method and system |
CN103559237B (en) * | 2013-10-25 | 2017-02-15 | 南京大学 | Semi-automatic image annotation sample generating method based on target tracking |
CN103970906B (en) * | 2014-05-27 | 2017-07-04 | 百度在线网络技术(北京)有限公司 | The method for building up and device of video tab, the display methods of video content and device |
CN108229285B (en) * | 2017-05-27 | 2021-04-23 | 北京市商汤科技开发有限公司 | Object classification method, object classifier training method and device and electronic equipment |
CN108520218A (en) * | 2018-03-29 | 2018-09-11 | 深圳市芯汉感知技术有限公司 | A kind of naval vessel sample collection method based on target tracking algorism |
CN108596958B (en) * | 2018-05-10 | 2021-06-04 | 安徽大学 | Target tracking method based on difficult positive sample generation |
CN108986134B (en) * | 2018-08-17 | 2021-06-18 | 浙江捷尚视觉科技股份有限公司 | Video target semi-automatic labeling method based on related filtering tracking |
- 2019-02-02: CN application CN201910107568.6A filed; granted as patent CN109753975B (active)
- 2020-01-21: PCT application PCT/CN2020/073396 filed as WO2020156361A1 (application filing)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100202660A1 (en) * | 2005-12-29 | 2010-08-12 | Industrial Technology Research Institute | Object tracking systems and methods |
CN107886105A (en) * | 2016-09-30 | 2018-04-06 | 法乐第(北京)网络科技有限公司 | A kind of annotation equipment of image |
CN107886104A (en) * | 2016-09-30 | 2018-04-06 | 法乐第(北京)网络科技有限公司 | A kind of mask method of image |
CN109753975A (en) * | 2019-02-02 | 2019-05-14 | 杭州睿琪软件有限公司 | Training sample obtaining method and device, electronic equipment and storage medium |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112233171A (en) * | 2020-09-03 | 2021-01-15 | 上海眼控科技股份有限公司 | Target labeling quality inspection method and device, computer equipment and storage medium |
CN112257659A (en) * | 2020-11-11 | 2021-01-22 | 四川云从天府人工智能科技有限公司 | Detection tracking method, apparatus and medium |
CN112257659B (en) * | 2020-11-11 | 2024-04-05 | 四川云从天府人工智能科技有限公司 | Detection tracking method, device and medium |
CN112801940A (en) * | 2020-12-31 | 2021-05-14 | 深圳市联影高端医疗装备创新研究院 | Model evaluation method, device, equipment and medium |
CN113254703A (en) * | 2021-05-12 | 2021-08-13 | 北京百度网讯科技有限公司 | Video matching method, video processing device, electronic equipment and medium |
CN114347030A (en) * | 2022-01-13 | 2022-04-15 | 中通服创立信息科技有限责任公司 | Robot vision following method and vision following robot |
CN115499666B (en) * | 2022-11-18 | 2023-03-24 | 腾讯科技(深圳)有限公司 | Video compression method, video decompression method, video compression device, video decompression device, and storage medium |
CN115620210A (en) * | 2022-11-29 | 2023-01-17 | 广东祥利科技有限公司 | Method and system for determining performance of electronic wire based on image processing |
CN115620210B (en) * | 2022-11-29 | 2023-03-21 | 广东祥利科技有限公司 | Method and system for determining performance of electronic wire material based on image processing |
CN117237418A (en) * | 2023-11-15 | 2023-12-15 | 成都航空职业技术学院 | Moving object detection method and system based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN109753975A (en) | 2019-05-14 |
CN109753975B (en) | 2021-03-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020156361A1 (en) | Training sample obtaining method and apparatus, electronic device and storage medium | |
CN110400332B (en) | Target detection tracking method and device and computer equipment | |
CN105512683B (en) | Object localization method and device based on convolutional neural networks | |
Lu et al. | Robust and efficient saliency modeling from image co-occurrence histograms | |
CN110807473B (en) | Target detection method, device and computer storage medium | |
CN110334762B (en) | Feature matching method based on quad tree combined with ORB and SIFT | |
US20160307057A1 (en) | Fully Automatic Tattoo Image Processing And Retrieval | |
WO2019071976A1 (en) | Panoramic image saliency detection method based on regional growth and eye movement model | |
Thalji et al. | Iris Recognition using robust algorithm for eyelid, eyelash and shadow avoiding | |
KR20190082593A (en) | System and Method for Reidentificating Object in Image Processing | |
Jung et al. | Eye detection under varying illumination using the retinex theory | |
Meher et al. | Efficient method of moving shadow detection and vehicle classification | |
Song et al. | Feature extraction and target recognition of moving image sequences | |
CN108765463B (en) | Moving target detection method combining region extraction and improved textural features | |
Yaru et al. | Algorithm of fingerprint extraction and implementation based on OpenCV | |
Elashry et al. | Feature matching enhancement using the graph neural network (gnn-ransac) | |
CN111768436B (en) | Improved image feature block registration method based on fast-RCNN | |
Yan et al. | Saliency detection based on superpixel correlation and cosine window filtering | |
CN114119952A (en) | Image matching method and device based on edge information | |
Dey et al. | An efficient approach for pupil detection in iris images | |
Zhang et al. | RGB-D saliency detection with multi-feature-fused optimization | |
Kerdvibulvech | Hybrid model of human hand motion for cybernetics application | |
Zhang et al. | Oil tank detection based on linear clustering saliency analysis for synthetic aperture radar images | |
Wang et al. | Image saliency detection for multiple objects | |
Makandar et al. | Comparison and Analysis of Different Feature Extraction Methods versus Noisy Images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20749758 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20749758 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.03.2022) |
|