CN114565803A - Method, device and mechanical equipment for extracting difficult sample - Google Patents


Info

Publication number
CN114565803A
Authority
CN
China
Prior art keywords: sample, difficult, semantic segmentation, segmentation model, candidate key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210065428.9A
Other languages
Chinese (zh)
Inventor
付玲
周志忠
秦拯
向超前
虢彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Yungu Technology Co Ltd
Original Assignee
Zhongke Yungu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Zhongke Yungu Technology Co Ltd
Priority to CN202210065428.9A
Publication of CN114565803A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

The application discloses a method, an apparatus, and mechanical equipment for extracting hard samples. The method comprises the following steps: acquiring candidate key frames; constructing a semantic segmentation model from the candidate key frames; determining predicted hard samples and edge-marked samples with the semantic segmentation model; screening the edge-marked samples to obtain hard-to-label samples; and determining candidate hard samples from the predicted hard samples and the hard-to-label samples. By determining predicted hard samples and edge-marked samples with a semantic segmentation model and applying several screening stages, the application reduces the number of pictures awaiting manual confirmation, improves the efficiency of hard-sample identification, and raises the quality of the hard-sample library.

Description

Method, device and mechanical equipment for extracting difficult sample
Technical Field
The present application relates to the technical field of intelligent monitoring, and in particular to a method, an apparatus, and mechanical equipment for extracting hard samples.
Background
Producing semantic segmentation labels requires accurately annotating the edge point sets of every region of interest in a sample picture. Because this annotation is expensive, building a qualified sample library of meaningful scale carries a large cost. At present, high-value samples (hard samples) are evaluated mainly on the basis of labels, using the final loss function or a variant of it as the quantitative criterion. In the prior art, hard samples are either mined by rules defined during model training, which does not reduce the amount of labeling, or data are selected for labeling by manual inspection, which is partly blind to the actual value of a sample, cannot guarantee the quality of the sample library, and consumes a great deal of manpower. The prior art therefore neither reduces the number of labels nor provides an intuitive measure of sample value, so hard samples are extracted inefficiently.
Disclosure of Invention
The purpose of the present application is to provide a method, an apparatus, and mechanical equipment for extracting hard samples, so as to solve the prior-art problems that the number of labels cannot be reduced and the value of a sample cannot be evaluated intuitively, which make the extraction of hard samples inefficient.
In order to achieve the above object, a first aspect of the present application provides a method for extracting hard samples, comprising:
acquiring candidate key frames;
constructing a semantic segmentation model from the candidate key frames;
determining predicted hard samples and edge-marked samples with the semantic segmentation model;
screening the edge-marked samples to obtain hard-to-label samples;
and determining candidate hard samples from the predicted hard samples and the hard-to-label samples.
In an embodiment of the present application, constructing a semantic segmentation model from the candidate key frames includes:
dividing the candidate key frames into a plurality of groups;
selecting a preset group of candidate key frames for labeling to obtain an initial sample library;
training a semantic segmentation model on the initial sample library, the model being used to predict the remaining candidate key frames;
and, after each group of remaining candidate key frames is predicted, updating the initial sample library and retraining the semantic segmentation model.
In an embodiment of the present application, updating the initial sample library and retraining the semantic segmentation model after each group of remaining candidate key frames is predicted includes:
adding the predicted hard samples of the current group to the sample library to obtain a current sample library;
and retraining the semantic segmentation model on the current sample library to obtain a current semantic segmentation model, which is used to predict the next group of remaining candidate key frames.
In an embodiment of the present application, determining predicted hard samples and edge-marked samples with the semantic segmentation model includes:
for each candidate key frame, determining a difference map between the highest-probability layer and the second-highest-probability layer of the current candidate key frame;
counting the number of target pixels in the difference map whose difference is smaller than a first threshold;
determining the ratio of the number of target pixels to the total number of pixels of the current candidate key frame;
judging whether the ratio is greater than a second threshold;
judging the current candidate key frame to be a predicted hard sample when the ratio is greater than the second threshold;
and judging the current candidate key frame to be an edge-marked sample when the ratio is not greater than the second threshold.
In an embodiment of the present application, screening the edge-marked samples to obtain hard-to-label samples includes:
acquiring a first edge-marked sample and a second edge-marked sample that are temporally adjacent;
determining the second edge-marked sample as the target edge-marked sample;
determining the similarity between the target edge-marked sample and the first edge-marked sample;
judging, according to the similarity, whether the target edge-marked sample is a sample to be labeled manually;
and acquiring the hard-to-label samples from the samples to be labeled manually.
In an embodiment of the present application, judging according to the similarity whether the target edge-marked sample is a sample to be labeled manually includes:
judging whether the similarity is smaller than a third threshold;
and judging the target edge-marked sample to be a sample to be labeled manually when the similarity is smaller than the third threshold.
In an embodiment of the present application, acquiring the candidate key frames includes:
acquiring candidate key frames containing motion by a three-frame difference method.
A second aspect of the present application provides an apparatus for extracting hard samples, comprising:
a memory configured to store instructions; and
a processor configured to call the instructions from the memory and, when executing the instructions, to carry out the method for extracting hard samples described above.
A third aspect of the present application provides mechanical equipment, comprising:
a video acquisition device for acquiring fixed-view video of a moving scene; and
the apparatus for extracting hard samples described above.
A fourth aspect of the present application provides a machine-readable storage medium having stored thereon instructions for causing a machine to perform the method for extracting hard samples described above.
Through the above technical scheme, a semantic segmentation model is constructed from the acquired candidate key frames; predicted hard samples and edge-marked samples are determined with the semantic segmentation model; the edge-marked samples are screened to obtain hard-to-label samples; and candidate hard samples are determined from the predicted hard samples and the hard-to-label samples. Compared with marking the model's prediction results directly on the original pictures and then confirming them manually, the present application determines predicted hard samples and edge-marked samples with a semantic segmentation model and applies several screening stages, which reduces the number of pictures awaiting manual confirmation, improves the efficiency of hard-sample identification, and raises the quality of the hard-sample library.
Additional features and advantages of the present application will be described in detail in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application and not to limit the application. In the drawings:
FIG. 1 schematically illustrates a flow diagram of a method for extracting a difficult sample according to an embodiment of the present application;
FIG. 2 is a block diagram schematically illustrating an apparatus for extracting a difficult sample according to an embodiment of the present application;
Fig. 3 schematically shows the structure of mechanical equipment according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the drawings. It should be understood that the specific embodiments described herein only illustrate and explain the present application and do not limit it. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present application.
It should be noted that any directional indications (such as up, down, left, right, front, and rear) in the embodiments of the present application are only used to explain the relative positional relationship, movement, and the like of the components in a specific posture (as shown in the drawings); if that posture changes, the directional indication changes accordingly.
In addition, descriptions such as "first" and "second" in the embodiments of the present application are for description only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of technical features concerned. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. The technical solutions of the various embodiments may be combined with one another, provided that a person skilled in the art can realize the combination; when technical solutions are contradictory or a combination cannot be realized, the combination should be considered not to exist and falls outside the protection scope of the present application.
Fig. 1 schematically shows a flow diagram of a method for extracting hard samples according to an embodiment of the present application. As shown in Fig. 1, the method may include the following steps:
Step 102, acquiring candidate key frames;
Step 104, constructing a semantic segmentation model from the candidate key frames;
Step 106, determining predicted hard samples and edge-marked samples with the semantic segmentation model;
Step 108, screening the edge-marked samples to obtain hard-to-label samples;
Step 110, determining candidate hard samples from the predicted hard samples and the hard-to-label samples.
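The steps above can be sketched end to end as a toy pipeline. Every helper below is a deliberately trivial stand-in operating on placeholder frame records (an assumption for illustration, not the patent's implementation); the sections that follow describe what each stage actually does.

```python
# Toy end-to-end sketch of steps 102-110 on placeholder data.

def acquire_candidates(frames):
    # Step 102: keep only frames flagged as containing motion.
    return [f for f in frames if f["moving"]]

def split_by_confidence(frames):
    # Step 106: low model confidence -> predicted hard sample,
    # otherwise -> edge-marked sample awaiting screening.
    hard = [f for f in frames if f["confidence"] < 0.5]
    edge = [f for f in frames if f["confidence"] >= 0.5]
    return hard, edge

def screen_edges(frames):
    # Step 108: keep edge-marked samples that differ from their neighbours.
    return [f for f in frames if not f["similar_to_prev"]]

def extract_hard_samples(frames):
    candidates = acquire_candidates(frames)                        # step 102
    # Step 104 (model construction) is elided in this toy version.
    predicted_hard, edge_marked = split_by_confidence(candidates)  # step 106
    hard_to_label = screen_edges(edge_marked)                      # step 108
    return predicted_hard + hard_to_label                          # step 110

frames = [
    {"moving": True,  "confidence": 0.3, "similar_to_prev": False},  # hard
    {"moving": True,  "confidence": 0.9, "similar_to_prev": True},   # dropped
    {"moving": True,  "confidence": 0.8, "similar_to_prev": False},  # kept
    {"moving": False, "confidence": 0.1, "similar_to_prev": False},  # no motion
]
```

Running `extract_hard_samples(frames)` on this toy input retains two candidate hard samples: one by confidence and one by dissimilarity screening.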
The method for extracting hard samples may be applied to concrete machinery, including but not limited to selecting key frames from videos of mixer-truck inlet and outlet alignment at a mixing plant. In the embodiments of the present application, a video consists of still pictures, and these still pictures are referred to as frames. For the problem of mining hard samples for semantic segmentation from massive fixed-view videos of moving scenes, the embodiments of the present application provide an identification method for extracting such hard samples. For the problem of quickly screening the massive resulting hard samples (namely edge-marked samples) under a fixed view, a fast screening method based on image structural similarity is provided, which improves the efficiency of screening hard-to-label samples.
In the embodiments of the present application, deep learning is an effective technical means for many natural-scene tasks in the image field. The generalization ability of a deep learning model derives from its structure and training techniques, but the upper limit of that ability is ultimately determined by the sample library on which the model is trained, so constructing a high-value sample library is the key to completing an image task with deep learning. In the current big-data era, data volume is large and data types are many, and model training generally requires labeled data; however, easy samples contribute little to improving training accuracy while accounting for a large share of the labeling cost. Adding a hard-sample mining strategy during model training can improve model performance, but it cannot reduce the amount of labeling. Identifying hard samples from massive data is therefore important for improving the quality of the sample library, improving model performance, and reducing labeling cost. The method for extracting hard samples in the present application comprises three stages: preliminarily selecting candidate key frames; determining predicted hard samples and edge-marked samples with a semantic segmentation model; and screening the edge-marked samples to obtain hard-to-label samples, so that the candidate hard samples are determined from the predicted hard samples and the hard-to-label samples.
In the embodiment of the present application, the preliminary selection of candidate key frames screens motion frames out of the massive videos and preliminarily separates background frames from foreground frames. In one example, candidate key frames containing motion may be obtained by a three-frame difference method, which may include: acquiring a first video frame, a second video frame, and a third video frame that are temporally adjacent in sequence, and taking the third video frame as the target video frame; differencing the first and second video frames to obtain a first adjacent difference map, and differencing the second video frame and the target video frame to obtain a second adjacent difference map. The similarity between the target video frame and the second video frame is then determined, and whether the target video frame contains motion is judged from the two adjacent difference maps. When the target video frame contains motion and the similarity is smaller than a first set value, the target video frame is determined to be a foreground key frame; when the target video frame does not contain motion and the similarity is smaller than a second set value, it is determined to be a background key frame. Background key frames are key frames that do not contain motion, and foreground key frames are key frames that do. What the embodiment of the present application needs to acquire are the foreground key frames containing motion.
The embodiment of the present application uses the three-frame difference method to find motion frames mainly as a balance between computational efficiency and motion perception: three temporally adjacent video frames are extracted, and the three consecutive images are sharpened to reduce the influence of uneven illumination on motion detection. By further judging whether the target video frame contains motion and how similar it is to its neighboring frame, foreground and background key frames can be distinguished while frames are being extracted, without spending substantial manpower on subsequent manual sorting, which improves both the efficiency and the quality of key-frame selection.
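The three-frame difference test described above can be sketched in plain Python on small grayscale grids. The pixel and count thresholds are illustrative assumptions; a real implementation would also apply the sharpening and similarity checks described in the text.

```python
# Minimal sketch of the three-frame difference method.

def frame_diff(a, b):
    """Absolute per-pixel difference of two equal-sized grayscale frames."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def contains_motion(f1, f2, f3, pixel_thresh=30, count_thresh=1):
    """Target frame f3 contains motion when enough pixels change in BOTH
    adjacent difference maps (the AND of the two maps)."""
    d12 = frame_diff(f1, f2)   # first adjacent difference map
    d23 = frame_diff(f2, f3)   # second adjacent difference map
    moving = sum(
        1
        for r12, r23 in zip(d12, d23)
        for p12, p23 in zip(r12, r23)
        if p12 > pixel_thresh and p23 > pixel_thresh
    )
    return moving >= count_thresh

# Example: a bright 'object' moves across an otherwise static 4x4 frame.
static = [[0] * 4 for _ in range(4)]
step1 = [row[:] for row in static]; step1[1][1] = 255
step2 = [row[:] for row in static]; step2[1][2] = 255
```

Here `contains_motion(static, step1, step2)` is true because the pixel vacated by the object changes in both difference maps, while three identical frames yield no motion.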
In the embodiment of the present application, semantic segmentation is a basic task in image segmentation in which every pixel of an image is labeled with its corresponding category, without distinguishing individual instances; simply put, the visual input is divided into semantically interpretable categories. After acquiring the candidate key frames, the processor can construct a semantic segmentation model from them and use the model to determine predicted hard samples and edge-marked samples. Predicted hard samples are hard samples that can be determined directly by the semantic segmentation model; edge-marked samples are samples awaiting manual labeling.
In an embodiment of the present application, constructing the semantic segmentation model may include the following steps: dividing the candidate key frames into a plurality of groups (for example, m groups), selecting a preset group of candidate key frames for labeling to obtain an initial sample library, and training a semantic segmentation model on the labeled data in the initial sample library. The trained model is then used to identify the remaining candidate key frames, i.e. the candidate key frames outside the preset group. For example, one of the remaining groups is identified with the model; the frames identified as predicted hard samples are added to the sample library, the model is retrained, and the process is repeated until all remaining candidate key frames have been identified by the continuously updated model. Taking 10000 candidate key frames with m = 100 as an example, the frames are divided into 100 groups of 100 frames each. One group is selected for labeling to obtain an initial semantic segmentation model; assuming that group contains 50 hard samples, the initial sample library holds 50 hard samples. The initial model then predicts one of the remaining 99 groups to obtain that group's predicted hard samples. If 30 predicted hard samples are found, they are added to the sample library, bringing its hard-sample count to 80.
The semantic segmentation model is retrained on the updated sample library, and the retrained model then predicts a group among the remaining 98 groups. The sample library is updated and the model retrained in this manner until all candidate key frames have been predicted.
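The group-wise loop above can be sketched as follows. The `predict_hard` callable is a stub standing in for the real train-then-predict step (an assumption for illustration), wired here to reproduce the 50 + 30 = 80 worked example from the text.

```python
# Runnable skeleton of the group-wise sample-library update loop.

def mine_hard_samples(groups, initial_labels, predict_hard):
    """Iteratively grow the sample library: after each group is predicted,
    its hard samples are added and the model is (conceptually) retrained."""
    sample_library = list(initial_labels)   # labelled seed group
    for group in groups:
        # In the real method: retrain on the current library, then predict
        # the next group; predict_hard stands in for both steps.
        hard = predict_hard(sample_library, group)
        sample_library.extend(hard)         # update the library
    return sample_library

# Toy run matching the numbers in the text: the labeled seed group yields
# 50 hard samples, the first predicted group contributes 30 more.
seed = [f"seed_{i}" for i in range(50)]
group2 = [f"g2_{i}" for i in range(100)]
library = mine_hard_samples([group2], seed,
                            lambda lib, grp: grp[:30])  # stub predictor
```

After this run the library holds 80 hard samples, matching the worked example.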
In the embodiment of the present application, for each prediction of the semantic segmentation model, the evaluation criterion for hard semantic segmentation samples can be refined according to the per-pixel prediction confidence, and a corresponding index operator is proposed. The quality of a segmentation result can be evaluated from two aspects: the confidence distribution of the model's per-pixel predictions, and the deviation of the model's prediction from the actual result. In one example, the processor constructs a prediction-probability difference map by taking the difference between the highest-probability layer and the second-highest-probability layer of the current candidate key frame, counts the number of target pixels in the difference map smaller than a first threshold thresh1, and determines the ratio of that number to the total number of pixels in the frame. Whether the current candidate key frame is a predicted hard sample or an edge-marked sample is then decided by comparing the ratio against a second threshold thresh2: when the ratio is greater than thresh2, the frame is a predicted hard sample; otherwise it is an edge-marked sample that requires further screening.
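The confidence-gap criterion above can be sketched as follows, with each pixel represented by its class-probability vector; the thresh1 and thresh2 values are illustrative assumptions, not the patent's parameters.

```python
# Sketch of the top-2 probability-gap hard-sample criterion.

def classify_frame(pixel_probs, thresh1=0.2, thresh2=0.5):
    """pixel_probs: list of per-pixel class-probability lists.
    Returns 'hard' when the share of low-confidence pixels exceeds thresh2,
    otherwise 'edge' (the frame needs further screening)."""
    low_confidence = 0
    for probs in pixel_probs:
        top, second = sorted(probs, reverse=True)[:2]
        if top - second < thresh1:   # difference-map entry below thresh1
            low_confidence += 1
    ratio = low_confidence / len(pixel_probs)
    return "hard" if ratio > thresh2 else "edge"

# Example: a mostly-uncertain frame vs a mostly-confident one (4 pixels each).
uncertain = [[0.4, 0.35, 0.25]] * 3 + [[0.9, 0.05, 0.05]]
confident = [[0.9, 0.05, 0.05]] * 3 + [[0.4, 0.35, 0.25]]
```

The `uncertain` frame has a 0.75 low-confidence ratio and is judged a predicted hard sample; the `confident` frame falls below thresh2 and goes to edge-sample screening.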
In the embodiment of the present application, the edge-marked samples are the samples to be confirmed manually; when the semantic segmentation model produces many such video frames, they are preliminarily screened by computing the image structural similarity (SSIM) of adjacent video frames. If many edge-marked samples remain after this screening, SSIM can be applied again to the original images of the remaining frames, screening out the corresponding label maps at the same time. After this screening, the number of pictures to be confirmed manually is greatly reduced, improving the efficiency of manual screening. In one example, the processor acquires a first edge-marked sample and a second edge-marked sample that are temporally adjacent and takes the second as the target edge-marked sample, the first being the temporally preceding picture. The earliest edge-marked sample is taken by default as a sample to be confirmed manually; the similarity between the target edge-marked sample and its preceding neighbor is determined, and whether the target is a sample to be labeled manually is judged from that similarity. A small similarity means the target differs substantially from its predecessor and needs manual labeling: for example, if the similarity is smaller than a third threshold thresh3, the target edge-marked sample is judged to be a sample to be labeled manually.
In this way, the amount of manual labeling and the number of pictures awaiting manual confirmation are reduced, the efficiency of manual screening is improved, and the hard-to-label samples among the samples to be labeled manually are obtained more efficiently. The predicted hard samples determined by the semantic segmentation model and the hard-to-label samples confirmed by manual labeling are then together determined as the candidate hard samples.
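The SSIM screening step can be sketched with the global SSIM formula on flat grayscale pixel lists. Real implementations compute SSIM over local windows and average the result, and the thresh3 value here is an illustrative assumption.

```python
# Sketch of SSIM-based screening of temporally adjacent edge-marked samples.

def ssim(x, y, c1=6.5025, c2=58.5225):
    """Global structural similarity of two equal-length pixel lists.
    c1, c2 are the usual stabilisers (K1=0.01, K2=0.03, L=255)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) ** 2 for p in x) / n
    vy = sum((p - my) ** 2 for p in y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

def needs_manual_label(prev_frame, target_frame, thresh3=0.9):
    """The target edge-marked sample goes to manual labelling when it is
    dissimilar from its temporal predecessor (similarity below thresh3)."""
    return ssim(prev_frame, target_frame) < thresh3

# A near-duplicate frame is screened out; a changed frame is kept.
frame_a = [10, 20, 30, 40]
frame_b = [10, 20, 30, 41]      # nearly identical -> screened out
frame_c = [200, 10, 90, 5]      # very different   -> manual labelling
```

Only `frame_c` survives screening, which is how the method shrinks the set of pictures a human must confirm.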
Through the above technical scheme, a semantic segmentation model is constructed from the acquired candidate key frames; predicted hard samples and edge-marked samples are determined with the semantic segmentation model; the edge-marked samples are screened to obtain hard-to-label samples; and candidate hard samples are determined from the predicted hard samples and the hard-to-label samples. Compared with marking the model's prediction results directly on the original pictures and then confirming them manually, the present application determines predicted hard samples and edge-marked samples with a semantic segmentation model and applies several screening stages, which reduces the number of pictures awaiting manual confirmation, improves the efficiency of hard-sample identification, and raises the quality of the hard-sample library.
In this embodiment of the present application, constructing a semantic segmentation model from the candidate key frames in step 104 may include:
dividing the candidate key frames into a plurality of groups;
selecting a preset group of candidate key frames for labeling to obtain an initial sample library;
training a semantic segmentation model on the initial sample library, the model being used to predict the remaining candidate key frames;
and, after each group of remaining candidate key frames is predicted, updating the initial sample library and retraining the semantic segmentation model.
Specifically, semantic segmentation is a basic task in image segmentation in which every pixel of an image is labeled with its corresponding category, without distinguishing individual instances; simply put, the visual input is divided into semantically interpretable categories. After acquiring the candidate key frames, the processor can construct a semantic segmentation model from them and use the model to determine predicted hard samples and edge-marked samples. Predicted hard samples are hard samples that can be determined directly by the semantic segmentation model; edge-marked samples are samples awaiting manual labeling.
In an embodiment of the present application, constructing the semantic segmentation model may include the following steps: dividing the candidate key frames into a plurality of groups (for example, m groups), selecting a preset group of candidate key frames for labeling to obtain an initial sample library, and training a semantic segmentation model on the labeled data in the initial sample library. The trained model is used to identify the remaining candidate key frames, i.e. the candidate key frames outside the preset group. For example, one of the remaining groups is identified with the model; the frames identified as predicted hard samples are added to the sample library, the model is retrained, and the process is repeated until all remaining candidate key frames have been identified by the continuously updated model. Updating the sample library and the model after each group is predicted continuously improves the accuracy of the semantic segmentation model and thus its performance.
In an embodiment of the present application, updating the initial sample library and retraining the semantic segmentation model after each group of remaining candidate key frames is predicted may include:
adding the predicted hard samples of the current group to the sample library to obtain a current sample library;
and retraining the semantic segmentation model on the current sample library to obtain a current semantic segmentation model, which is used to predict the next group of remaining candidate key frames.
Specifically, after each group of remaining candidate key frames is predicted, the frames identified as predicted hard samples are added to the sample library, the semantic segmentation model is retrained, and the process is repeated until all remaining candidate key frames have been identified by the continuously updated model. Taking 10000 candidate key frames with m = 100 as an example, the frames are divided into 100 groups of 100 frames each. One group is selected for labeling to obtain an initial semantic segmentation model; assuming that group contains 50 hard samples, the initial sample library holds 50 hard samples. The initial model predicts one of the remaining 99 groups to obtain its predicted hard samples; if 30 are found, they are added to the library, bringing its hard-sample count to 80. The model is retrained on the updated library, and the retrained model predicts a group among the remaining 98 groups. The library is updated and the model retrained in this manner until all candidate key frames have been predicted, continuously improving the accuracy and performance of the semantic segmentation model.
In this embodiment of the present application, the step 106 of determining the prediction difficult samples and the edge labeled samples by the semantic segmentation model includes:
determining a difference map of a maximum probability layer and a second maximum probability layer of the current candidate key frame aiming at each candidate key frame;
counting the number of target pixels with the difference smaller than a first threshold value in the difference map;
determining the ratio of the number of target pixels to the total number of pixels of the current candidate key frame;
judging whether the ratio is greater than a second threshold value;
under the condition that the ratio is larger than a second threshold value, judging the current candidate key frame as a prediction difficult sample;
and in the case that the ratio is not greater than the second threshold, judging the current candidate key frame as an edge mark sample.
Specifically, for each prediction of the semantic segmentation model, the evaluation criterion for semantic segmentation difficult samples can be refined according to the per-pixel prediction confidence, and a corresponding index operator is defined. The quality of a semantic segmentation result can be evaluated from two aspects: the confidence distribution of the model's per-pixel predictions, and the deviation of the model prediction from the actual result. In one example, the processor may construct a prediction probability difference map by determining the difference map between the maximum probability layer and the second maximum probability layer of the current candidate key frame, count the number of target pixels in the difference map whose difference is smaller than a first threshold thresh1, and determine the ratio of the number of target pixels to the total number of pixels of the current candidate key frame. Whether the current candidate key frame is a prediction difficult sample or an edge marker sample is judged by whether this ratio is greater than a second threshold thresh2: if the ratio is greater than thresh2, the current candidate key frame is judged to be a prediction difficult sample; if the ratio is not greater than thresh2, the current candidate key frame is judged to be an edge marker sample and needs further screening. Determining prediction difficult samples along these two dimensions improves the quality of the prediction difficult samples and allows their value to be evaluated intuitively.
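A minimal sketch of this top-two-probability criterion, assuming the model outputs a per-pixel class-probability map of shape (classes, height, width); the thresh1/thresh2 values shown are illustrative assumptions, not values from the patent.

```python
import numpy as np

def classify_frame(prob_map, thresh1=0.1, thresh2=0.3):
    """Classify a candidate key frame as a prediction difficult sample
    ("hard") or an edge marker sample ("edge").

    prob_map: (C, H, W) per-pixel class probabilities.
    """
    sorted_probs = np.sort(prob_map, axis=0)        # ascending along class axis
    diff_map = sorted_probs[-1] - sorted_probs[-2]  # max layer minus second-max layer
    target = np.count_nonzero(diff_map < thresh1)   # low-confidence (ambiguous) pixels
    ratio = target / diff_map.size                  # fraction of ambiguous pixels
    return "hard" if ratio > thresh2 else "edge"
```

A large fraction of pixels with a small top-two gap means the model is uncertain over much of the frame, so the frame is routed to the difficult-sample library; otherwise it becomes an edge marker sample for further screening.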
In this embodiment, the step 108 of screening the edge-labeled sample to obtain a labeled sample may include:
acquiring a first edge mark sample and a second edge mark sample which are adjacent in sequence in time;
determining the second edge marking sample as a target edge marking sample;
determining the similarity of the target edge mark sample and the first edge mark sample;
judging whether the target edge marking sample is a sample to be artificially marked according to the similarity;
and acquiring a difficult-to-mark sample in the sample to be marked manually.
In particular, an edge marker sample is a sample awaiting manual confirmation. If the number of video frames determined as edge marker samples by the semantic segmentation model is large, they need to be preliminarily screened by computing the structural similarity (SSIM) of adjacent video frames. If many edge marker samples remain after this screening, SSIM can be applied again to the original images of the remaining frames, and the corresponding marker maps are screened out at the same time. After this screening process, the number of images requiring manual confirmation is greatly reduced, which improves manual screening efficiency.
In this embodiment, the processor may acquire a first edge marker sample and a second edge marker sample that are temporally adjacent, and determine the second edge marker sample as the target edge marker sample, where the first edge marker sample is the picture temporally preceding the target edge marker sample. The temporally earliest edge marker sample defaults to a sample to be manually confirmed; for each target edge marker sample, its similarity to the preceding adjacent edge marker sample is determined, and whether it is a sample to be artificially marked is judged according to that similarity. This reduces the number of artificial markings and the scale of images awaiting manual confirmation, improves manual screening efficiency, and allows the difficult-to-mark samples among the samples to be artificially marked to be obtained more efficiently.
In this embodiment of the present application, determining whether the target edge marking sample is a sample to be artificially marked according to the similarity may include:
judging whether the similarity is smaller than a third threshold value;
and under the condition that the similarity is smaller than a third threshold value, judging that the target edge marking sample is a sample to be artificially marked.
Specifically, whether the target edge marker sample is a sample to be artificially marked can be judged through SSIM. A small similarity means the current target edge marker sample differs greatly from the preceding edge marker sample, so manual marking is needed. For example, it is judged whether the similarity is smaller than a third threshold thresh3; if the similarity is smaller than thresh3, the target edge marker sample is judged to be a sample to be artificially marked. This similarity judgment reduces the number of samples to be artificially marked.
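The screening of steps above (keep the earliest sample, then keep each later sample only when it is sufficiently dissimilar to its predecessor) can be sketched as follows. `ssim_fn` stands in for the SSIM computation, and thresh3 = 0.9 is an assumed illustrative value.

```python
def screen_edge_samples(frames, ssim_fn, thresh3=0.9):
    """Reduce a temporally ordered list of edge marker samples to the
    subset that needs manual marking.

    The earliest frame defaults to "to be manually confirmed"; each later
    frame is kept only if its similarity to the preceding edge marker
    sample falls below thresh3 (i.e. it differs enough to need marking).
    """
    if not frames:
        return []
    to_label = [frames[0]]                  # earliest sample kept by default
    for prev, cur in zip(frames, frames[1:]):
        if ssim_fn(cur, prev) < thresh3:    # dissimilar -> needs manual marking
            to_label.append(cur)
    return to_label
```

Runs of near-identical frames thus collapse to a single representative, which is what shrinks the manual-confirmation workload.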
In an embodiment of the present invention, the similarity may satisfy the following formula:
$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}$$

$$c_1=(k_1L)^2,\qquad c_2=(k_2L)^2$$

wherein SSIM(x, y) is the similarity between the target edge marker sample and the first edge marker sample; x and y are the target edge marker sample and the first edge marker sample, respectively; $\mu_x$ and $\mu_y$ are the means of the image gray-level matrices of the target edge marker sample and the first edge marker sample, respectively; $\sigma_x^2$ and $\sigma_y^2$ are the variances of the image gray-level matrices of the target edge marker sample and the first edge marker sample, respectively; $\sigma_{xy}$ is the covariance of the image gray-level matrices of the target edge marker sample and the first edge marker sample; $c_1$ and $c_2$ are constants used to maintain stability; $L$ is the dynamic range of the pixel values; $k_1 = 0.01$; $k_2 = 0.03$.
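A direct NumPy transcription of this formula, computed globally over the whole gray-level matrix. Note this is a sketch of the equation as written; practical SSIM implementations usually average the statistic over local sliding windows rather than computing it once per image.

```python
import numpy as np

def ssim_global(x, y, L=255, k1=0.01, k2=0.03):
    """Single-window SSIM between two gray-level images x and y."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (k1 * L) ** 2                      # stability constants
    c2 = (k2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()         # means of the gray-level matrices
    var_x, var_y = x.var(), y.var()         # variances
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()  # covariance
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

For identical images the numerator and denominator coincide, so the similarity is exactly 1; any difference between the frames pushes the value below 1.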
In this embodiment of the present application, the step 102 of obtaining candidate key frames may include:
and acquiring candidate key frames containing motion by a three-frame difference method.
The preliminary selection of the candidate key frames is to preliminarily screen out the motion frames from the massive videos and preliminarily separate out background frames and foreground frames. In the embodiment of the present application, candidate keyframes containing motion may be obtained by a three-frame difference method. Specifically, the method for obtaining candidate keyframes containing motion by a three-frame difference method may include: the method comprises the steps of obtaining a first video frame, a second video frame and a third video frame which are adjacent in sequence in time, and determining the third video frame as a target video frame. And performing difference processing on the first video frame and the second video frame to obtain a first adjacent difference image, and simultaneously performing difference processing on the second video frame and the target video frame to obtain a second adjacent difference image. Further, determining the similarity between the target video frame and the second video frame, judging whether the target video frame contains motion according to the first adjacent difference image and the second adjacent difference image, and determining the target video frame as a foreground key frame under the condition that the target video frame contains motion and the similarity is smaller than a first set value; and under the condition that the target video frame does not contain motion and the similarity is smaller than a second set value, determining the target video frame as a background key frame. The background key frames are key frames that do not include motion, and the foreground key frames are key frames that include motion. What needs to be acquired in the embodiment of the present application is a foreground key frame containing motion. 
The embodiment of the application uses the three-frame difference method to determine motion frames mainly to balance computational efficiency against motion perception: three temporally consecutive video frames are extracted, and the three consecutive images are sharpened to reduce the influence of uneven illumination on motion detection. By further judging whether the target video frame contains motion and how similar it is to the adjacent video frame, foreground key frames and background key frames can be distinguished while frames are extracted, without spending a large amount of manual effort on subsequent distinguishing, which improves the efficiency and quality of key frame selection.
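A minimal sketch of the three-frame difference motion check on grayscale frames. The two adjacent difference images are thresholded and AND-ed, so only pixels that change in both differences count as motion. `diff_thresh` and `motion_ratio` are illustrative assumptions; the patent does not give concrete values, and it additionally combines this check with an inter-frame similarity test to separate foreground from background key frames.

```python
import numpy as np

def contains_motion(f1, f2, f3, diff_thresh=25, motion_ratio=0.01):
    """Three-frame difference: do enough pixels change in both
    adjacent difference images of frames f1, f2, f3 (shape (H, W))?"""
    d1 = np.abs(f2.astype(np.int16) - f1.astype(np.int16)) > diff_thresh
    d2 = np.abs(f3.astype(np.int16) - f2.astype(np.int16)) > diff_thresh
    motion_mask = d1 & d2                   # changed in both adjacent differences
    return bool(motion_mask.mean() > motion_ratio)
```

The AND of the two differences suppresses the "ghosting" that a plain two-frame difference produces around a moving object.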
Fig. 2 schematically shows a block diagram of an apparatus for extracting a difficult sample according to an embodiment of the present application. As shown in fig. 2, an embodiment of the present application provides an apparatus for extracting a difficult sample, which may include:
a memory 210 configured to store instructions; and
a processor 220 configured to call instructions from the memory 210 and when executing the instructions, to implement the method for extracting a difficult sample described above.
Specifically, in the present embodiment, the processor 220 may be configured to:
acquiring a candidate key frame;
constructing a semantic segmentation model according to the candidate key frames;
determining a prediction difficult sample and an edge marking sample through a semantic segmentation model;
screening the edge marking sample to obtain a sample difficult to mark;
and determining candidate difficult samples according to the prediction difficult samples and the marked difficult samples.
Further, the processor 220 may be further configured to:
the construction of the semantic segmentation model according to the candidate key frames comprises the following steps:
dividing the candidate key frames into a plurality of groups of candidate key frames;
selecting a preset group of candidate key frames for marking to obtain an initial sample library;
training a semantic segmentation model according to the initial sample library, wherein the semantic segmentation model is used for predicting residual candidate key frames;
after each set of remaining candidate key frames is predicted, the initial sample library is updated and the semantic segmentation model is retrained to update the semantic segmentation model.
Further, the processor 220 may be further configured to:
after each set of remaining candidate key frames is predicted, updating the initial sample library and retraining the semantic segmentation model to update the semantic segmentation model comprises:
adding the prediction difficulty samples corresponding to the current group into the updated initial sample library to obtain a current sample library;
retraining the semantic segmentation model according to the current sample library to obtain a current semantic segmentation model; the current semantic segmentation model is used to predict the next set of remaining candidate key frames.
Further, the processor 220 may be further configured to:
determining the prediction hard sample and the edge marking sample through the semantic segmentation model comprises the following steps:
determining a difference map of a maximum probability layer and a second maximum probability layer of the current candidate key frame aiming at each candidate key frame;
counting the number of target pixels with the difference smaller than a first threshold value in the difference map;
determining the ratio of the number of target pixels to the total number of pixels of the current candidate key frame;
judging whether the ratio is greater than a second threshold value;
under the condition that the ratio is larger than a second threshold value, judging the current candidate key frame as a prediction difficult sample;
and in the case that the ratio is not greater than the second threshold, judging the current candidate key frame as an edge mark sample.
Further, the processor 220 may be further configured to:
screening the edge-labeled samples to obtain labeled difficult samples comprises:
acquiring a first edge mark sample and a second edge mark sample which are adjacent in time;
determining the second edge marking sample as a target edge marking sample;
determining the similarity of the target edge mark sample and the first edge mark sample;
judging whether the target edge marking sample is a sample to be artificially marked according to the similarity;
and acquiring a difficult-to-mark sample in the sample to be marked manually.
Further, the processor 220 may be further configured to:
judging whether the target edge marking sample is a sample to be artificially marked according to the similarity comprises the following steps:
judging whether the similarity is smaller than a third threshold value;
and under the condition that the similarity is smaller than a third threshold value, judging that the target edge marking sample is a sample to be artificially marked.
Further, the processor 220 may be further configured to:
acquiring the candidate key frame comprises:
and acquiring candidate key frames containing motion by a three-frame difference method.
By the above technical scheme, a semantic segmentation model is constructed from the acquired candidate key frames; prediction difficult samples and edge marker samples are determined through the semantic segmentation model; the edge marker samples are screened to obtain difficult-to-mark samples; and candidate difficult samples are determined according to the prediction difficult samples and the marked difficult samples. Compared with directly marking the model prediction result on the original image and then confirming it manually, the present application determines prediction difficult samples and edge marker samples through the semantic segmentation model and reduces, through multiple screening steps, the scale of images awaiting manual confirmation, improving the identification efficiency of difficult samples and the quality of the difficult-sample library.
Fig. 3 schematically shows a schematic structural diagram of a mechanical device according to an embodiment of the present application. As shown in fig. 3, an embodiment of the present application further provides a mechanical apparatus, which may include:
the video acquisition device 310 is used for acquiring a moving scene video with a fixed view angle;
the apparatus 320 for extracting a difficult sample is described above.
In the embodiment of the present invention, the video acquisition device 310 is electrically connected to the device 320 for extracting difficult samples. The video acquisition device 310 acquires a moving-scene video with a fixed view angle and transmits the video to the device 320, which acquires candidate key frames; constructs a semantic segmentation model according to the candidate key frames; determines prediction difficult samples and edge marker samples through the semantic segmentation model; screens the edge marker samples to obtain difficult-to-mark samples; and determines candidate difficult samples according to the prediction difficult samples and the marked difficult samples. Therefore, for the problem of selecting difficult samples from massive videos of a fixed-view-angle moving scene, the semantic segmentation model determines prediction difficult samples and edge marker samples, which reduces the scale of images awaiting manual confirmation, improves the identification efficiency of difficult samples, and improves the quality of the difficult-sample library.
Embodiments of the present application further provide a machine-readable storage medium having instructions stored thereon for causing a machine to perform the above-described method for extracting a hard sample.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method for extracting a difficult sample, comprising:
acquiring a candidate key frame;
constructing a semantic segmentation model according to the candidate key frames;
determining a prediction difficult sample and an edge marking sample through the semantic segmentation model;
screening the edge marking sample to obtain a marking difficult sample;
and determining candidate hard samples according to the prediction hard samples and the marked hard samples.
2. The method of claim 1, wherein the building a semantic segmentation model from the candidate keyframes comprises:
dividing the candidate keyframes into groups of candidate keyframes;
selecting a preset group of candidate key frames for marking to obtain an initial sample library;
training the semantic segmentation model according to the initial sample library, wherein the semantic segmentation model is used for predicting residual candidate key frames;
after each set of remaining candidate key frames is predicted, the initial sample library is updated and the semantic segmentation model is retrained to update the semantic segmentation model.
3. The method of claim 2, wherein updating the initial sample base and retraining the semantic segmentation model after each prediction of a set of remaining candidate key frames to update the semantic segmentation model comprises:
adding the prediction difficulty samples corresponding to the current group into the updated initial sample library to obtain a current sample library;
retraining the semantic segmentation model according to the current sample library to obtain a current semantic segmentation model; the current semantic segmentation model is used to predict the next set of remaining candidate key frames.
4. The method of claim 1, wherein the determining, by the semantic segmentation model, prediction hard samples and edge marker samples comprises:
determining a difference map of a maximum probability layer and a second maximum probability layer of the current candidate key frame aiming at each candidate key frame;
counting the number of target pixels with the difference values smaller than a first threshold value in the difference value graph;
determining the ratio of the number of the target pixels to the total number of pixels of the current candidate key frame;
judging whether the ratio is larger than a second threshold value;
under the condition that the ratio is larger than the second threshold, judging that the current candidate key frame is a prediction difficult sample;
and in the case that the ratio is not greater than the second threshold, determining that the current candidate key frame is an edge mark sample.
5. The method of claim 1, wherein the screening the edge-labeled samples for labeled difficult samples comprises:
acquiring a first edge mark sample and a second edge mark sample which are adjacent in sequence in time;
determining the second edge marker sample as a target edge marker sample;
determining a similarity of the target edge marker sample and the first edge marker sample;
judging whether the target edge marking sample is a sample to be artificially marked according to the similarity;
and acquiring a difficult-to-mark sample in the sample to be artificially marked.
6. The method according to claim 5, wherein the determining whether the target edge marking sample is a sample to be artificially marked according to the similarity comprises:
judging whether the similarity is smaller than a third threshold value;
and under the condition that the similarity is smaller than a third threshold value, judging the target edge mark sample as a sample to be artificially marked.
7. The method of claim 1, wherein the obtaining the candidate keyframes comprises:
and acquiring candidate key frames containing motion by a three-frame difference method.
8. An apparatus for extracting a difficult sample, comprising:
a memory configured to store instructions; and
a processor configured to invoke the instructions from the memory and to enable the method for extracting a difficult sample according to any one of claims 1 to 7 when executing the instructions.
9. A mechanical device, comprising:
the video acquisition device is used for acquiring a moving scene video with a fixed visual angle;
the apparatus for extracting difficult samples according to claim 8.
10. A machine-readable storage medium having stored thereon instructions for causing a machine to perform the method for extracting a difficult sample according to any one of claims 1 to 7.
CN202210065428.9A 2022-01-19 2022-01-19 Method, device and mechanical equipment for extracting difficult sample Pending CN114565803A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210065428.9A CN114565803A (en) 2022-01-19 2022-01-19 Method, device and mechanical equipment for extracting difficult sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210065428.9A CN114565803A (en) 2022-01-19 2022-01-19 Method, device and mechanical equipment for extracting difficult sample

Publications (1)

Publication Number Publication Date
CN114565803A true CN114565803A (en) 2022-05-31

Family

ID=81711259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210065428.9A Pending CN114565803A (en) 2022-01-19 2022-01-19 Method, device and mechanical equipment for extracting difficult sample

Country Status (1)

Country Link
CN (1) CN114565803A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115484456A (en) * 2022-09-15 2022-12-16 重庆邮电大学 Video anomaly prediction method and device based on semantic clustering
CN115484456B (en) * 2022-09-15 2024-05-07 重庆邮电大学 Video anomaly prediction method and device based on semantic clustering
CN115908280A (en) * 2022-11-03 2023-04-04 广东科力新材料有限公司 Data processing-based performance determination method and system for PVC calcium zinc stabilizer

Similar Documents

Publication Publication Date Title
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN110610166B (en) Text region detection model training method and device, electronic equipment and storage medium
CN111369581B (en) Image processing method, device, equipment and storage medium
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN114565803A (en) Method, device and mechanical equipment for extracting difficult sample
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN111680753A (en) Data labeling method and device, electronic equipment and storage medium
CN112215190A (en) Illegal building detection method based on YOLOV4 model
CN110599453A (en) Panel defect detection method and device based on image fusion and equipment terminal
CN116012815A (en) Traffic element identification method, multi-task network model, training method and training device
CN113158856B (en) Processing method and device for extracting target area in remote sensing image
CN114078197A (en) Small sample target detection method and device based on support sample characteristic enhancement
CN113744280A (en) Image processing method, apparatus, device and medium
CN110457155B (en) Sample class label correction method and device and electronic equipment
CN113704276A (en) Map updating method and device, electronic equipment and computer readable storage medium
CN112365513A (en) Model training method and device
CN110853034A (en) Crack detection method, crack detection device, electronic equipment and computer-readable storage medium
CN116363538A (en) Bridge detection method and system based on unmanned aerial vehicle
CN110991437A (en) Character recognition method and device, and training method and device of character recognition model
CN111062385A (en) Network model construction method and system for image text information detection
CN116977260A (en) Target defect detection method and device, electronic equipment and storage medium
CN115082551A (en) Multi-target detection method based on unmanned aerial vehicle aerial video
Sadakatul Bari et al. Performance evaluation of convolution neural network based object detection model for Bangladeshi traffic vehicle detection
CN114299012A (en) Object surface defect detection method and system based on convolutional neural network
CN117523345B (en) Target detection data balancing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination