CN109977262B - Method and device for acquiring candidate segments from video and processing equipment - Google Patents


Info

Publication number
CN109977262B
Authority
CN
China
Prior art keywords
video
similarity
candidate
segment
candidate segments
Prior art date
Legal status
Active
Application number
CN201910231596.9A
Other languages
Chinese (zh)
Other versions
CN109977262A (en)
Inventor
卢江虎
姚聪
刘小龙
孙宇超
Current Assignee
Beijing Kuangshi Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd
Priority to CN201910231596.9A
Publication of CN109977262A
Application granted
Publication of CN109977262B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content

Abstract

The invention provides a method, an apparatus, and a processing device for acquiring candidate segments from a video, relating to the technical field of motion detection. The method comprises the following steps: acquiring a video to be detected; calculating, with a preset similarity algorithm, the image similarity between each pair of adjacent video frames of the video to be detected, to obtain a similarity sequence in which the image similarities are ordered the same way as the video frames; taking the image similarities in the similarity sequence that are greater than a first segmentation threshold as target image similarities; and, if the target image similarities are arranged consecutively in the similarity sequence, taking the video frames corresponding to those target image similarities as a candidate segment of the video to be detected. The method, apparatus, and processing device provided by the embodiments of the invention can produce more accurate candidate segments; the candidate segments have good robustness and are suitable for various video motion detection models.

Description

Method and device for acquiring candidate segments from video and processing equipment
Technical Field
The present invention relates to the field of motion detection technologies, and in particular, to a method, an apparatus, and a processing device for acquiring a candidate segment from a video.
Background
Video action detection means detecting whether a specific target action exists in a target video and, if it does, determining the starting time and the ending time of the target action. With the explosive growth in the number of videos, video motion detection is applied in increasingly wide fields, including pedestrian surveillance, autonomous driving, short-video segmentation, and the like.
Video motion detection results are not ideal, because different actions differ greatly in duration and come in a wide variety. The existing mainstream video motion detection methods first produce segments that may contain motion and then train a classification network to classify those segments. This approach has the following problems: if the similarity between the background and the foreground of the video is high, the extracted features are not very discriminative, so the localization of action boundaries is inaccurate; and the generalization ability of the classification network is poor: it typically overfits one data set, classifies other data sets poorly, and its parameters need to be tuned again.
In view of the above problems of video motion detection in the prior art, no effective solution has been proposed at present.
Disclosure of Invention
In view of this, an object of the present invention is to provide a method, an apparatus and a processing device for obtaining candidate segments from a video, which can produce more accurate candidate segments, have good robustness, and are suitable for various video motion detection models.
In a first aspect, an embodiment of the present invention provides a method for acquiring a candidate segment from a video, including: acquiring a video to be detected; respectively calculating the image similarity between adjacent video frames of the video to be detected by a preset similarity algorithm to obtain a similarity sequence; wherein the ordering of image similarities in the sequence of similarities is the same as the ordering of the video frames; taking the image similarity which is greater than a first segmentation threshold value in the similarity sequence as a target image similarity; and if the arrangement sequence of the target image similarities in the similarity sequence is continuous, taking the video frames corresponding to the target image similarities as the candidate segments of the video to be detected.
Further, the step of using the video frames corresponding to the similarity of the plurality of target images as the candidate segments of the video to be detected includes: taking the first video frame corresponding to the similarity of the target images as a starting frame of the candidate segment, and taking the last video frame corresponding to the similarity of the target images as an ending frame of the candidate segment; and segmenting the segment between the starting frame and the ending frame from the video to be detected to obtain a candidate segment.
Further, the image similarities in the similarity sequence are provided with index identifications; if the arrangement order of the plurality of target image similarities in the similarity sequence is consecutive, the step of taking the video frames corresponding to the plurality of target image similarities as the candidate segments of the video to be detected comprises: judging whether the index identifications of adjacent target image similarities are consecutive; if so, judging whether the number of consecutive index identifications is greater than a preset quantity threshold;
and if it is greater than the preset quantity threshold, taking the video frames corresponding to the consecutive index identifications as a candidate segment of the video to be detected.
Further, after the candidate segment is obtained, the method further comprises: taking the image similarities in the similarity sequence corresponding to the candidate segment that are greater than a second segmentation threshold as subdivided image similarities, the second segmentation threshold being greater than the first segmentation threshold; if the arrangement order of the plurality of subdivided image similarities in the similarity sequence is consecutive, taking the video frames corresponding to the plurality of subdivided image similarities as first-type subdivided candidate segments of the candidate segment; and taking the other parts of the candidate segment, divided off by the subdivided candidate segments, as second-type subdivided candidate segments.
Further, the step of using the video frames corresponding to the similarity of the plurality of subdivided images as the candidate segments of the first type subdivision of the candidate segments includes: and taking the first video frame corresponding to the similarity of the plurality of subdivided images as a starting frame of the subdivided candidate segment, taking the last video frame corresponding to the similarity of the plurality of subdivided images as an ending frame of the subdivided candidate segment, and segmenting the candidate segment to obtain the subdivided candidate segment.
Further, after obtaining the subdivided candidate segments, the method further comprises: selecting one of said subdivided candidate segments among adjacent said candidate segments, respectively; and taking the first video frame of the previous subdivided candidate segment as the starting frame of the lengthened candidate segment, taking the last video frame of the subsequent subdivided candidate segment as the ending frame of the lengthened candidate segment, and segmenting the video to be detected to obtain the lengthened candidate segment.
Further, the method further comprises: setting a ranking loss function based on the degrees of overlap of two candidate segments with the correctly labeled segment, the two candidate segments having different degrees of overlap with the correctly labeled segment; and taking the ranking loss function as the loss function of a video motion detection model, and training the video motion detection model with the candidate segments.
Further, the method further comprises: and performing motion detection on the candidate segments through a pre-configured video motion detection model.
In a second aspect, an embodiment of the present invention provides an apparatus for acquiring a candidate segment from a video, including: the acquisition module is used for acquiring a video to be detected; the calculation module is used for respectively calculating the image similarity between the adjacent video frames of the video to be detected through a preset similarity calculation method to obtain a similarity sequence; wherein the ordering of image similarities in the sequence of similarities is the same as the ordering of the video frames; the searching module is used for taking the image similarity which is greater than a first segmentation threshold value in the similarity sequence as the target image similarity; and the segmentation module is used for taking the video frames corresponding to the similarity degrees of the target images as the candidate segments of the video to be detected if the arrangement sequence of the similarity degrees of the target images in the similarity sequence is continuous.
In a third aspect, an embodiment of the present invention provides a processing device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method according to any one of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable medium having non-volatile program code executable by a processor, where the program code causes the processor to perform the steps of the method according to any one of the first aspect.
According to the method, the apparatus, and the processing device for acquiring candidate segments from a video provided by the embodiments of the invention, the image similarity between adjacent video frames of the video to be detected is calculated with a preset similarity algorithm to obtain a similarity sequence, in which the image similarities are ordered the same way as the video frames. The video frames corresponding to consecutive image similarities in the similarity sequence that are greater than a first segmentation threshold are then taken as candidate segments of the video to be detected.
Additional features and advantages of the disclosure will be set forth in the description which follows, or in part may be learned by the practice of the above-described techniques of the disclosure, or may be learned by practice of the disclosure.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of a processing apparatus according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for obtaining candidate segments from a video according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a process for training a model using a training Loss according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a process for generating candidate fragments using SSIM sequences according to an embodiment of the present invention;
fig. 5 is a verification result of a video motion detection model according to an embodiment of the present invention;
fig. 6 is a block diagram illustrating a structure of an apparatus for acquiring a candidate segment from a video according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the existing video motion detection method, the following problems exist in the process of producing segments possibly containing motions: 1. the positioning of the action boundaries of the segments is inaccurate; 2. the generalization ability is poor, and the fragments obtained by forced fitting cannot be applied to other data sets. Based on this, embodiments of the present invention provide a method, an apparatus, and a processing device for acquiring a candidate segment from a video, which are described in detail below by embodiments of the present invention.
The first embodiment is as follows:
first, a processing device 100 for implementing embodiments of the present invention, which may be used to execute methods of embodiments of the present invention, is described with reference to fig. 1.
As shown in FIG. 1, processing device 100 includes one or more processors 102, one or more memories 104, input devices 106, output devices 108, and a data collector 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and configuration of the processing device 100 shown in FIG. 1 are exemplary only, and not limiting, and that the processing device may have other components and configurations as desired.
The processor 102 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), or an ASIC (Application-Specific Integrated Circuit). The processor 102 may be a Central Processing Unit (CPU) or another form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the processing device 100 to perform desired functions.
The memory 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. On which one or more computer program instructions may be stored that may be executed by processor 102 to implement client-side functionality (implemented by the processor) and/or other desired functionality in embodiments of the invention described below. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The data collector 110 is configured to collect data, where the data collected by the data collector is original data of a current target or target data, and then the data collector may further store the original data or the target data in the memory 104 for use by other components.
Exemplarily, a processing device for implementing the method for acquiring candidate segments from a video according to an embodiment of the present invention may be implemented as a smart terminal such as a server, a smart phone, a tablet computer, a computer, or the like.
Example two:
an embodiment of the present invention provides a method for acquiring a candidate segment from a video by using an image processing method, and referring to a flowchart of a method for acquiring a candidate segment from a video shown in fig. 2, the method may be executed by the processing device provided in the foregoing embodiment, and the method may include the following steps:
step S202, a video to be detected is obtained.
The method for acquiring candidate segments from a video according to this embodiment is to extract a plurality of candidate segments (proposals) from a video to be detected, and the video can be further detected based on the candidate segments.
And step S204, respectively calculating the image similarity between adjacent video frames of the video to be detected by a preset similarity algorithm to obtain a similarity sequence. Wherein the image similarity in the similarity sequence is ordered in the same way as the video frames.
The preset similarity algorithm is used to measure the similarity between two images; in this embodiment it measures the similarity between two adjacent images of the video, and whether the two adjacent images contain continuous motion can be determined from the image similarity, so that subsequent video segmentation is performed accordingly. The similarity algorithm may be implemented with, for example, mean squared error (MSE), structural similarity (SSIM), or peak signal-to-noise ratio (PSNR). After the image similarity between every two adjacent frames of the video to be detected has been calculated, arranging all the similarities in the order in which the corresponding images appear in the video yields the similarity sequence. The image similarities in the resulting similarity sequence are thus ordered the same way as the corresponding video frames.
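As a minimal sketch of this step (names are illustrative, not from the patent), the similarity sequence can be built by sliding over adjacent frame pairs; a simplified single-window SSIM serves as the metric here, but an MSE- or PSNR-based metric fits the same interface:

```python
import numpy as np

def ssim_global(a, b, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Simplified single-window SSIM over two whole grayscale frames."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def similarity_sequence(frames, metric=ssim_global):
    """Similarity of each adjacent frame pair; the sequence keeps the
    frames' temporal order and has length len(frames) - 1."""
    return [metric(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
```

Identical adjacent frames yield a similarity of exactly 1, while dissimilar frames score much lower, which is what the thresholding in the next steps relies on.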
And step S206, taking the image similarity which is greater than the first segmentation threshold value in the similarity sequence as the target image similarity.
The similarity greater than the first segmentation threshold indicates that two adjacent frames of images include continuous actions, and the similarity less than the first segmentation threshold indicates that two adjacent frames of images do not include continuous actions, so that the initial image and the ending image of a certain action in the video can be found according to the comparison result of the similarity and the first segmentation threshold.
Step S208, if the arrangement sequence of the target image similarities in the similarity sequence is continuous, the video frames corresponding to the target image similarities are used as candidate segments of the video to be detected.
Because the image similarity represents the image similarity between adjacent video frames, and the arrangement sequence of the target image similarities in the similarity sequence is continuous, it can be determined that the video frames corresponding to the target image similarities contain continuous actions, and the video frames corresponding to the target image similarities need to be segmented, so as to obtain the candidate segments of the video to be detected.
According to the method for acquiring candidate segments from a video provided by the embodiment of the invention, the image similarity between adjacent video frames of the video to be detected is calculated with a preset similarity algorithm to obtain a similarity sequence, in which the image similarities are ordered the same way as the video frames. The video frames corresponding to consecutive image similarities in the similarity sequence that are greater than a first segmentation threshold are then taken as candidate segments of the video to be detected.
After the similarity sequence is obtained, a number of consecutive image similarities can be selected from it and the corresponding video segment cut out, i.e., the candidate segment. Taking the video frames corresponding to the plurality of target image similarities as a candidate segment of the video to be detected may be implemented as follows: the first video frame corresponding to the plurality of target image similarities is taken as the start frame of the candidate segment, the last video frame corresponding to the plurality of target image similarities as the end frame of the candidate segment, and the segment from the start frame to the end frame is cut from the video to be detected to obtain the candidate segment. Since an image similarity is the similarity between two adjacent frame images, each image similarity corresponds to two images: the start frame is the earlier of the two images corresponding to the first target image similarity, and the end frame is the later of the two images corresponding to the last target image similarity.
To make it easier to screen the similarity sequence for consecutive image similarities, an index identification may be set for each image similarity in the similarity sequence, ordered the same way as the video frames; for example, the sequence number of the frame image may be used. When determining the candidate segments of the video to be detected, it is judged whether the index identifications of adjacent target image similarities are consecutive; with sequence numbers, for example, this means judging whether the difference between the sequence numbers of adjacent target image similarities is 1. If the index identifications are consecutive, it is further judged whether the number of consecutive index identifications is greater than a preset quantity threshold, in order to eliminate the adverse effect of overly short continuous segments on motion detection. If it is greater than the preset quantity threshold, the video frames corresponding to the consecutive index identifications are taken as a candidate segment of the video to be detected.
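This screening step can be sketched as follows (a hypothetical helper, not the patent's exact implementation): threshold the similarity sequence, group consecutive index identifications, and keep only runs longer than the quantity threshold. A run over similarities i..j corresponds to video frames i..j+1, since each similarity links frame i to frame i+1:

```python
def candidate_segments(sims, seg_threshold, min_run):
    """Return (start_frame, end_frame) pairs for each run of consecutive
    indices whose similarity exceeds seg_threshold, keeping only runs
    longer than min_run similarities."""
    segments, run = [], []
    for i, s in enumerate(sims):
        if s <= seg_threshold:
            continue
        if run and i != run[-1] + 1:  # index identifications not consecutive
            if len(run) > min_run:
                segments.append((run[0], run[-1] + 1))
            run = []
        run.append(i)
    if len(run) > min_run:
        segments.append((run[0], run[-1] + 1))
    return segments
```

For example, with similarities [0.9, 0.9, 0.9, 0.1, 0.9, 0.9, 0.9, 0.9], a threshold of 0.5, and a quantity threshold of 2, two candidate segments are produced, spanning frames 0..3 and 4..8.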
In order to obtain candidate segments with more accurate positioning boundaries, each of the obtained candidate segments can be further segmented to generate more detailed segmented segments. The above method may therefore further comprise:
(1) taking the image similarity which is greater than a second segmentation threshold in the similarity sequence corresponding to the candidate segments as the detail image similarity, wherein the second segmentation threshold is greater than the first segmentation threshold;
(2) and if the arrangement sequence of the multiple subdivided image similarities in the similarity sequence is continuous, taking the video frames corresponding to the multiple subdivided image similarities as the first type subdivided candidate segments of the candidate segments. Similar to the foregoing segmentation process, the first video frame corresponding to the multiple subdivided image similarities may be used as a starting frame of the subdivided candidate segment, the last video frame corresponding to the multiple subdivided image similarities may be used as an ending frame of the subdivided candidate segment, and the subdivided candidate segment may be obtained by segmenting the candidate segment. By increasing the segmentation threshold, finer candidate segments can be segmented from the original candidate segments, and finally, the precision of motion detection is improved.
(3) And taking other segments divided by the subdivided candidate segments in the candidate segments as the second type of subdivided candidate segments. In the step (2), the segmented part of the original candidate segment is used as a finer candidate segment, and the original candidate segment further includes at least one remaining other segment, which is also used as a finer candidate segment.
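Steps (1) to (3) above can be sketched as a single helper (an illustrative sketch; for simplicity it works in indices of the similarity sequence and omits the run-length filtering and frame conversion described earlier):

```python
def subdivide(start, end, sims, theta2):
    """Split candidate segment [start, end) with the stricter second
    segmentation threshold theta2 (> the first segmentation threshold).
    Returns (first-type subdivided segments, second-type remainders)."""
    first, run_start = [], None        # first-type: maximal runs above theta2
    for i in range(start, end):
        if sims[i] > theta2:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            first.append((run_start, i))
            run_start = None
    if run_start is not None:
        first.append((run_start, end))
    second, pos = [], start            # second-type: the remaining pieces
    for a, b in first:
        if a > pos:
            second.append((pos, a))
        pos = b
    if pos < end:
        second.append((pos, end))
    return first, second
```

Raising the threshold inside an existing candidate thus carves out finer segments, and the leftover pieces between them are kept as the second type of subdivided candidates.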
In order to generate candidate segments with different lengths, each of the obtained candidate segments may be further recombined to generate segmented segments with different lengths. The above method may further comprise:
(1) One subdivided candidate segment is selected from each of two adjacent candidate segments; the position of each subdivided candidate segment within its candidate segment is not limited.
(2) The first video frame of the earlier subdivided candidate segment is taken as the start frame of the lengthened candidate segment, the last video frame of the later subdivided candidate segment as the end frame of the lengthened candidate segment, and the corresponding segment is cut from the video to be detected to obtain the lengthened candidate segment. When the two subdivided candidate segments are joined in this way, the video frames between them are also included in the lengthened candidate segment. Because the subdivided candidate segments can sit at different positions within their candidate segments, candidate segments of various lengths are obtained, enriching the number of samples for training or detection.
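A minimal sketch of this recombination (assuming each subdivided piece is a (start, end) pair as in the earlier sketches): pairing every piece of the earlier candidate with every piece of the later one yields lengthened candidates of various lengths.

```python
def lengthened_candidates(pieces_prev, pieces_next):
    """One subdivided piece from each of two adjacent candidate segments,
    merged with everything in between into a longer candidate."""
    return [(start, end)
            for (start, _) in pieces_prev
            for (_, end) in pieces_next]
```

For example, pieces [(0, 2), (3, 5)] of one candidate combined with piece [(8, 10)] of the next candidate give the lengthened candidates (0, 10) and (3, 10).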
When the candidate segments have been obtained, a video motion detection model may be trained, or motion detection may be performed on the candidate segments with a pre-configured video motion detection model. During training, in order to improve the accuracy of the model, ordering information about the candidate segments within the video to be detected is incorporated, so that candidate segments with different degrees of overlap with the real action segment are distinguished. Based on this idea, the method further comprises the following steps:
(1) A ranking loss function is set based on the degrees of overlap of two candidate segments with the correctly labeled segment, the two candidate segments having different degrees of overlap with the correctly labeled segment. (2) The ranking loss function is taken as the loss function of the video motion detection model, and the video motion detection model is trained with the candidate segments.
Most existing methods train a deep learning model with a cross-entropy loss to obtain the video motion detection model and then classify the candidate segments, which ignores the relationship information among candidate segments: owing to the limited accuracy of the deep learning model, two candidate segments may both receive high scores. If ordering information about the candidate segments within the video is added during training, a good candidate segment can be made to score higher than a poor one, which greatly improves the accuracy of the model. A ranking loss function (Ranking Loss) can be added on top of the cross-entropy loss when training the model. Suppose the degrees of overlap of the two candidate segments with the correctly labeled action segment (ground truth) are c_p and c_q; without loss of generality, assume c_p > c_q. The ranking loss function can then be set during training as follows:
l_rank = max(0, c_q - c_p + ε)
See FIG. 3 for a schematic diagram of the process of training a model with the ranking loss, where ψ_1, ψ_2, and ψ_3 are three different candidate segments and C_1, C_2, and C_3 are the respective degrees of overlap of ψ_1, ψ_2, and ψ_3 with the correctly labeled segment, i.e., the scores corresponding to the candidate segments ψ_1, ψ_2, ψ_3 during model training. The goal of model training is that, when C_1, C_2, and C_3 are ranked pairwise, the corresponding order is C_1 > C_2 > C_3.
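The ranking loss above can be written directly (a sketch; the default margin ε = 0.1 is an illustrative assumption, not a value from the patent):

```python
def ranking_loss(c_p, c_q, eps=0.1):
    """l_rank = max(0, c_q - c_p + eps): zero once the better candidate's
    score c_p beats the worse one's score c_q by at least the margin eps,
    so the model is pushed to rank the better candidate higher."""
    return max(0.0, c_q - c_p + eps)
```

Added on top of the cross-entropy loss, this term only penalizes pairs that are ranked in the wrong order or separated by less than the margin.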
The following embodiment is described taking SSIM as the example for video segmentation. Adjacent pictures in a video are strongly correlated; the SSIM formula is as follows:
SSIM(x, y) = [(2 μ_x μ_y + C_1)(2 σ_xy + C_2)] / [(μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2)]
where x and y represent the two images, μ_x and μ_y are their means, σ_x and σ_y their standard deviations, σ_xy is the covariance of the two pictures, and C_1 and C_2 are constants. SSIM compares the brightness, contrast, and structural similarity of two pictures. Using the SSIM similarity sequence, rich candidate segments can be generated with a segmentation strategy and a fusion strategy, as follows:
(1) Segmentation strategy: a binary vector is generated from the SSIM sequence S using the segmentation threshold θ. Similarities less than or equal to the segmentation threshold are set to 1 and similarities greater than the segmentation threshold to 0, where 1 marks the boundary of a candidate segment and 0 its interior.
b_i = 1, if s_i ≤ θ; b_i = 0, if s_i > θ
B(S, θ) = (b_1, b_2, …, b_T)
Collecting all indexes whose binary value is 1 gives B = {i | x_i ≠ 0}, where x_i is taken from B(S, θ).
(2) Fusion strategy: the indexes whose binary value is 1 are connected to obtain the candidate segments of the video:
Φ_ini = {(x_i, x_{i+1}) | x_{i+1} − x_i > δ, 1 ≤ i < T}
where x_i is taken from B, δ is the connectivity, and T is the length of B.
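The two strategies can be sketched as follows. The original fusion formula is not fully legible, so this sketch assumes one plausible reading: consecutive boundary indexes farther apart than the connectivity δ enclose a candidate segment.

```python
def boundary_indexes(sims, theta):
    """Segmentation strategy: indexes where similarity <= theta, i.e.
    where the binary vector B(S, theta) equals 1 (a segment boundary)."""
    return [i for i, s in enumerate(sims) if s <= theta]

def fuse(boundaries, delta):
    """Fusion strategy (assumed reading): consecutive boundary indexes
    more than delta apart enclose one candidate segment."""
    return [(boundaries[i], boundaries[i + 1])
            for i in range(len(boundaries) - 1)
            if boundaries[i + 1] - boundaries[i] > delta]
```

For example, the SSIM sequence [0.2, 0.9, 0.9, 0.9, 0.1, 0.9, 0.2] with θ = 0.5 gives the boundary set {0, 4, 6}, and fusing with δ = 1 yields the candidate segments (0, 4) and (4, 6).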
The initial candidate segments Φ_ini of the video are obtained with the segmentation and fusion strategies. To locate boundaries more precisely, the segmentation and fusion strategies are applied again to each segment in Φ_ini, yielding the finer candidate segments Φ_det. To produce candidate segments of different lengths, longer segments Φ_com can be generated based on all the boundary indexes in two adjacent candidate segments. Finally, all candidate segments are collected as the final candidate segments of the video, as follows:
Φ_V = Φ_ini ∪ Φ_det ∪ Φ_com
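Collecting the three candidate sets into the final set Φ_V is a plain set union; a minimal sketch, representing each segment as a (start, end) index pair:

```python
def collect_candidates(phi_ini, phi_det, phi_com):
    # Phi_V = Phi_ini | Phi_det | Phi_com, with duplicates removed.
    return sorted(set(phi_ini) | set(phi_det) | set(phi_com))
```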
See FIG. 4 for a schematic illustration of generating candidate segments from an SSIM sequence: initial candidate segments such as x_1^0–x_2^0 and x_3^0–x_4^0, subdivided candidate segments such as x_3^0–x_1^1 and x_1^1–x_4^0, and lengthened candidate segments such as x_2^0–x_3^0–x_1^1 and x_3^0–x_1^1–x_4^0.
FIG. 5 shows the validation results of a video motion detection model trained with the SSIM sequence and the aforementioned Ranking Loss; the results are far better than those of existing methods. In panels (a) and (b) of FIG. 5, the first row is the correctly labeled segment, the second row a better candidate segment, and the third row a poorer candidate segment. As the figure shows, the scores of the latter two differ greatly: the Ranking Loss successfully suppresses the poorer candidate segment.
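The Ranking Loss used here, L_rank = max(0, c_q − c_p + ε), can be sketched directly, where c_p is the overlap of the better candidate with the labeled segment and c_q that of the poorer one (the margin value is illustrative):

```python
def ranking_loss(c_p, c_q, eps=0.1):
    # Hinge-style loss: zero once the better candidate's overlap c_p
    # exceeds the poorer candidate's overlap c_q by at least eps,
    # positive otherwise, pushing the model to separate the two.
    return max(0.0, c_q - c_p + eps)
```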
Embodiment three:

For the method for acquiring candidate segments from a video provided in the second embodiment, an embodiment of the present invention provides an apparatus for acquiring candidate segments from a video. Referring to the structural block diagram of the apparatus shown in fig. 6, the apparatus includes:
an obtaining module 602, configured to obtain a video to be detected;
the calculating module 604 is configured to calculate image similarities between adjacent video frames of the video to be detected respectively through a preset similarity algorithm, so as to obtain a similarity sequence; the sequence of the image similarity in the similarity sequence is the same as the sequence of the video frames;
the searching module 606 is configured to use the image similarity greater than the first segmentation threshold in the similarity sequence as a target image similarity;
the segmentation module 608 is configured to, if the arrangement order of the multiple target image similarities in the similarity sequence is continuous, take the video frames corresponding to the multiple target image similarities as candidate segments of the video to be detected.
The apparatus for acquiring candidate segments from a video provided by this embodiment of the invention generates more accurate candidate segments from the image similarity between adjacent video frames together with the segmentation strategy; the resulting candidate segments are robust and suitable for various video motion detection models.
In one embodiment, the segmentation module is further configured to: taking a first video frame corresponding to the similarity of the target images as a starting frame of the candidate segment, and taking a last video frame corresponding to the similarity of the target images as an ending frame of the candidate segment; and segmenting a segment from the starting frame to the ending frame from the video to be detected to obtain a candidate segment.
In another embodiment, each image similarity in the similarity sequence is identified by an index; the segmentation module is further configured to: judge whether the index identifiers of adjacent image similarities are continuous; if so, judge whether the number of continuous index identifiers is greater than a preset number threshold; and if it is greater than the preset number threshold, take the video frames corresponding to the continuous index identifiers as candidate segments of the video to be detected.
In one embodiment, the apparatus further comprises a subdividing module configured to: take the image similarities greater than a second segmentation threshold in the similarity sequence corresponding to the candidate segment as subdivided image similarities, the second segmentation threshold being greater than the first segmentation threshold; if the arrangement order of multiple subdivided image similarities in the similarity sequence is continuous, take the video frames corresponding to the multiple subdivided image similarities as first-type subdivided candidate segments of the candidate segment; and take the other segments of the candidate segment, divided off by the subdivided candidate segments, as second-type subdivided candidate segments.
In another embodiment, the subdivision module is further configured to: and taking the first video frame corresponding to the similarity of the plurality of subdivided images as a starting frame of the subdivided candidate segments, taking the last video frame corresponding to the similarity of the plurality of subdivided images as an ending frame of the subdivided candidate segments, and dividing the candidate segments to obtain the subdivided candidate segments.
In one embodiment, the apparatus further comprises an extension module for: respectively selecting a subdivided candidate segment from adjacent candidate segments; and taking the first video frame of the previous subdivided candidate segment as the starting frame of the lengthened candidate segment, taking the last video frame of the subsequent subdivided candidate segment as the ending frame of the lengthened candidate segment, and segmenting the video to be detected to obtain the lengthened candidate segment.
In one embodiment, the apparatus further comprises a training module configured to: setting a sequencing loss function based on the overlapping degree of the two candidate segments and the correct marked segment; the overlapping degree of the two candidate segments and the correct marked segment is different; and taking the sequencing loss function as a loss function of the video motion detection model, and training the video motion detection model through the candidate segments.
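The overlap degree between a candidate segment and the correctly labeled segment is commonly computed as temporal intersection-over-union; a minimal sketch under that assumption (the patent does not prescribe this exact measure):

```python
def temporal_iou(seg, gt):
    # Intersection-over-union of two [start, end) temporal intervals.
    inter = max(0.0, min(seg[1], gt[1]) - max(seg[0], gt[0]))
    union = (seg[1] - seg[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```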
In one embodiment, the apparatus further comprises a detection module configured to: and performing motion detection on the candidate segments through a pre-configured video motion detection model.
The device provided by the embodiment has the same implementation principle and technical effect as the foregoing embodiment, and for the sake of brief description, reference may be made to the corresponding contents in the foregoing method embodiment for the portion of the embodiment of the device that is not mentioned.
Furthermore, the present embodiment provides a processing device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, the method for acquiring candidate segments from a video provided by the above embodiment is implemented.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the foregoing embodiments, and is not described herein again.
Further, the present embodiment provides a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, performs the steps of the method provided by the above-described embodiment.
The method, the apparatus, and the computer program product of the processing device for acquiring a candidate segment from a video according to the embodiments of the present invention include a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the method described in the foregoing method embodiments, and specific implementations may refer to the method embodiments and are not described herein again. The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (11)

1. A method for obtaining candidate segments from a video, comprising:
acquiring a video to be detected;
respectively calculating the image similarity between adjacent video frames of the video to be detected by a preset similarity algorithm to obtain a similarity sequence; wherein the ordering of image similarities in the sequence of similarities is the same as the ordering of the video frames;
taking the image similarity which is greater than a first segmentation threshold value in the similarity sequence as a target image similarity;
if the arrangement sequence of the target image similarities in the similarity sequence is continuous, taking the video frames corresponding to the target image similarities as candidate segments of the video to be detected;
setting a sequencing loss function based on the overlapping degree of the two candidate segments and the correct marked segment; the overlapping degree of the two candidate segments and the correct labeling segment is different, and the sequencing loss function is used as a loss function of a video motion detection model;
wherein the ordering loss function comprises: L_rank = max(0, c_q − c_p + ε), where c_q represents the degree of overlap of one of the two candidate segments with the correctly labeled segment, and c_p represents the degree of overlap of the other of the two candidate segments with the correctly labeled segment.
2. The method according to claim 1, wherein the step of using the video frames corresponding to the similarity of the plurality of target images as the candidate segments of the video to be detected comprises:
taking the first video frame corresponding to the similarity of the target images as a starting frame of the candidate segment, and taking the last video frame corresponding to the similarity of the target images as an ending frame of the candidate segment;
and segmenting the segment between the starting frame and the ending frame from the video to be detected to obtain a candidate segment.
3. The method according to claim 1, wherein the image similarity in the similarity sequence is identified with an index;
if the arrangement sequence of the target image similarities in the similarity sequence is continuous, the step of taking the video frames corresponding to the target image similarities as the candidate segments of the video to be detected comprises the following steps:
judging whether the index marks adjacent to the image similarity are continuous or not;
if so, judging whether the continuous index identification is larger than a preset quantity threshold value;
and if the number of the video frames is larger than the preset number threshold, taking the video frames corresponding to the continuous index marks as candidate segments of the video to be detected.
4. The method of claim 1, wherein after obtaining the candidate segment, the method further comprises:
taking the image similarities which are greater than a second segmentation threshold in the similarity sequence corresponding to the candidate segment as subdivided image similarities; the second segmentation threshold is greater than the first segmentation threshold;

if the arrangement order of the multiple subdivided image similarities in the similarity sequence is continuous, taking the video frames corresponding to the multiple subdivided image similarities as first-type subdivided candidate segments of the candidate segment;

and taking the other segments of the candidate segment divided off by the subdivided candidate segments as second-type subdivided candidate segments.
5. The method according to claim 4, wherein the step of using the video frames corresponding to the plurality of subdivided image similarities as the candidate segments of the first type subdivision of the candidate segments comprises:
and taking the first video frame corresponding to the similarity of the plurality of subdivided images as a starting frame of the subdivided candidate segment, taking the last video frame corresponding to the similarity of the plurality of subdivided images as an ending frame of the subdivided candidate segment, and segmenting the candidate segment to obtain the subdivided candidate segment.
6. The method of claim 4 or 5, wherein after obtaining the subdivided candidate segments, the method further comprises:
selecting one of said subdivided candidate segments among adjacent said candidate segments, respectively;
and taking the first video frame of the previous subdivided candidate segment as the starting frame of the lengthened candidate segment, taking the last video frame of the subsequent subdivided candidate segment as the ending frame of the lengthened candidate segment, and segmenting the video to be detected to obtain the lengthened candidate segment.
7. The method of claim 1, further comprising:
and taking the sequencing loss function as a loss function of the video motion detection model, and training the video motion detection model through the candidate segments.
8. The method of claim 1 or 7, further comprising:
and performing motion detection on the candidate segments through a pre-configured video motion detection model.
9. An apparatus for obtaining candidate segments from a video, comprising:
the acquisition module is used for acquiring a video to be detected;
the calculation module is used for respectively calculating the image similarity between the adjacent video frames of the video to be detected through a preset similarity calculation method to obtain a similarity sequence; wherein the ordering of image similarities in the sequence of similarities is the same as the ordering of the video frames;
the searching module is used for taking the image similarity which is greater than a first segmentation threshold value in the similarity sequence as the target image similarity;
the segmentation module is used for taking the video frames corresponding to the similarity degrees of the target images as candidate segments of the video to be detected if the arrangement sequence of the similarity degrees of the target images in the similarity sequence is continuous;
the apparatus is further configured to: setting a sequencing loss function based on the overlapping degree of the two candidate segments and the correct marked segment; the overlapping degree of the two candidate segments and the correct labeling segment is different, and the sequencing loss function is used as a loss function of a video motion detection model;
wherein the ordering loss function comprises: L_rank = max(0, c_q − c_p + ε), where c_q represents the degree of overlap of one of the two candidate segments with the correctly labeled segment, and c_p represents the degree of overlap of the other of the two candidate segments with the correctly labeled segment.
10. A processing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of the preceding claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of the preceding claims 1 to 8.
CN201910231596.9A 2019-03-25 2019-03-25 Method and device for acquiring candidate segments from video and processing equipment Active CN109977262B (en)