CN109977262A - Method, apparatus and processing device for obtaining candidate segments from a video - Google Patents
Method, apparatus and processing device for obtaining candidate segments from a video
- Publication number
- CN109977262A CN109977262A CN201910231596.9A CN201910231596A CN109977262A CN 109977262 A CN109977262 A CN 109977262A CN 201910231596 A CN201910231596 A CN 201910231596A CN 109977262 A CN109977262 A CN 109977262A
- Authority
- CN
- China
- Prior art keywords
- similarity
- candidate segment
- video
- sequence
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
Abstract
The present invention provides a method, apparatus and processing device for obtaining candidate segments from a video, relating to the technical field of action detection. The method comprises: obtaining a video to be detected; computing the image similarity between adjacent frames of the video to be detected with a preset similarity algorithm to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence matches the order of the video frames; taking the image similarities in the similarity sequence that exceed a first segmentation threshold as target image similarities; and, if multiple target image similarities are consecutive in the similarity sequence, taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected. The method, apparatus and processing device provided by the embodiments of the present invention output more accurate candidate segments; the candidate segments are robust and applicable to a variety of video action detection models.
Description
Technical field
The present invention relates to the technical field of action detection, and in particular to a method, apparatus and processing device for obtaining candidate segments from a video.
Background art
Video action detection determines whether a specific target action occurs in a target video and, if it does, determines the start time and end time of the action. With the explosive growth in the number of videos, video action detection is used in an increasingly wide range of fields, including pedestrian surveillance, autonomous driving and short-video segmentation.
Because different actions vary greatly in duration and come in many kinds, the results of video action detection are often unsatisfactory. Most existing video action detection methods first output segments that may contain an action and then train a classification network to classify those segments. This approach has the following problems: if the background and foreground of the video are very similar, the extracted features are not discriminative enough, so the action boundaries are located inaccurately; and the classification network generalizes poorly, typically overfitting a particular data set, classifying other data sets with low accuracy, and requiring its parameters to be retuned.
No effective solution to the above problems of video action detection in the prior art has yet been proposed.
Summary of the invention
In view of this, the purpose of the present invention is to provide a method, apparatus and processing device for obtaining candidate segments from a video, which can output more accurate candidate segments that are robust and applicable to a variety of video action detection models.
In a first aspect, an embodiment of the present invention provides a method for obtaining candidate segments from a video, comprising: obtaining a video to be detected; computing the image similarity between adjacent frames of the video to be detected with a preset similarity algorithm to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence matches the order of the video frames; taking the image similarities in the similarity sequence that exceed a first segmentation threshold as target image similarities; and, if multiple target image similarities are consecutive in the similarity sequence, taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected.
Further, the step of taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected comprises: taking the first video frame corresponding to the multiple target image similarities as the start frame of the candidate segment, and the last corresponding video frame as the end frame of the candidate segment; and cutting the segment between the start frame and the end frame out of the video to be detected to obtain the candidate segment.
Further, each image similarity in the similarity sequence carries an index. The step of taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected, if the multiple target image similarities are consecutive in the similarity sequence, comprises: judging whether the indices of adjacent image similarities are consecutive; if so, judging whether the number of consecutive indices exceeds a preset quantity threshold; and, if it does, taking the video frames corresponding to the consecutive indices as a candidate segment of the video to be detected.
Further, after the candidate segment is obtained, the method also comprises: taking the image similarities in the portion of the similarity sequence corresponding to the candidate segment that exceed a second segmentation threshold as subdivision image similarities, the second segmentation threshold being greater than the first segmentation threshold; if multiple subdivision image similarities are consecutive in the similarity sequence, taking the video frames corresponding to the multiple subdivision image similarities as first-class subdivided candidate segments of the candidate segment; and taking the other pieces of the candidate segment left over after the subdivided candidate segments are cut out as second-class subdivided candidate segments.
Further, the step of taking the video frames corresponding to the multiple subdivision image similarities as a first-class subdivided candidate segment of the candidate segment comprises: taking the first video frame corresponding to the multiple subdivision image similarities as the start frame of the subdivided candidate segment, and the last corresponding video frame as its end frame, and splitting the candidate segment accordingly to obtain the subdivided candidate segment.
Further, after the subdivided candidate segments are obtained, the method also comprises: selecting one subdivided candidate segment from each of two adjacent candidate segments; and taking the first video frame of the earlier subdivided candidate segment as the start frame of a lengthened candidate segment, and the last video frame of the later subdivided candidate segment as its end frame, thereby cutting a lengthened candidate segment out of the video to be detected.
Further, the method also comprises: setting a ranking loss function based on the overlaps of two candidate segments with a correctly annotated segment, the two candidate segments having different overlaps with the correctly annotated segment; and using the ranking loss function as a loss function of a video action detection model and training the video action detection model on the candidate segments.
Further, the method also comprises: performing action detection on the candidate segments with a preconfigured video action detection model.
In a second aspect, an embodiment of the present invention provides an apparatus for obtaining candidate segments from a video, comprising: an obtaining module for obtaining a video to be detected; a computing module for computing the image similarity between adjacent frames of the video to be detected with a preset similarity algorithm to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence matches the order of the video frames; a searching module for taking the image similarities in the similarity sequence that exceed a first segmentation threshold as target image similarities; and a segmentation module for taking the video frames corresponding to multiple target image similarities as a candidate segment of the video to be detected if the multiple target image similarities are consecutive in the similarity sequence.
In a third aspect, an embodiment of the present invention provides a processing device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, carries out the steps of the method of any one of the first aspect above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable medium holding non-volatile program code executable by a processor, the program code causing the processor to carry out the steps of the method of any one of the first aspect above.
With the method, apparatus and processing device for obtaining candidate segments from a video provided by the embodiments of the present invention, the image similarity between adjacent frames of the video to be detected is computed with a preset similarity algorithm to obtain a similarity sequence in which the image similarities are ordered like the video frames; the video frames corresponding to consecutive image similarities that exceed a first segmentation threshold are then taken as a candidate segment of the video to be detected. Through the image similarity between adjacent frames and the segmentation strategy, the method outputs more accurate candidate segments that are robust and applicable to a variety of video action detection models.
Other features and advantages of the disclosure will be set out in the following description, or can be deduced from or unambiguously determined by the description, or can be learnt by practising the above techniques of the disclosure.
To make the above objects, features and advantages of the disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the appended drawings.
Brief description of the drawings
To describe the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a processing device provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a method for obtaining candidate segments from a video provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of training a model with the ranking loss (Ranking Loss) provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of generating candidate segments from an SSIM sequence provided by an embodiment of the present invention;
Fig. 5 shows verification results of a video action detection model provided by an embodiment of the present invention;
Fig. 6 is a structural block diagram of an apparatus for obtaining candidate segments from a video provided by an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In existing video action detection methods, the process of outputting segments that may contain an action has the following problems: 1. the action boundaries of the segments are located inaccurately; 2. generalization is poor, and overfitted segments cannot be applied to other data sets. On this basis, the embodiments of the present invention provide a method, apparatus and processing device for obtaining candidate segments from a video, described in detail through the following embodiments.
Embodiment one:
First, referring to Fig. 1, a processing device 100 for implementing the embodiments of the present invention, which can run the methods of the various embodiments of the present invention, is described.
As shown in Fig. 1, the processing device 100 includes one or more processors 102, one or more memories 104, an input device 106, an output device 108 and a data collector 110, interconnected by a bus system 112 and/or another form of connection mechanism (not shown). Note that the components and structure of the processing device 100 shown in Fig. 1 are only exemplary, not restrictive; the processing device may have other components and structures as needed.
The processor 102 may be implemented in at least one hardware form among a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA) and an ASIC (Application Specific Integrated Circuit). The processor 102 may be a central processing unit (CPU) or a processing unit of another form with data-processing capability and/or instruction-execution capability, and may control the other components of the processing device 100 to perform the desired functions.
The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random-access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the client functionality (implemented by the processor) of the embodiments of the present invention described below and/or other desired functions. Various applications and various data, such as the data used and/or produced by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device with which the user inputs instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, etc.
The output device 108 may output various information (for example, images or sound) to the outside (for example, the user), and may include one or more of a display, a loudspeaker, etc.
The data collector 110 is used for data acquisition; the data it collects is the raw data or target data of the current target. The data collector may also store the raw data or target data in the memory 104 for use by the other components.
Illustratively, the processing device for implementing the method for obtaining candidate segments from a video according to the embodiments of the present invention may be implemented as a server or an intelligent terminal such as a smartphone, tablet computer or computer.
Embodiment two:
An embodiment of the present invention provides a method for obtaining candidate segments from a video. Referring to the flowchart of such a method shown in Fig. 2, the method can be executed by the processing device provided by the previous embodiment and may include the following steps:
Step S202: obtain the video to be detected.
The purpose of the method for obtaining candidate segments from a video provided by this embodiment is to extract multiple candidate segments (proposals) from the video to be detected; further action detection can then be performed on the video based on these candidate segments.
Step S204: compute the image similarity between adjacent frames of the video to be detected with a preset similarity algorithm to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence matches the order of the video frames.
The preset similarity algorithm measures how similar two images are; in this embodiment it measures the image similarity of two adjacent frames of the video. The image similarity indicates whether the two adjacent frames contain a continuous action, so the video can be segmented accordingly. The similarity algorithm may be, for example, mean squared error MSE (mean-square error), structural similarity SSIM (structural similarity index) or peak signal-to-noise ratio PSNR (Peak Signal to Noise Ratio). After the image similarity between every pair of adjacent frames of the video to be detected is computed, arranging all the similarities in the order in which their corresponding images occur in the video yields the similarity sequence. The order of the image similarities in the resulting similarity sequence matches the order of the corresponding video frames.
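As an illustration of how such a similarity sequence might be computed, the sketch below uses a simplified whole-image SSIM (a windowed SSIM, as in common image libraries, would differ; the function names and constants here are this sketch's own assumptions, not the patent's):

```python
import numpy as np

def ssim(x, y, c1=6.5025, c2=58.5225):
    """Simplified global SSIM of two grayscale images: whole-image
    statistics instead of a sliding window. c1 and c2 are the usual
    (k*L)^2 stabilizing constants for 8-bit images."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def similarity_sequence(frames):
    """Similarity of each pair of adjacent frames, ordered like the frames."""
    return [ssim(a, b) for a, b in zip(frames, frames[1:])]
```

Identical adjacent frames yield a similarity of 1.0; the more the frames differ, the lower the value, which is what the first segmentation threshold is then compared against.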
Step S206: take the image similarities in the similarity sequence that exceed the first segmentation threshold as target image similarities.
A similarity greater than the first segmentation threshold indicates that the two adjacent frames contain a continuous action, while a similarity below the first segmentation threshold indicates that they do not. Comparing the similarities with the first segmentation threshold therefore locates the start and end images of actions in the video.
Step S208: if multiple target image similarities are consecutive in the similarity sequence, take the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected.
Since each image similarity expresses the degree of similarity between adjacent video frames, when multiple target image similarities are consecutive in the similarity sequence, the video frames corresponding to them can be determined to contain a continuous action; those frames are then split off to obtain a candidate segment of the video to be detected.
With the method for obtaining candidate segments from a video provided by the embodiments of the present invention, the image similarity between adjacent frames of the video to be detected is computed with a preset similarity algorithm to obtain a similarity sequence in which the image similarities are ordered like the video frames; the video frames corresponding to consecutive image similarities that exceed the first segmentation threshold are then taken as a candidate segment of the video to be detected. Through the image similarity between adjacent frames and the segmentation strategy, the method outputs more accurate candidate segments that are robust and applicable to a variety of video action detection models.
After the similarity sequence is obtained, multiple consecutive image similarities can be selected from it and the corresponding video piece split off as the candidate segment described above. Taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected can be done as follows: take the first video frame corresponding to the multiple target image similarities as the start frame of the candidate segment and the last corresponding video frame as the end frame, and cut the piece between the start frame and the end frame out of the video to be detected to obtain the candidate segment. Because each image similarity refers to the similarity between adjacent frame images, each image similarity corresponds to two images; the start frame above is the earlier of the two images corresponding to the first target image similarity, and the end frame is the later of the two images corresponding to the last target image similarity.
To make it easy to pick consecutive image similarities out of the similarity sequence, each image similarity in the sequence can be given an index, ordered like the video frames, for example the serial number of the frame image. When determining a candidate segment of the video to be detected, whether the indices of adjacent image similarities are consecutive can be judged; with serial numbers, this means judging whether the serial numbers of adjacent image similarities differ by 1. If the indices are consecutive, it is further judged whether the number of consecutive indices exceeds a preset quantity threshold, in order to exclude the adverse effect on action detection of segments with too few consecutive frames. If the number exceeds the preset quantity threshold, the video frames corresponding to the consecutive indices are taken as a candidate segment of the video to be detected.
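Putting the thresholding, the index-continuity check and the minimum-length filter together, a minimal sketch (the function name and the tuple representation of segments are assumptions of this sketch):

```python
def candidate_segments(sims, threshold, min_len):
    """Similarity index i covers frames i and i+1, so a maximal run of
    indices [i..j] with sims > threshold maps to the frame range (i, j+1).
    Runs with at most min_len indices are discarded."""
    segments, run = [], []
    for i, s in enumerate(sims):
        if s > threshold:
            run.append(i)
            continue
        if len(run) > min_len:
            segments.append((run[0], run[-1] + 1))
        run = []
    if len(run) > min_len:
        segments.append((run[0], run[-1] + 1))
    return segments
```

For example, with sims = [0.9, 0.9, 0.9, 0.1, 0.9, 0.9, 0.1], threshold 0.5 and min_len 1, the runs of indices [0, 1, 2] and [4, 5] become the frame ranges (0, 3) and (4, 6).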
To obtain candidate segments with more accurately positioned boundaries, each candidate segment obtained above can be segmented further, outputting more detailed splits. The method may therefore also include:
(1) taking the image similarities in the portion of the similarity sequence corresponding to the candidate segment that exceed a second segmentation threshold as subdivision image similarities, wherein the second segmentation threshold is greater than the first segmentation threshold;
(2) if multiple subdivision image similarities are consecutive in the similarity sequence, taking the video frames corresponding to the multiple subdivision image similarities as a first-class subdivided candidate segment of the candidate segment. As in the previous segmentation process, the first video frame corresponding to the multiple subdivision image similarities can be taken as the start frame of the subdivided candidate segment and the last corresponding video frame as its end frame, and the candidate segment split accordingly to obtain the subdivided candidate segment. Raising the segmentation threshold thus splits finer candidate segments out of the original candidate segment and ultimately improves the precision of action detection.
(3) taking the other pieces of the candidate segment left over after the subdivided candidate segments are cut out as second-class subdivided candidate segments. After the parts split off in step (2) are taken as finer candidate segments, at least one other piece of the original candidate segment remains; those other pieces are also treated as finer candidate segments.
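A sketch of the two-class subdivision under the stricter threshold, with a hypothetical helper (not the patent's notation; segments are (start_frame, end_frame) tuples):

```python
def refine_segment(sims, seg, theta2):
    """Split candidate frame range seg = (start, end) with a stricter
    threshold theta2. Returns (fine, rest): first-class sub-segments
    where adjacent-frame similarity exceeds theta2, and the leftover
    pieces of the original candidate as second-class sub-segments."""
    start, end = seg
    fine, run = [], []
    for i in range(start, end):   # similarity index i covers frames i, i+1
        if sims[i] > theta2:
            run.append(i)
            continue
        if run:
            fine.append((run[0], run[-1] + 1))
        run = []
    if run:
        fine.append((run[0], run[-1] + 1))
    rest, prev = [], start
    for a, b in fine:             # everything between fine pieces is 'rest'
        if a > prev:
            rest.append((prev, a))
        prev = b
    if prev < end:
        rest.append((prev, end))
    return fine, rest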
To output candidate segments of different lengths, the candidate segments obtained above can also be recombined, outputting splits of different lengths. The method may also include:
(1) selecting one subdivided candidate segment from each of two adjacent candidate segments; the position of each subdivided candidate segment within its candidate segment is unrestricted.
(2) taking the first video frame of the earlier subdivided candidate segment as the start frame of the lengthened candidate segment and the last video frame of the later subdivided candidate segment as its end frame, thereby splitting a lengthened candidate segment out of the video to be detected. When the two subdivided candidate segments are joined, the video frames between them are also included in the lengthened candidate segment. Because the positions of the subdivided candidate segments within their candidate segments vary, lengthened candidate segments of different lengths can be obtained, enriching the number of samples for training or detection.
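A sketch of forming lengthened candidate segments from the refined pieces of two adjacent candidates (the list-of-lists input and the function name are this sketch's assumptions):

```python
def lengthened_segments(fine_per_candidate):
    """For each pair of adjacent candidates, pair every refined
    sub-segment of the earlier candidate with every refined sub-segment
    of the later one: the lengthened segment runs from the earlier
    piece's start frame to the later piece's end frame, including
    whatever lies between."""
    out = []
    for left, right in zip(fine_per_candidate, fine_per_candidate[1:]):
        for a, _ in left:
            for _, b in right:
                out.append((a, b))
    return out
```

For two adjacent candidates whose refined pieces are [(0, 2), (3, 5)] and [(8, 10)], this yields the lengthened segments (0, 10) and (3, 10).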
Once the candidate segments above are obtained, a video action detection model can be trained, or action detection can be performed on the candidate segments with a preconfigured video action detection model. During training, to improve the precision of the model, the ordering information of the candidate segments within the video to be detected is taken into account, so as to distinguish candidate segments whose overlaps with the real action segment differ. Based on this idea, the method also includes the following steps:
(1) setting a ranking loss function based on the overlaps of two candidate segments with the correctly annotated segment, the two candidate segments having different overlaps with the correctly annotated segment; (2) using the ranking loss function as the loss function of the video action detection model and training the video action detection model on the candidate segments.
Most existing methods train a deep learning model with a cross-entropy loss to obtain a video action detection model and then classify the candidate segments, ignoring the relational information between candidate segments. Because of the limited precision of the deep learning model, two candidate segments may both receive relatively high scores. If the ordering information of the candidate segments within the video is added during training, so that the score of a genuinely good candidate segment is much higher than that of a poor one, the precision of the model can be greatly improved. When training the model, a ranking loss function (Ranking Loss) can be added on top of the cross-entropy loss. Suppose the overlaps of two candidate segments with the correctly annotated action segment (ground truth) are c_p and c_q respectively and, without loss of generality, c_p > c_q; the ranking loss function used in training can then be set as:
l_rank = max(0, c_q - c_p + ε)
Referring to the schematic diagram of training the model with the Ranking Loss shown in Fig. 3: ψ1, ψ2 and ψ3 are three different candidate segments, and C1, C2 and C3 denote the overlaps of ψ1, ψ2 and ψ3 with the correctly annotated segment, that is, the scores of candidate segments ψ1, ψ2 and ψ3 during model training. The target of model training is to rank C1, C2 and C3 pairwise, the corresponding order being C1 > C2 > C3.
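The hinge above can be written in a few lines. Note that, as printed, the margin is over the overlaps c_p and c_q themselves; in a typical margin ranking loss the model's predicted scores for the better- and worse-overlapping segments take their place, which is what this sketch assumes:

```python
def ranking_loss(s_better, s_worse, eps=0.1):
    """Hinge ranking loss: zero once the score of the segment with the
    larger ground-truth overlap beats the other score by at least the
    margin eps, positive otherwise."""
    return max(0.0, s_worse - s_better + eps)
```

A well-ordered pair of scores such as (0.9, 0.2) incurs no loss; a mis-ordered pair such as (0.3, 0.4) is penalised, pushing the model to score better-overlapping segments higher.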
The following embodiment is illustrated with SSIM used to segment the video. There is a strong correlation between adjacent pictures; the formula is as follows:
SSIM(x, y) = ((2μxμy + C1)(2σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2))
where x and y denote two images, μx and μy are their means, σx and σy their standard deviations, σxy the covariance of the two pictures, and C1 and C2 constants. SSIM compares the brightness, contrast and structural similarity of two pictures. Using the SSIM similarity sequence, abundant candidate segments can be output through the segmentation strategy and the aggregation strategy, specifically as follows:
(1) Segmentation strategy: apply a segmentation threshold θ to the SSIM sequence S to output a binary vector. Similarities less than or equal to the segmentation threshold are set to 1 and similarities greater than the segmentation threshold are set to 0, where 1 indicates a boundary of a candidate segment and 0 its interior. Collecting the indices of the 1 entries of the binary vector gives B = {i, xi ≠ 0}, where xi comes from B(S, θ).
(2) Aggregation strategy: connect the indices whose value is 1 to obtain the candidate segments of the video, where xi is an index from B, δ is the degree of connectivity, and T is the length of B.
The segmentation strategy and aggregation strategy above yield the initial candidate segments Φini of the video. To position the boundaries more accurately, the segmentation strategy and aggregation strategy are applied again to each segment in Φini, yielding the more detailed candidate segments Φdet. To output candidate segments of different lengths, longer segments Φcom can be output according to all the boundary indices in two adjacent candidate segments. Finally, all candidate segments are collected as the final candidate segments of the video, as follows:
ΦV = Φini ∪ Φdet ∪ Φcom
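Collecting the three sets is a simple union; a sketch with segments represented as (start, end) tuples (the representation and function name are this sketch's assumptions):

```python
def all_candidates(phi_ini, phi_det, phi_com):
    """Phi_V = Phi_ini | Phi_det | Phi_com, deduplicated and ordered
    by start frame."""
    return sorted(set(phi_ini) | set(phi_det) | set(phi_com))
```

Deduplication matters because a detailed or lengthened segment can coincide with an initial one; sorting simply presents the final proposals in temporal order.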
Referring to the schematic diagram of generating candidate segments from an SSIM sequence shown in Fig. 4: the initial candidate segments are, for example, x₁⁰-x₂⁰ and x₃⁰-x₄⁰; the subdivided candidate segments obtained are, for example, x₃⁰-x₁¹ and x₁¹-x₄⁰; and the lengthened candidate segments obtained are, for example, x₂⁰-x₃⁰-x₁¹ and x₃⁰-x₁¹-x₄⁰.
Verification results of the video action detection model trained with the SSIM sequence and the Ranking Loss described above are shown in Fig. 5; its output is far better than that of existing methods. In panels a and b of Fig. 5, the first row is the correctly annotated segment, the second row is a good candidate segment, and the third row is a poor candidate segment. As can be seen from Fig. 5, the scores differ greatly, and the Ranking Loss successfully suppresses the poor candidate segments.
Embodiment three:
For the method provided in embodiment two, an embodiment of the present invention provides an apparatus for obtaining candidate segments from a video. Referring to the structural block diagram of such an apparatus shown in Fig. 6, the apparatus comprises:
Module 602 is obtained, for obtaining video to be detected;
Computing module 604, for being calculated separately between video adjacent video frames to be detected by preset similarity algorithm
Image similarity, obtain similarity sequence;Wherein, the sequence of the sequence of the image similarity in similarity sequence and video frame
It is identical;
Searching module 606, for the image similarity of the first segmentation threshold will to be greater than in similarity sequence as target figure
As similarity;
Divide module 608, it, will if the putting in order continuously in similarity sequence for multiple target image similarities
Candidate segment of the corresponding video frame of multiple target image similarities as video to be detected.
The device provided in an embodiment of the present invention that candidate segment is obtained from video, passes through the image between adjacent video frames
Similarity and segmentation strategy can have good robustness with the more accurate candidate segment of output, the candidate segment and be suitable for
Various video actions detection models.
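As an illustrative sketch of what the computing module produces (the helper names are hypothetical, and a simplified single-window SSIM stands in for whatever preset similarity algorithm is configured), the similarity sequence can be built as follows:

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    # Simplified single-window SSIM over two grayscale frames; an
    # illustrative stand-in, not the windowed SSIM of the full algorithm.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

def similarity_sequence(frames):
    # One similarity per adjacent frame pair, in the same order as the frames.
    return [global_ssim(a, b) for a, b in zip(frames, frames[1:])]
```

Identical adjacent frames yield a similarity of 1.0, so frames within one continuous action score high while abrupt content changes score low, which is what the thresholding in the searching module exploits.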
In one embodiment, the segmentation module is further configured to: take the video frame corresponding to the first of the multiple target image similarities as the start frame of the candidate segment and the video frame corresponding to the last of the multiple target image similarities as the end frame of the candidate segment; and segment out, from the video to be detected, the portion from the start frame to the end frame to obtain the candidate segment.
In another embodiment, each image similarity in the similarity sequence carries an index mark; the segmentation module is further configured to: judge whether the index marks of adjacent image similarities are consecutive; if so, judge whether the number of consecutive index marks is greater than a preset quantity threshold; and if it is greater than the preset quantity threshold, take the video frames corresponding to the consecutive index marks as a candidate segment of the video to be detected.
In one embodiment, the device further comprises a subdivision module, configured to: take the image similarities in the similarity sequence corresponding to a candidate segment that are greater than a second segmentation threshold as subdivision image similarities, the second segmentation threshold being greater than the first segmentation threshold; if multiple subdivision image similarities are arranged consecutively in the similarity sequence, take the video frames corresponding to the multiple subdivision image similarities as a first-class subdivided candidate segment of the candidate segment; and take the other portions of the candidate segment separated out by the subdivided candidate segments as second-class subdivided candidate segments.
In another embodiment, the subdivision module is further configured to: take the video frame corresponding to the first of the multiple subdivision image similarities as the start frame of a subdivided candidate segment and the video frame corresponding to the last of the multiple subdivision image similarities as the end frame of the subdivided candidate segment, and divide the candidate segment to obtain the subdivided candidate segment.
In one embodiment, the device further comprises a lengthening module, configured to: select one subdivided candidate segment from each of two adjacent candidate segments; and take the first video frame of the earlier subdivided candidate segment as the start frame of a lengthened candidate segment and the last video frame of the later subdivided candidate segment as the end frame of the lengthened candidate segment, to divide out, from the video to be detected, the lengthened candidate segment.
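The lengthening step can be sketched as follows (illustrative: here every pairing of one subdivided segment from each adjacent candidate segment is emitted, whereas the module selects one from each):

```python
def lengthened_proposals(subdivided_per_segment):
    # For each pair of adjacent candidate segments, span from the start frame
    # of a subdivided segment in the earlier one to the end frame of a
    # subdivided segment in the later one.
    out = []
    for prev, nxt in zip(subdivided_per_segment, subdivided_per_segment[1:]):
        for a, _ in prev:
            for _, d in nxt:
                out.append((a, d))
    return out
```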
In one embodiment, the device further comprises a training module, configured to: set a ranking loss function based on the degrees of overlap between two candidate segments and a correctly labeled segment, the two candidate segments having different degrees of overlap with the correctly labeled segment; and take the ranking loss function as the loss function of a video action detection model and train the video action detection model with the candidate segments.
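One plausible form of such an overlap-based ranking loss (the margin formulation and its value are assumptions; the patent only specifies that the loss is set from the two candidates' differing overlaps with the labeled segment):

```python
def iou(seg, gt):
    # Temporal intersection-over-union between a candidate segment and a
    # correctly labeled segment, each given as a (start, end) pair.
    inter = max(0, min(seg[1], gt[1]) - max(seg[0], gt[0]))
    union = (seg[1] - seg[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def ranking_loss(score_a, score_b, seg_a, seg_b, gt, margin=0.2):
    # The candidate with the higher overlap should receive the higher model
    # score; penalize the model when that ordering is violated by the margin.
    if iou(seg_a, gt) > iou(seg_b, gt):
        hi, lo = score_a, score_b
    else:
        hi, lo = score_b, score_a
    return max(0.0, margin - (hi - lo))
```

Training with a loss of this shape pushes the scores of low-overlap candidates down, consistent with the suppression of poor candidate segments reported for Figure 5.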
In one embodiment, the device further comprises a detection module, configured to perform action detection on the candidate segments through a preconfigured video action detection model.
The technical effects and realization principle of the device provided by this embodiment are the same as those of the foregoing method embodiment. For brevity, where this device embodiment omits details, reference may be made to the corresponding content in the foregoing method embodiment.
In addition, this embodiment provides a processing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of obtaining candidate segments from a video provided by the above embodiments.
It is apparent to those skilled in the art that, for convenience and brevity of description, for the specific working process of the system described above, reference may be made to the corresponding process in the foregoing embodiments, and details are not described herein again.
Further, this embodiment provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when run by a processor, executes the steps of the method provided by the above embodiments.
The computer program product of the method, device, and processing equipment for obtaining candidate segments from a video provided by the embodiments of the present invention comprises a computer-readable storage medium storing program code, and the instructions included in the program code can be used to execute the method described in the foregoing method embodiments; for specific implementation, reference may be made to the method embodiments, and details are not described herein again. If the described functions are realized in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a mobile hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), or a magnetic or optical disk.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, used to illustrate the technical solution of the present invention and not to limit it, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field can, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; and these modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention and should all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be based on the protection scope of the claims.
Claims (11)
1. A method for obtaining candidate segments from a video, characterized by comprising:
obtaining a video to be detected;
separately calculating, by a preset similarity algorithm, the image similarity between adjacent video frames of the video to be detected, to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence is the same as the order of the video frames;
taking the image similarities in the similarity sequence that are greater than a first segmentation threshold as target image similarities; and
if multiple target image similarities are arranged consecutively in the similarity sequence, taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected.
2. The method according to claim 1, characterized in that the step of taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected comprises:
taking the video frame corresponding to the first of the multiple target image similarities as the start frame of the candidate segment, and the video frame corresponding to the last of the multiple target image similarities as the end frame of the candidate segment; and
segmenting out, from the video to be detected, the portion from the start frame to the end frame to obtain the candidate segment.
3. The method according to claim 1, characterized in that each image similarity in the similarity sequence carries an index mark; and
the step of, if multiple target image similarities are arranged consecutively in the similarity sequence, taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected comprises:
judging whether the index marks of adjacent image similarities are consecutive;
if so, judging whether the number of consecutive index marks is greater than a preset quantity threshold; and
if it is greater than the preset quantity threshold, taking the video frames corresponding to the consecutive index marks as a candidate segment of the video to be detected.
4. The method according to claim 1, characterized in that, after the candidate segment is obtained, the method further comprises:
taking the image similarities in the similarity sequence corresponding to the candidate segment that are greater than a second segmentation threshold as subdivision image similarities, the second segmentation threshold being greater than the first segmentation threshold;
if multiple subdivision image similarities are arranged consecutively in the similarity sequence, taking the video frames corresponding to the multiple subdivision image similarities as a first-class subdivided candidate segment of the candidate segment; and
taking the other portions of the candidate segment separated out by the subdivided candidate segment as second-class subdivided candidate segments.
5. The method according to claim 4, characterized in that the step of taking the video frames corresponding to the multiple subdivision image similarities as a first-class subdivided candidate segment of the candidate segment comprises:
taking the video frame corresponding to the first of the multiple subdivision image similarities as the start frame of the subdivided candidate segment, and the video frame corresponding to the last of the multiple subdivision image similarities as the end frame of the subdivided candidate segment, and dividing the candidate segment to obtain the subdivided candidate segment.
6. The method according to claim 4 or 5, characterized in that, after the subdivided candidate segments are obtained, the method further comprises:
selecting one subdivided candidate segment from each of two adjacent candidate segments; and
taking the first video frame of the earlier subdivided candidate segment as the start frame of a lengthened candidate segment, and the last video frame of the later subdivided candidate segment as the end frame of the lengthened candidate segment, to divide out, from the video to be detected, the lengthened candidate segment.
7. The method according to claim 1, characterized in that the method further comprises:
setting a ranking loss function based on the degrees of overlap between two candidate segments and a correctly labeled segment, the two candidate segments having different degrees of overlap with the correctly labeled segment; and
taking the ranking loss function as the loss function of a video action detection model, and training the video action detection model with the candidate segments.
8. The method according to claim 1 or 7, characterized in that the method further comprises:
performing action detection on the candidate segment through a preconfigured video action detection model.
9. A device for obtaining candidate segments from a video, characterized by comprising:
an obtaining module, configured to obtain a video to be detected;
a computing module, configured to separately calculate, by a preset similarity algorithm, the image similarity between adjacent video frames of the video to be detected, to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence is the same as the order of the video frames;
a searching module, configured to take the image similarities in the similarity sequence that are greater than a first segmentation threshold as target image similarities; and
a segmentation module, configured to, if multiple target image similarities are arranged consecutively in the similarity sequence, take the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected.
10. A processing device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.
11. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when run by a processor, executes the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910231596.9A CN109977262B (en) | 2019-03-25 | 2019-03-25 | Method and device for acquiring candidate segments from video and processing equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109977262A true CN109977262A (en) | 2019-07-05 |
CN109977262B CN109977262B (en) | 2021-11-16 |
Family
ID=67080518
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910231596.9A Active CN109977262B (en) | 2019-03-25 | 2019-03-25 | Method and device for acquiring candidate segments from video and processing equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977262B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101375312A (en) * | 2006-02-07 | 2009-02-25 | 高通股份有限公司 | Multi-mode region-of-interest video object segmentation |
US7853084B2 (en) * | 2005-12-05 | 2010-12-14 | Hitachi, Ltd. | Method of detecting feature images |
CN102902756A (en) * | 2012-09-24 | 2013-01-30 | 南京邮电大学 | Video abstraction extraction method based on story plots |
CN103839086A (en) * | 2014-03-25 | 2014-06-04 | 上海交通大学 | Interaction behavior detection method in video monitoring scene |
CN107506793A (en) * | 2017-08-21 | 2017-12-22 | 中国科学院重庆绿色智能技术研究院 | Clothes recognition methods and system based on weak mark image |
CN108090508A (en) * | 2017-12-12 | 2018-05-29 | 腾讯科技(深圳)有限公司 | A kind of classification based training method, apparatus and storage medium |
CN108573246A (en) * | 2018-05-08 | 2018-09-25 | 北京工业大学 | A kind of sequential action identification method based on deep learning |
US20190080176A1 (en) * | 2016-04-08 | 2019-03-14 | Microsoft Technology Licensing, Llc | On-line action detection using recurrent neural network |
Non-Patent Citations (2)
Title |
---|
ZHANG Shun et al., "The development of deep convolutional neural networks and their applications in computer vision", Chinese Journal of Computers *
WANG Xiang, "Research on key technologies of content-based video retrieval", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110781740A (en) * | 2019-09-20 | 2020-02-11 | 网宿科技股份有限公司 | Video image quality identification method, system and equipment |
CN110781740B (en) * | 2019-09-20 | 2023-04-07 | 网宿科技股份有限公司 | Video image quality identification method, system and equipment |
CN110704678A (en) * | 2019-09-24 | 2020-01-17 | 中国科学院上海高等研究院 | Evaluation sorting method, evaluation sorting system, computer device and storage medium |
CN110704678B (en) * | 2019-09-24 | 2022-10-14 | 中国科学院上海高等研究院 | Evaluation sorting method, evaluation sorting system, computer device and storage medium |
CN111339360A (en) * | 2020-02-24 | 2020-06-26 | 北京奇艺世纪科技有限公司 | Video processing method and device, electronic equipment and computer readable storage medium |
CN111339360B (en) * | 2020-02-24 | 2024-03-26 | 北京奇艺世纪科技有限公司 | Video processing method, video processing device, electronic equipment and computer readable storage medium |
CN111414868A (en) * | 2020-03-24 | 2020-07-14 | 北京旷视科技有限公司 | Method for determining time sequence action fragment, action detection method and device |
CN111414868B (en) * | 2020-03-24 | 2023-05-16 | 北京旷视科技有限公司 | Method for determining time sequence action segment, method and device for detecting action |
CN111522996A (en) * | 2020-04-09 | 2020-08-11 | 北京百度网讯科技有限公司 | Video clip retrieval method and device |
CN111522996B (en) * | 2020-04-09 | 2023-09-08 | 北京百度网讯科技有限公司 | Video clip retrieval method and device |
US11625433B2 (en) | 2020-04-09 | 2023-04-11 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for searching video segment, device, and medium |
CN111639599B (en) * | 2020-05-29 | 2024-04-02 | 北京百度网讯科技有限公司 | Object image mining method, device, equipment and storage medium |
CN111639599A (en) * | 2020-05-29 | 2020-09-08 | 北京百度网讯科技有限公司 | Object image mining method, device, equipment and storage medium |
CN111860185A (en) * | 2020-06-23 | 2020-10-30 | 北京无限创意信息技术有限公司 | Shot boundary detection method and system |
CN111914926A (en) * | 2020-07-29 | 2020-11-10 | 深圳神目信息技术有限公司 | Sliding window-based video plagiarism detection method, device, equipment and medium |
CN111914926B (en) * | 2020-07-29 | 2023-11-21 | 深圳神目信息技术有限公司 | Sliding window-based video plagiarism detection method, device, equipment and medium |
CN112149575A (en) * | 2020-09-24 | 2020-12-29 | 新华智云科技有限公司 | Method for automatically screening automobile part fragments from video |
CN112380929A (en) * | 2020-10-30 | 2021-02-19 | 北京字节跳动网络技术有限公司 | Highlight segment obtaining method and device, electronic equipment and storage medium |
CN112491999A (en) * | 2020-11-18 | 2021-03-12 | 成都佳华物链云科技有限公司 | Data reporting method and device |
CN112749625B (en) * | 2020-12-10 | 2023-12-15 | 深圳市优必选科技股份有限公司 | Time sequence behavior detection method, time sequence behavior detection device and terminal equipment |
CN112749625A (en) * | 2020-12-10 | 2021-05-04 | 深圳市优必选科技股份有限公司 | Time sequence behavior detection method, time sequence behavior detection device and terminal equipment |
CN112883782A (en) * | 2021-01-12 | 2021-06-01 | 上海肯汀通讯科技有限公司 | Method, device, equipment and storage medium for identifying putting behaviors |
CN112883782B (en) * | 2021-01-12 | 2023-03-24 | 上海肯汀通讯科技有限公司 | Method, device, equipment and storage medium for identifying putting behaviors |
CN112784095A (en) * | 2021-01-18 | 2021-05-11 | 北京洛塔信息技术有限公司 | Difficult sample data mining method, device, equipment and storage medium |
CN114827757A (en) * | 2021-01-29 | 2022-07-29 | 深圳市万普拉斯科技有限公司 | Video frame selection method, video time-shrinking processing method and device and computer equipment |
CN113762040A (en) * | 2021-04-29 | 2021-12-07 | 腾讯科技(深圳)有限公司 | Video identification method and device, storage medium and computer equipment |
CN113191266A (en) * | 2021-04-30 | 2021-07-30 | 江苏航运职业技术学院 | Remote monitoring management method and system for ship power device |
CN113191266B (en) * | 2021-04-30 | 2021-10-22 | 江苏航运职业技术学院 | Remote monitoring management method and system for ship power device |
WO2023273628A1 (en) * | 2021-06-30 | 2023-01-05 | 腾讯科技(深圳)有限公司 | Video loop recognition method and apparatus, computer device, and storage medium |
CN113449824A (en) * | 2021-09-01 | 2021-09-28 | 腾讯科技(深圳)有限公司 | Video processing method, device and computer readable storage medium |
CN114760534B (en) * | 2022-03-28 | 2024-03-01 | 北京捷通华声科技股份有限公司 | Video generation method, device, electronic equipment and readable storage medium |
CN114760534A (en) * | 2022-03-28 | 2022-07-15 | 北京捷通华声科技股份有限公司 | Video generation method and device, electronic equipment and readable storage medium |
CN114842239A (en) * | 2022-04-02 | 2022-08-02 | 北京医准智能科技有限公司 | Breast lesion attribute prediction method and device based on ultrasonic video |
CN117135444A (en) * | 2023-03-10 | 2023-11-28 | 荣耀终端有限公司 | Frame selection decision method and device based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN109977262B (en) | 2021-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109977262A (en) | Method, apparatus and processing equipment for obtaining candidate segments from a video | |
Kristan et al. | The visual object tracking vot2015 challenge results | |
Lai et al. | Sparse distance learning for object recognition combining rgb and depth information | |
Kantorov et al. | Efficient feature extraction, encoding and classification for action recognition | |
Shen et al. | Deepcontour: A deep convolutional feature learned by positive-sharing loss for contour detection | |
Zhao et al. | Learning mid-level filters for person re-identification | |
Bregonzio et al. | Fusing appearance and distribution information of interest points for action recognition | |
JP5604256B2 (en) | Human motion detection device and program thereof | |
Murthy et al. | Ordered trajectories for large scale human action recognition | |
GB2516037A (en) | Compact and robust signature for large scale visual search, retrieval and classification | |
JP6897749B2 (en) | Learning methods, learning systems, and learning programs | |
CN109948497A (en) | A kind of object detecting method, device and electronic equipment | |
CN110263712A (en) | A kind of coarse-fine pedestrian detection method based on region candidate | |
Duta et al. | Histograms of motion gradients for real-time video classification | |
Bilinski et al. | Evaluation of local descriptors for action recognition in videos | |
Liu et al. | Subtler mixed attention network on fine-grained image classification | |
CN109753884A (en) | A kind of video behavior recognition methods based on key-frame extraction | |
Luo et al. | SFA: small faces attention face detector | |
CN114399644A (en) | Target detection method and device based on small sample | |
Kang et al. | Robust visual tracking via nonlocal regularized multi-view sparse representation | |
Russakovsky et al. | A steiner tree approach to efficient object detection | |
WO2023048809A1 (en) | Leveraging unsupervised meta-learning to boost few-shot action recognition | |
Roy et al. | Foreground segmentation using adaptive 3 phase background model | |
CN102609715B (en) | Object type identification method combining plurality of interest point testers | |
Bai et al. | Multi-scale fully convolutional network for face detection in the wild |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||