CN109977262A - Method, apparatus and processing device for obtaining candidate segments from a video - Google Patents
Method, apparatus and processing device for obtaining candidate segments from a video
- Publication number
- CN109977262A CN109977262A CN201910231596.9A CN201910231596A CN109977262A CN 109977262 A CN109977262 A CN 109977262A CN 201910231596 A CN201910231596 A CN 201910231596A CN 109977262 A CN109977262 A CN 109977262A
- Authority
- CN
- China
- Prior art keywords
- similarity
- candidate segment
- video
- sequence
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
Abstract
The present invention provides a method, apparatus and processing device for obtaining candidate segments from a video, relating to the technical field of action detection. The method comprises: obtaining a video to be detected; computing the image similarity between adjacent frames of the video to be detected with a preset similarity algorithm to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence matches the order of the video frames; taking the image similarities in the similarity sequence that exceed a first segmentation threshold as target image similarities; and, if multiple target image similarities are consecutive in the similarity sequence, taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected. The method, apparatus and processing device provided by the embodiments of the present invention output more accurate candidate segments; the candidate segments are robust and applicable to a variety of video action detection models.
Description
Technical field
The present invention relates to the technical field of action detection, and in particular to a method, apparatus and processing device for obtaining candidate segments from a video.
Background art
Video action detection determines whether a specific target action occurs in a target video and, if it does, determines the start time and end time of the action. With the explosive growth in the number of videos, video action detection is used in an increasingly wide range of fields, including pedestrian surveillance, autonomous driving and short-video segmentation.
Because different actions vary greatly in duration and come in many kinds, the results of video action detection are often unsatisfactory. Most existing video action detection methods first output segments that may contain an action and then train a classification network to classify those segments. This approach has the following problems: if the background and foreground of the video are very similar, the extracted features are not discriminative enough, so the action boundaries are located inaccurately; and the classification network generalizes poorly, typically overfitting a particular data set, classifying other data sets with low accuracy, and requiring its parameters to be retuned.
No effective solution to the above problems of video action detection in the prior art has yet been proposed.
Summary of the invention
In view of this, the purpose of the present invention is to provide a method, apparatus and processing device for obtaining candidate segments from a video, which can output more accurate candidate segments that are robust and applicable to a variety of video action detection models.
In a first aspect, an embodiment of the present invention provides a method for obtaining candidate segments from a video, comprising: obtaining a video to be detected; computing the image similarity between adjacent frames of the video to be detected with a preset similarity algorithm to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence matches the order of the video frames; taking the image similarities in the similarity sequence that exceed a first segmentation threshold as target image similarities; and, if multiple target image similarities are consecutive in the similarity sequence, taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected.
Further, the step of taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected comprises: taking the first video frame corresponding to the multiple target image similarities as the start frame of the candidate segment, and the last corresponding video frame as the end frame of the candidate segment; and cutting the segment between the start frame and the end frame out of the video to be detected to obtain the candidate segment.
Further, each image similarity in the similarity sequence carries an index. The step of taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected, if the multiple target image similarities are consecutive in the similarity sequence, comprises: judging whether the indices of adjacent image similarities are consecutive; if so, judging whether the number of consecutive indices exceeds a preset quantity threshold; and, if it does, taking the video frames corresponding to the consecutive indices as a candidate segment of the video to be detected.
Further, after the candidate segment is obtained, the method also comprises: taking the image similarities in the portion of the similarity sequence corresponding to the candidate segment that exceed a second segmentation threshold as subdivision image similarities, the second segmentation threshold being greater than the first segmentation threshold; if multiple subdivision image similarities are consecutive in the similarity sequence, taking the video frames corresponding to the multiple subdivision image similarities as first-class subdivided candidate segments of the candidate segment; and taking the other pieces of the candidate segment left over after the subdivided candidate segments are cut out as second-class subdivided candidate segments.
Further, the step of taking the video frames corresponding to the multiple subdivision image similarities as a first-class subdivided candidate segment of the candidate segment comprises: taking the first video frame corresponding to the multiple subdivision image similarities as the start frame of the subdivided candidate segment, and the last corresponding video frame as its end frame, and splitting the candidate segment accordingly to obtain the subdivided candidate segment.
Further, after the subdivided candidate segments are obtained, the method also comprises: selecting one subdivided candidate segment from each of two adjacent candidate segments; and taking the first video frame of the earlier subdivided candidate segment as the start frame of a lengthened candidate segment, and the last video frame of the later subdivided candidate segment as its end frame, thereby cutting a lengthened candidate segment out of the video to be detected.
Further, the method also comprises: setting a ranking loss function based on the overlaps of two candidate segments with a correctly annotated segment, the two candidate segments having different overlaps with the correctly annotated segment; and using the ranking loss function as a loss function of a video action detection model and training the video action detection model on the candidate segments.
Further, the method also comprises: performing action detection on the candidate segments with a preconfigured video action detection model.
In a second aspect, an embodiment of the present invention provides an apparatus for obtaining candidate segments from a video, comprising: an obtaining module for obtaining a video to be detected; a computing module for computing the image similarity between adjacent frames of the video to be detected with a preset similarity algorithm to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence matches the order of the video frames; a searching module for taking the image similarities in the similarity sequence that exceed a first segmentation threshold as target image similarities; and a segmentation module for taking the video frames corresponding to multiple target image similarities as a candidate segment of the video to be detected if the multiple target image similarities are consecutive in the similarity sequence.
In a third aspect, an embodiment of the present invention provides a processing device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, carries out the steps of the method of any one of the first aspect above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable medium holding non-volatile program code executable by a processor, the program code causing the processor to carry out the steps of the method of any one of the first aspect above.
With the method, apparatus and processing device for obtaining candidate segments from a video provided by the embodiments of the present invention, the image similarity between adjacent frames of the video to be detected is computed with a preset similarity algorithm to obtain a similarity sequence in which the image similarities are ordered like the video frames; the video frames corresponding to consecutive image similarities that exceed a first segmentation threshold are then taken as a candidate segment of the video to be detected. Through the image similarity between adjacent frames and the segmentation strategy, the method outputs more accurate candidate segments that are robust and applicable to a variety of video action detection models.
Other features and advantages of the disclosure will be set out in the following description, or can be deduced from or unambiguously determined by the description, or can be learnt by practising the above techniques of the disclosure.
To make the above objects, features and advantages of the disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the appended drawings.
Brief description of the drawings
To describe the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a processing device provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a method for obtaining candidate segments from a video provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of training a model with the ranking loss (Ranking Loss) provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of generating candidate segments from an SSIM sequence provided by an embodiment of the present invention;
Fig. 5 shows verification results of a video action detection model provided by an embodiment of the present invention;
Fig. 6 is a structural block diagram of an apparatus for obtaining candidate segments from a video provided by an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In existing video action detection methods, the process of outputting segments that may contain an action has the following problems: 1. the action boundaries of the segments are located inaccurately; 2. generalization is poor, and overfitted segments cannot be applied to other data sets. On this basis, the embodiments of the present invention provide a method, apparatus and processing device for obtaining candidate segments from a video, described in detail through the following embodiments.
Embodiment one:
First, referring to Fig. 1, a processing device 100 for implementing the embodiments of the present invention, which can run the methods of the various embodiments of the present invention, is described.
As shown in Fig. 1, the processing device 100 includes one or more processors 102, one or more memories 104, an input device 106, an output device 108 and a data collector 110, interconnected by a bus system 112 and/or another form of connection mechanism (not shown). Note that the components and structure of the processing device 100 shown in Fig. 1 are only exemplary, not restrictive; the processing device may have other components and structures as needed.
The processor 102 may be implemented in at least one hardware form among a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA) and an ASIC (Application Specific Integrated Circuit). The processor 102 may be a central processing unit (CPU) or a processing unit of another form with data-processing capability and/or instruction-execution capability, and may control the other components of the processing device 100 to perform the desired functions.
The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random-access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the client functionality (implemented by the processor) of the embodiments of the present invention described below and/or other desired functions. Various applications and various data, such as the data used and/or produced by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device with which the user inputs instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, etc.
The output device 108 may output various information (for example, images or sound) to the outside (for example, the user), and may include one or more of a display, a loudspeaker, etc.
The data collector 110 is used for data acquisition; the data it collects is the raw data or target data of the current target. The data collector may also store the raw data or target data in the memory 104 for use by the other components.
Illustratively, the processing device for implementing the method for obtaining candidate segments from a video according to the embodiments of the present invention may be implemented as a server or an intelligent terminal such as a smartphone, tablet computer or computer.
Embodiment two:
An embodiment of the present invention provides a method for obtaining candidate segments from a video. Referring to the flowchart of such a method shown in Fig. 2, the method can be executed by the processing device provided by the previous embodiment and may include the following steps:
Step S202: obtain the video to be detected.
The purpose of the method for obtaining candidate segments from a video provided by this embodiment is to extract multiple candidate segments (proposals) from the video to be detected; further action detection can then be performed on the video based on these candidate segments.
Step S204: compute the image similarity between adjacent frames of the video to be detected with a preset similarity algorithm to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence matches the order of the video frames.
The preset similarity algorithm measures how similar two images are; in this embodiment it measures the image similarity of two adjacent frames of the video. The image similarity indicates whether the two adjacent frames contain a continuous action, so the video can be segmented accordingly. The similarity algorithm may be, for example, mean squared error MSE (mean-square error), structural similarity SSIM (structural similarity index) or peak signal-to-noise ratio PSNR (Peak Signal to Noise Ratio). After the image similarity between every pair of adjacent frames of the video to be detected is computed, arranging all the similarities in the order in which their corresponding images occur in the video yields the similarity sequence. The order of the image similarities in the resulting similarity sequence matches the order of the corresponding video frames.
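As an illustration of how such a similarity sequence might be computed, the sketch below uses a simplified whole-image SSIM (a windowed SSIM, as in common image libraries, would differ; the function names and constants here are this sketch's own assumptions, not the patent's):

```python
import numpy as np

def ssim(x, y, c1=6.5025, c2=58.5225):
    """Simplified global SSIM of two grayscale images: whole-image
    statistics instead of a sliding window. c1 and c2 are the usual
    (k*L)^2 stabilizing constants for 8-bit images."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def similarity_sequence(frames):
    """Similarity of each pair of adjacent frames, ordered like the frames."""
    return [ssim(a, b) for a, b in zip(frames, frames[1:])]
```

Identical adjacent frames yield a similarity of 1.0; the more the frames differ, the lower the value, which is what the first segmentation threshold is then compared against.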
Step S206: take the image similarities in the similarity sequence that exceed the first segmentation threshold as target image similarities.
A similarity greater than the first segmentation threshold indicates that the two adjacent frames contain a continuous action, while a similarity below the first segmentation threshold indicates that they do not. Comparing the similarities with the first segmentation threshold therefore locates the start and end images of actions in the video.
Step S208: if multiple target image similarities are consecutive in the similarity sequence, take the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected.
Since each image similarity expresses the degree of similarity between adjacent video frames, when multiple target image similarities are consecutive in the similarity sequence, the video frames corresponding to them can be determined to contain a continuous action; those frames are then split off to obtain a candidate segment of the video to be detected.
With the method for obtaining candidate segments from a video provided by the embodiments of the present invention, the image similarity between adjacent frames of the video to be detected is computed with a preset similarity algorithm to obtain a similarity sequence in which the image similarities are ordered like the video frames; the video frames corresponding to consecutive image similarities that exceed the first segmentation threshold are then taken as a candidate segment of the video to be detected. Through the image similarity between adjacent frames and the segmentation strategy, the method outputs more accurate candidate segments that are robust and applicable to a variety of video action detection models.
After the similarity sequence is obtained, multiple consecutive image similarities can be selected from it and the corresponding video piece split off as the candidate segment described above. Taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected can be done as follows: take the first video frame corresponding to the multiple target image similarities as the start frame of the candidate segment and the last corresponding video frame as the end frame, and cut the piece between the start frame and the end frame out of the video to be detected to obtain the candidate segment. Because each image similarity refers to the similarity between adjacent frame images, each image similarity corresponds to two images; the start frame above is the earlier of the two images corresponding to the first target image similarity, and the end frame is the later of the two images corresponding to the last target image similarity.
To make it easy to pick consecutive image similarities out of the similarity sequence, each image similarity in the sequence can be given an index, ordered like the video frames, for example the serial number of the frame image. When determining a candidate segment of the video to be detected, whether the indices of adjacent image similarities are consecutive can be judged; with serial numbers, this means judging whether the serial numbers of adjacent image similarities differ by 1. If the indices are consecutive, it is further judged whether the number of consecutive indices exceeds a preset quantity threshold, in order to exclude the adverse effect on action detection of segments with too few consecutive frames. If the number exceeds the preset quantity threshold, the video frames corresponding to the consecutive indices are taken as a candidate segment of the video to be detected.
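Putting the thresholding, the index-continuity check and the minimum-length filter together, a minimal sketch (the function name and the tuple representation of segments are assumptions of this sketch):

```python
def candidate_segments(sims, threshold, min_len):
    """Similarity index i covers frames i and i+1, so a maximal run of
    indices [i..j] with sims > threshold maps to the frame range (i, j+1).
    Runs with at most min_len indices are discarded."""
    segments, run = [], []
    for i, s in enumerate(sims):
        if s > threshold:
            run.append(i)
            continue
        if len(run) > min_len:
            segments.append((run[0], run[-1] + 1))
        run = []
    if len(run) > min_len:
        segments.append((run[0], run[-1] + 1))
    return segments
```

For example, with sims = [0.9, 0.9, 0.9, 0.1, 0.9, 0.9, 0.1], threshold 0.5 and min_len 1, the runs of indices [0, 1, 2] and [4, 5] become the frame ranges (0, 3) and (4, 6).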
To obtain candidate segments with more accurately positioned boundaries, each candidate segment obtained above can be segmented further, outputting more detailed splits. The method may therefore also include:
(1) taking the image similarities in the portion of the similarity sequence corresponding to the candidate segment that exceed a second segmentation threshold as subdivision image similarities, wherein the second segmentation threshold is greater than the first segmentation threshold;
(2) if multiple subdivision image similarities are consecutive in the similarity sequence, taking the video frames corresponding to the multiple subdivision image similarities as a first-class subdivided candidate segment of the candidate segment. As in the previous segmentation process, the first video frame corresponding to the multiple subdivision image similarities can be taken as the start frame of the subdivided candidate segment and the last corresponding video frame as its end frame, and the candidate segment split accordingly to obtain the subdivided candidate segment. Raising the segmentation threshold thus splits finer candidate segments out of the original candidate segment and ultimately improves the precision of action detection.
(3) taking the other pieces of the candidate segment left over after the subdivided candidate segments are cut out as second-class subdivided candidate segments. After the parts split off in step (2) are taken as finer candidate segments, at least one other piece of the original candidate segment remains; those other pieces are also treated as finer candidate segments.
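A sketch of the two-class subdivision under the stricter threshold, with a hypothetical helper (not the patent's notation; segments are (start_frame, end_frame) tuples):

```python
def refine_segment(sims, seg, theta2):
    """Split candidate frame range seg = (start, end) with a stricter
    threshold theta2. Returns (fine, rest): first-class sub-segments
    where adjacent-frame similarity exceeds theta2, and the leftover
    pieces of the original candidate as second-class sub-segments."""
    start, end = seg
    fine, run = [], []
    for i in range(start, end):   # similarity index i covers frames i, i+1
        if sims[i] > theta2:
            run.append(i)
            continue
        if run:
            fine.append((run[0], run[-1] + 1))
        run = []
    if run:
        fine.append((run[0], run[-1] + 1))
    rest, prev = [], start
    for a, b in fine:             # everything between fine pieces is 'rest'
        if a > prev:
            rest.append((prev, a))
        prev = b
    if prev < end:
        rest.append((prev, end))
    return fine, rest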
To output candidate segments of different lengths, the candidate segments obtained above can also be recombined, outputting splits of different lengths. The method may also include:
(1) selecting one subdivided candidate segment from each of two adjacent candidate segments; the position of each subdivided candidate segment within its candidate segment is unrestricted.
(2) taking the first video frame of the earlier subdivided candidate segment as the start frame of the lengthened candidate segment and the last video frame of the later subdivided candidate segment as its end frame, thereby splitting a lengthened candidate segment out of the video to be detected. When the two subdivided candidate segments are joined, the video frames between them are also included in the lengthened candidate segment. Because the positions of the subdivided candidate segments within their candidate segments vary, lengthened candidate segments of different lengths can be obtained, enriching the number of samples for training or detection.
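A sketch of forming lengthened candidate segments from the refined pieces of two adjacent candidates (the list-of-lists input and the function name are this sketch's assumptions):

```python
def lengthened_segments(fine_per_candidate):
    """For each pair of adjacent candidates, pair every refined
    sub-segment of the earlier candidate with every refined sub-segment
    of the later one: the lengthened segment runs from the earlier
    piece's start frame to the later piece's end frame, including
    whatever lies between."""
    out = []
    for left, right in zip(fine_per_candidate, fine_per_candidate[1:]):
        for a, _ in left:
            for _, b in right:
                out.append((a, b))
    return out
```

For two adjacent candidates whose refined pieces are [(0, 2), (3, 5)] and [(8, 10)], this yields the lengthened segments (0, 10) and (3, 10).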
Once the candidate segments above are obtained, a video action detection model can be trained, or action detection can be performed on the candidate segments with a preconfigured video action detection model. During training, to improve the precision of the model, the ordering information of the candidate segments within the video to be detected is taken into account, so as to distinguish candidate segments whose overlaps with the real action segment differ. Based on this idea, the method also includes the following steps:
(1) setting a ranking loss function based on the overlaps of two candidate segments with the correctly annotated segment, the two candidate segments having different overlaps with the correctly annotated segment; (2) using the ranking loss function as the loss function of the video action detection model and training the video action detection model on the candidate segments.
Most existing methods train a deep learning model with a cross-entropy loss to obtain a video action detection model and then classify the candidate segments, ignoring the relational information between candidate segments. Because of the limited precision of the deep learning model, two candidate segments may both receive relatively high scores. If the ordering information of the candidate segments within the video is added during training, so that the score of a genuinely good candidate segment is much higher than that of a poor one, the precision of the model can be greatly improved. When training the model, a ranking loss function (Ranking Loss) can be added on top of the cross-entropy loss. Suppose the overlaps of two candidate segments with the correctly annotated action segment (ground truth) are c_p and c_q respectively and, without loss of generality, c_p > c_q; the ranking loss function used in training can then be set as:
l_rank = max(0, c_q - c_p + ε)
Referring to the schematic diagram of training the model with the Ranking Loss shown in Fig. 3: ψ1, ψ2 and ψ3 are three different candidate segments, and C1, C2 and C3 denote the overlaps of ψ1, ψ2 and ψ3 with the correctly annotated segment, that is, the scores of candidate segments ψ1, ψ2 and ψ3 during model training. The target of model training is to rank C1, C2 and C3 pairwise, the corresponding order being C1 > C2 > C3.
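The hinge above can be written in a few lines. Note that, as printed, the margin is over the overlaps c_p and c_q themselves; in a typical margin ranking loss the model's predicted scores for the better- and worse-overlapping segments take their place, which is what this sketch assumes:

```python
def ranking_loss(s_better, s_worse, eps=0.1):
    """Hinge ranking loss: zero once the score of the segment with the
    larger ground-truth overlap beats the other score by at least the
    margin eps, positive otherwise."""
    return max(0.0, s_worse - s_better + eps)
```

A well-ordered pair of scores such as (0.9, 0.2) incurs no loss; a mis-ordered pair such as (0.3, 0.4) is penalised, pushing the model to score better-overlapping segments higher.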
The following embodiment is illustrated with SSIM used to segment the video. There is a strong correlation between adjacent pictures; the formula is as follows:
SSIM(x, y) = ((2μxμy + C1)(2σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2))
where x and y denote two images, μx and μy are their means, σx and σy their standard deviations, σxy the covariance of the two pictures, and C1 and C2 constants. SSIM compares the brightness, contrast and structural similarity of two pictures. Using the SSIM similarity sequence, abundant candidate segments can be output through the segmentation strategy and the aggregation strategy, specifically as follows:
(1) Segmentation strategy: apply a segmentation threshold θ to the SSIM sequence S to output a binary vector. Similarities less than or equal to the segmentation threshold are set to 1 and similarities greater than the segmentation threshold are set to 0, where 1 indicates a boundary of a candidate segment and 0 its interior. Collecting the indices of the 1 entries of the binary vector gives B = {i, xi ≠ 0}, where xi comes from B(S, θ).
(2) Aggregation strategy: connect the indices whose value is 1 to obtain the candidate segments of the video, where xi is an index from B, δ is the degree of connectivity, and T is the length of B.
The segmentation strategy and aggregation strategy above yield the initial candidate segments Φini of the video. To position the boundaries more accurately, the segmentation strategy and aggregation strategy are applied again to each segment in Φini, yielding the more detailed candidate segments Φdet. To output candidate segments of different lengths, longer segments Φcom can be output according to all the boundary indices in two adjacent candidate segments. Finally, all candidate segments are collected as the final candidate segments of the video, as follows:
ΦV = Φini ∪ Φdet ∪ Φcom
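Collecting the three sets is a simple union; a sketch with segments represented as (start, end) tuples (the representation and function name are this sketch's assumptions):

```python
def all_candidates(phi_ini, phi_det, phi_com):
    """Phi_V = Phi_ini | Phi_det | Phi_com, deduplicated and ordered
    by start frame."""
    return sorted(set(phi_ini) | set(phi_det) | set(phi_com))
```

Deduplication matters because a detailed or lengthened segment can coincide with an initial one; sorting simply presents the final proposals in temporal order.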
Referring to the schematic diagram of generating candidate segments from an SSIM sequence shown in Fig. 4: the initial candidate segments are, for example, x₁⁰-x₂⁰ and x₃⁰-x₄⁰; the subdivided candidate segments obtained are, for example, x₃⁰-x₁¹ and x₁¹-x₄⁰; and the lengthened candidate segments obtained are, for example, x₂⁰-x₃⁰-x₁¹ and x₃⁰-x₁¹-x₄⁰.
Verification results of the video action detection model trained with the SSIM sequence and the Ranking Loss described above are shown in Fig. 5; its output is far better than that of existing methods. In panels a and b of Fig. 5, the first row is the correctly annotated segment, the second row is a good candidate segment, and the third row is a poor candidate segment. As can be seen from Fig. 5, the scores differ greatly, and the Ranking Loss successfully suppresses the poor candidate segments.
Embodiment three:
For the method provided in embodiment two, an embodiment of the present invention provides an apparatus for obtaining candidate segments from a video. Referring to the structural block diagram of such an apparatus shown in Fig. 6, the apparatus comprises:
Module 602 is obtained, for obtaining video to be detected;
Computing module 604, for being calculated separately between video adjacent video frames to be detected by preset similarity algorithm
Image similarity, obtain similarity sequence;Wherein, the sequence of the sequence of the image similarity in similarity sequence and video frame
It is identical;
Searching module 606, for the image similarity of the first segmentation threshold will to be greater than in similarity sequence as target figure
As similarity;
Divide module 608, it, will if the putting in order continuously in similarity sequence for multiple target image similarities
Candidate segment of the corresponding video frame of multiple target image similarities as video to be detected.
The device provided in an embodiment of the present invention that candidate segment is obtained from video, passes through the image between adjacent video frames
Similarity and segmentation strategy can have good robustness with the more accurate candidate segment of output, the candidate segment and be suitable for
Various video actions detection models.
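As an illustrative sketch of what the computing module produces (the helper names are hypothetical, and a simplified single-window SSIM stands in for whatever preset similarity algorithm is configured), the similarity sequence can be built as follows:

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    # Simplified single-window SSIM over two grayscale frames; an
    # illustrative stand-in, not the windowed SSIM of the full algorithm.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

def similarity_sequence(frames):
    # One similarity per adjacent frame pair, in the same order as the frames.
    return [global_ssim(a, b) for a, b in zip(frames, frames[1:])]
```

Identical adjacent frames yield a similarity of 1.0, so frames within one continuous action score high while abrupt content changes score low, which is what the thresholding in the searching module exploits.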
In one embodiment, the segmentation module is further configured to: take the video frame corresponding to the first of the multiple target image similarities as the start frame of the candidate segment and the video frame corresponding to the last of the multiple target image similarities as the end frame of the candidate segment; and segment out, from the video to be detected, the portion from the start frame to the end frame to obtain the candidate segment.
In another embodiment, each image similarity in the similarity sequence carries an index mark; the segmentation module is further configured to: judge whether the index marks of adjacent image similarities are consecutive; if so, judge whether the number of consecutive index marks is greater than a preset quantity threshold; and if it is greater than the preset quantity threshold, take the video frames corresponding to the consecutive index marks as a candidate segment of the video to be detected.
In one embodiment, the device further comprises a subdivision module, configured to: take the image similarities in the similarity sequence corresponding to a candidate segment that are greater than a second segmentation threshold as subdivision image similarities, the second segmentation threshold being greater than the first segmentation threshold; if multiple subdivision image similarities are arranged consecutively in the similarity sequence, take the video frames corresponding to the multiple subdivision image similarities as a first-class subdivided candidate segment of the candidate segment; and take the other portions of the candidate segment separated out by the subdivided candidate segments as second-class subdivided candidate segments.
In another embodiment, the subdivision module is further configured to: take the video frame corresponding to the first of the multiple subdivision image similarities as the start frame of a subdivided candidate segment and the video frame corresponding to the last of the multiple subdivision image similarities as the end frame of the subdivided candidate segment, and divide the candidate segment to obtain the subdivided candidate segment.
In one embodiment, the device further comprises a lengthening module, configured to: select one subdivided candidate segment from each of two adjacent candidate segments; and take the first video frame of the earlier subdivided candidate segment as the start frame of a lengthened candidate segment and the last video frame of the later subdivided candidate segment as the end frame of the lengthened candidate segment, to divide out, from the video to be detected, the lengthened candidate segment.
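The lengthening step can be sketched as follows (illustrative: here every pairing of one subdivided segment from each adjacent candidate segment is emitted, whereas the module selects one from each):

```python
def lengthened_proposals(subdivided_per_segment):
    # For each pair of adjacent candidate segments, span from the start frame
    # of a subdivided segment in the earlier one to the end frame of a
    # subdivided segment in the later one.
    out = []
    for prev, nxt in zip(subdivided_per_segment, subdivided_per_segment[1:]):
        for a, _ in prev:
            for _, d in nxt:
                out.append((a, d))
    return out
```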
In one embodiment, the device further comprises a training module, configured to: set a ranking loss function based on the degrees of overlap between two candidate segments and a correctly labeled segment, the two candidate segments having different degrees of overlap with the correctly labeled segment; and take the ranking loss function as the loss function of a video action detection model and train the video action detection model with the candidate segments.
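One plausible form of such an overlap-based ranking loss (the margin formulation and its value are assumptions; the patent only specifies that the loss is set from the two candidates' differing overlaps with the labeled segment):

```python
def iou(seg, gt):
    # Temporal intersection-over-union between a candidate segment and a
    # correctly labeled segment, each given as a (start, end) pair.
    inter = max(0, min(seg[1], gt[1]) - max(seg[0], gt[0]))
    union = (seg[1] - seg[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def ranking_loss(score_a, score_b, seg_a, seg_b, gt, margin=0.2):
    # The candidate with the higher overlap should receive the higher model
    # score; penalize the model when that ordering is violated by the margin.
    if iou(seg_a, gt) > iou(seg_b, gt):
        hi, lo = score_a, score_b
    else:
        hi, lo = score_b, score_a
    return max(0.0, margin - (hi - lo))
```

Training with a loss of this shape pushes the scores of low-overlap candidates down, consistent with the suppression of poor candidate segments reported for Figure 5.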
In one embodiment, the device further comprises a detection module, configured to perform action detection on the candidate segments through a preconfigured video action detection model.
The technical effects and realization principle of the device provided by this embodiment are the same as those of the foregoing method embodiment. For brevity, where this device embodiment omits details, reference may be made to the corresponding content in the foregoing method embodiment.
In addition, this embodiment provides a processing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of obtaining candidate segments from a video provided by the above embodiments.
It is apparent to those skilled in the art that, for convenience and brevity of description, for the specific working process of the system described above, reference may be made to the corresponding process in the foregoing embodiments, and details are not described herein again.
Further, this embodiment provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when run by a processor, executes the steps of the method provided by the above embodiments.
The computer program product of the method, device, and processing equipment for obtaining candidate segments from a video provided by the embodiments of the present invention comprises a computer-readable storage medium storing program code, and the instructions included in the program code can be used to execute the method described in the foregoing method embodiments; for specific implementation, reference may be made to the method embodiments, and details are not described herein again. If the described functions are realized in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a mobile hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), or a magnetic or optical disk.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, used to illustrate the technical solution of the present invention and not to limit it, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field can, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; and these modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention and should all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be based on the protection scope of the claims.
Claims (11)
1. A method for obtaining candidate segments from a video, characterized by comprising:
obtaining a video to be detected;
separately calculating, by a preset similarity algorithm, the image similarity between adjacent video frames of the video to be detected, to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence is the same as the order of the video frames;
taking the image similarities in the similarity sequence that are greater than a first segmentation threshold as target image similarities; and
if multiple target image similarities are arranged consecutively in the similarity sequence, taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected.
2. The method according to claim 1, characterized in that the step of taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected comprises:
taking the video frame corresponding to the first of the multiple target image similarities as the start frame of the candidate segment, and the video frame corresponding to the last of the multiple target image similarities as the end frame of the candidate segment; and
segmenting out, from the video to be detected, the portion from the start frame to the end frame to obtain the candidate segment.
3. The method according to claim 1, characterized in that each image similarity in the similarity sequence carries an index mark; and
the step of, if multiple target image similarities are arranged consecutively in the similarity sequence, taking the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected comprises:
judging whether the index marks of adjacent image similarities are consecutive;
if so, judging whether the number of consecutive index marks is greater than a preset quantity threshold; and
if it is greater than the preset quantity threshold, taking the video frames corresponding to the consecutive index marks as a candidate segment of the video to be detected.
4. The method according to claim 1, characterized in that, after the candidate segment is obtained, the method further comprises:
taking the image similarities in the similarity sequence corresponding to the candidate segment that are greater than a second segmentation threshold as subdivision image similarities, the second segmentation threshold being greater than the first segmentation threshold;
if multiple subdivision image similarities are arranged consecutively in the similarity sequence, taking the video frames corresponding to the multiple subdivision image similarities as a first-class subdivided candidate segment of the candidate segment; and
taking the other portions of the candidate segment separated out by the subdivided candidate segment as second-class subdivided candidate segments.
5. The method according to claim 4, characterized in that the step of taking the video frames corresponding to the multiple subdivision image similarities as a first-class subdivided candidate segment of the candidate segment comprises:
taking the video frame corresponding to the first of the multiple subdivision image similarities as the start frame of the subdivided candidate segment, and the video frame corresponding to the last of the multiple subdivision image similarities as the end frame of the subdivided candidate segment, and dividing the candidate segment to obtain the subdivided candidate segment.
6. The method according to claim 4 or 5, characterized in that, after the subdivided candidate segments are obtained, the method further comprises:
selecting one subdivided candidate segment from each of two adjacent candidate segments; and
taking the first video frame of the earlier subdivided candidate segment as the start frame of a lengthened candidate segment, and the last video frame of the later subdivided candidate segment as the end frame of the lengthened candidate segment, to divide out, from the video to be detected, the lengthened candidate segment.
7. The method according to claim 1, characterized in that the method further comprises:
setting a ranking loss function based on the degrees of overlap between two candidate segments and a correctly labeled segment, the two candidate segments having different degrees of overlap with the correctly labeled segment; and
taking the ranking loss function as the loss function of a video action detection model, and training the video action detection model with the candidate segments.
8. The method according to claim 1 or 7, characterized in that the method further comprises:
performing action detection on the candidate segment through a preconfigured video action detection model.
9. A device for obtaining candidate segments from a video, characterized by comprising:
an obtaining module, configured to obtain a video to be detected;
a computing module, configured to separately calculate, by a preset similarity algorithm, the image similarity between adjacent video frames of the video to be detected, to obtain a similarity sequence, wherein the order of the image similarities in the similarity sequence is the same as the order of the video frames;
a searching module, configured to take the image similarities in the similarity sequence that are greater than a first segmentation threshold as target image similarities; and
a segmentation module, configured to, if multiple target image similarities are arranged consecutively in the similarity sequence, take the video frames corresponding to the multiple target image similarities as a candidate segment of the video to be detected.
10. A processing device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.
11. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when run by a processor, executes the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910231596.9A CN109977262B (en) | 2019-03-25 | 2019-03-25 | Method and device for acquiring candidate segments from video and processing equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109977262A true CN109977262A (en) | 2019-07-05 |
CN109977262B CN109977262B (en) | 2021-11-16 |
Family
ID=67080518
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910231596.9A Active CN109977262B (en) | 2019-03-25 | 2019-03-25 | Method and device for acquiring candidate segments from video and processing equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977262B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101375312A (en) * | 2006-02-07 | 2009-02-25 | 高通股份有限公司 | Multi-mode region-of-interest video object segmentation |
US7853084B2 (en) * | 2005-12-05 | 2010-12-14 | Hitachi, Ltd. | Method of detecting feature images |
CN102902756A (en) * | 2012-09-24 | 2013-01-30 | 南京邮电大学 | Video abstraction extraction method based on story plots |
CN103839086A (en) * | 2014-03-25 | 2014-06-04 | 上海交通大学 | Interaction behavior detection method in video monitoring scene |
CN107506793A (en) * | 2017-08-21 | 2017-12-22 | 中国科学院重庆绿色智能技术研究院 | Clothes recognition methods and system based on weak mark image |
CN108090508A (en) * | 2017-12-12 | 2018-05-29 | 腾讯科技(深圳)有限公司 | A kind of classification based training method, apparatus and storage medium |
CN108573246A (en) * | 2018-05-08 | 2018-09-25 | 北京工业大学 | A kind of sequential action identification method based on deep learning |
US20190080176A1 (en) * | 2016-04-08 | 2019-03-14 | Microsoft Technology Licensing, Llc | On-line action detection using recurrent neural network |
Non-Patent Citations (2)
Title |
---|
ZHANG Shun et al., "The development of deep convolutional neural networks and their applications in computer vision", Chinese Journal of Computers *
WANG Xiang, "Research on key technologies of content-based video retrieval", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110781740A (en) * | 2019-09-20 | 2020-02-11 | 网宿科技股份有限公司 | Video image quality identification method, system and equipment |
CN110781740B (en) * | 2019-09-20 | 2023-04-07 | 网宿科技股份有限公司 | Video image quality identification method, system and equipment |
CN110704678A (en) * | 2019-09-24 | 2020-01-17 | 中国科学院上海高等研究院 | Evaluation sorting method, evaluation sorting system, computer device and storage medium |
CN110704678B (en) * | 2019-09-24 | 2022-10-14 | 中国科学院上海高等研究院 | Evaluation sorting method, evaluation sorting system, computer device and storage medium |
CN111339360A (en) * | 2020-02-24 | 2020-06-26 | 北京奇艺世纪科技有限公司 | Video processing method and device, electronic equipment and computer readable storage medium |
CN111339360B (en) * | 2020-02-24 | 2024-03-26 | 北京奇艺世纪科技有限公司 | Video processing method, video processing device, electronic equipment and computer readable storage medium |
CN111414868A (en) * | 2020-03-24 | 2020-07-14 | 北京旷视科技有限公司 | Method for determining time sequence action fragment, action detection method and device |
CN111414868B (en) * | 2020-03-24 | 2023-05-16 | 北京旷视科技有限公司 | Method for determining time sequence action segment, method and device for detecting action |
CN111522996A (en) * | 2020-04-09 | 2020-08-11 | 北京百度网讯科技有限公司 | Video clip retrieval method and device |
CN111522996B (en) * | 2020-04-09 | 2023-09-08 | 北京百度网讯科技有限公司 | Video clip retrieval method and device |
US11625433B2 (en) | 2020-04-09 | 2023-04-11 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for searching video segment, device, and medium |
CN111639599B (en) * | 2020-05-29 | 2024-04-02 | 北京百度网讯科技有限公司 | Object image mining method, device, equipment and storage medium |
CN111639599A (en) * | 2020-05-29 | 2020-09-08 | 北京百度网讯科技有限公司 | Object image mining method, device, equipment and storage medium |
CN111860185A (en) * | 2020-06-23 | 2020-10-30 | 北京无限创意信息技术有限公司 | Shot boundary detection method and system |
CN111914926A (en) * | 2020-07-29 | 2020-11-10 | 深圳神目信息技术有限公司 | Sliding window-based video plagiarism detection method, device, equipment and medium |
CN111914926B (en) * | 2020-07-29 | 2023-11-21 | 深圳神目信息技术有限公司 | Sliding window-based video plagiarism detection method, device, equipment and medium |
CN112149575A (en) * | 2020-09-24 | 2020-12-29 | 新华智云科技有限公司 | Method for automatically screening automobile part fragments from video |
CN112380929A (en) * | 2020-10-30 | 2021-02-19 | 北京字节跳动网络技术有限公司 | Highlight segment obtaining method and device, electronic equipment and storage medium |
CN112491999A (en) * | 2020-11-18 | 2021-03-12 | 成都佳华物链云科技有限公司 | Data reporting method and device |
CN112749625B (en) * | 2020-12-10 | 2023-12-15 | 深圳市优必选科技股份有限公司 | Time sequence behavior detection method, time sequence behavior detection device and terminal equipment |
CN112749625A (en) * | 2020-12-10 | 2021-05-04 | 深圳市优必选科技股份有限公司 | Time sequence behavior detection method, time sequence behavior detection device and terminal equipment |
CN112883782A (en) * | 2021-01-12 | 2021-06-01 | 上海肯汀通讯科技有限公司 | Method, device, equipment and storage medium for identifying putting behaviors |
CN112883782B (en) * | 2021-01-12 | 2023-03-24 | 上海肯汀通讯科技有限公司 | Method, device, equipment and storage medium for identifying putting behaviors |
CN112784095A (en) * | 2021-01-18 | 2021-05-11 | 北京洛塔信息技术有限公司 | Difficult sample data mining method, device, equipment and storage medium |
CN114827757A (en) * | 2021-01-29 | 2022-07-29 | 深圳市万普拉斯科技有限公司 | Video frame selection method, video time-shrinking processing method and device and computer equipment |
CN113762040A (en) * | 2021-04-29 | 2021-12-07 | 腾讯科技(深圳)有限公司 | Video identification method and device, storage medium and computer equipment |
CN113191266A (en) * | 2021-04-30 | 2021-07-30 | 江苏航运职业技术学院 | Remote monitoring management method and system for ship power device |
CN113191266B (en) * | 2021-04-30 | 2021-10-22 | 江苏航运职业技术学院 | Remote monitoring management method and system for ship power device |
WO2023273628A1 (en) * | 2021-06-30 | 2023-01-05 | 腾讯科技(深圳)有限公司 | Video loop recognition method and apparatus, computer device, and storage medium |
CN113449824A (en) * | 2021-09-01 | 2021-09-28 | 腾讯科技(深圳)有限公司 | Video processing method, device and computer readable storage medium |
CN114760534B (en) * | 2022-03-28 | 2024-03-01 | 北京捷通华声科技股份有限公司 | Video generation method, device, electronic equipment and readable storage medium |
CN114760534A (en) * | 2022-03-28 | 2022-07-15 | 北京捷通华声科技股份有限公司 | Video generation method and device, electronic equipment and readable storage medium |
CN114842239A (en) * | 2022-04-02 | 2022-08-02 | 北京医准智能科技有限公司 | Breast lesion attribute prediction method and device based on ultrasonic video |
CN117135444A (en) * | 2023-03-10 | 2023-11-28 | 荣耀终端有限公司 | Frame selection decision method and device based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN109977262B (en) | 2021-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109977262A (en) | Method, apparatus and processing equipment for obtaining candidate segments from a video | |
Kristan et al. | The visual object tracking vot2015 challenge results | |
Lai et al. | Sparse distance learning for object recognition combining rgb and depth information | |
Kantorov et al. | Efficient feature extraction, encoding and classification for action recognition | |
Shen et al. | Deepcontour: A deep convolutional feature learned by positive-sharing loss for contour detection | |
Zhao et al. | Learning mid-level filters for person re-identification | |
Bregonzio et al. | Fusing appearance and distribution information of interest points for action recognition | |
JP5604256B2 (en) | Human motion detection device and program thereof | |
Murthy et al. | Ordered trajectories for large scale human action recognition | |
GB2516037A (en) | Compact and robust signature for large scale visual search, retrieval and classification | |
JP6897749B2 (en) | Learning methods, learning systems, and learning programs | |
CN109948497A (en) | A kind of object detecting method, device and electronic equipment | |
CN110263712A (en) | A kind of coarse-fine pedestrian detection method based on region candidate | |
Duta et al. | Histograms of motion gradients for real-time video classification | |
Bilinski et al. | Evaluation of local descriptors for action recognition in videos | |
Liu et al. | Subtler mixed attention network on fine-grained image classification | |
CN109753884A (en) | A kind of video behavior recognition methods based on key-frame extraction | |
Luo et al. | SFA: small faces attention face detector | |
CN114399644A (en) | Target detection method and device based on small sample | |
Kang et al. | Robust visual tracking via nonlocal regularized multi-view sparse representation | |
Russakovsky et al. | A steiner tree approach to efficient object detection | |
WO2023048809A1 (en) | Leveraging unsupervised meta-learning to boost few-shot action recognition | |
Roy et al. | Foreground segmentation using adaptive 3 phase background model | |
CN102609715B (en) | Object type identification method combining plurality of interest point testers | |
Bai et al. | Multi-scale fully convolutional network for face detection in the wild |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||