CN105228033A - Video processing method and electronic device - Google Patents

Video processing method and electronic device

Info

Publication number
CN105228033A
CN105228033A (application number CN201510535580.9A)
Authority
CN
China
Prior art keywords
feature
video
frame
face
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510535580.9A
Other languages
Chinese (zh)
Other versions
CN105228033B (en)
Inventor
董培
靳玉茹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN201510535580.9A
Publication of CN105228033A
Application granted
Publication of CN105228033B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes


Abstract

The invention discloses a video processing method and an electronic device. The method comprises: extracting a first feature set from video frames, the first feature set comprising a color moment feature, a wavelet texture feature, a motion feature, and a local key-point feature; computing a second feature set on the basis of the first feature set, the second feature set comprising a motion attention feature, a depth-based face attention feature, and a semantic indicator feature of a video segment; and fusing the features of the second feature set with an iteratively reweighted linear model, thereby obtaining a video summary.

Description

Video processing method and electronic device
Technical Field
The present invention relates to video processing technology, and in particular to a video processing method and an electronic device.
Background
Intelligent terminals such as smartphones have become constant companions in people's work and daily life, and users readily accumulate large numbers of videos by downloading or shooting them. For phones equipped with a binocular (stereo) camera in particular, the amount of data to be stored is even larger. Given the relatively limited storage capacity of a mobile phone, managing video files has become a problem that urgently needs to be solved.
Summary of the Invention
To solve the above technical problem, embodiments of the present invention provide a video processing method and an electronic device.
The video processing method provided by an embodiment of the present invention comprises:
extracting a first feature set from video frames, the first feature set comprising: a color moment feature, a wavelet texture feature, a motion feature, and a local key-point feature;
computing a second feature set on the basis of the first feature set, the second feature set comprising: a motion attention feature, a depth-based face attention feature, and a semantic indicator feature of a video segment;
fusing the features of the second feature set with an iteratively reweighted linear model, thereby obtaining a video summary.
The electronic device provided by an embodiment of the present invention comprises:
an extraction unit, configured to extract a first feature set from video frames, the first feature set comprising: a color moment feature, a wavelet texture feature, a motion feature, and a local key-point feature;
a first processing unit, configured to compute a second feature set on the basis of the first feature set, the second feature set comprising: a motion attention feature, a depth-based face attention feature, and a semantic indicator feature of a video segment;
a second processing unit, configured to fuse the features of the second feature set with an iteratively reweighted linear model, thereby obtaining a video summary.
In the technical solution of the embodiments of the present invention, a color moment feature, a wavelet texture feature, a motion feature and a local key-point feature are first extracted from video frames. Based on these features, a motion attention feature, a depth-based face attention feature and a semantic indicator feature of each video segment are then computed, and these three features are fused to obtain a video summary. In this way, semantically refined and relatively important video segments are extracted from the original video, which effectively reduces the amount of data that needs to be stored on the electronic device, improves the utilization of the device's storage and the user experience, and also helps the user later locate the desired video among a smaller number of files. Furthermore, the technical solution combines information from the visual modality and the textual modality and can therefore capture the high-level semantics of the video content more effectively. Incorporating the depth information of objects in the scene into the face attention feature helps grasp the high-level semantics from a more complete perspective. The technical solution does not rely on heuristic rules crafted for a specific video type and is therefore applicable to a broader range of video genres.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of the video processing method according to Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the video processing method according to Embodiment 2 of the present invention;
Fig. 3 is an overall flowchart of video summary extraction according to an embodiment of the present invention;
Fig. 4 is a flowchart of computing the semantic indicator feature of a video segment according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the electronic device according to Embodiment 1 of the present invention;
Fig. 6 is a schematic structural diagram of the electronic device according to Embodiment 2 of the present invention.
Detailed Description of the Embodiments
To make the features and technical content of the embodiments of the present invention easier to understand, the implementation of the embodiments is described in detail below with reference to the accompanying drawings. The drawings are provided for reference and illustration only and are not intended to limit the embodiments of the present invention.
In the era of information explosion, traditional ways of browsing and managing video data face unprecedented challenges. Providing users with a concise video summary that concentrates the key information of the original video therefore has significant practical value. Video summaries generally fall into two types: a dynamic video summary is a shortened version of the original video, consisting of a series of segments extracted from the long original, whereas a static video summary consists of a group of key frames extracted from the original video.
Traditional video summaries are produced by extracting visual or textual features from the video. However, most methods in this direction adopt heuristic rules or simple textual analysis (e.g., based on word-frequency statistics). In addition, traditional attention-model methods that use face features only consider information such as the planar position and size of the detected faces in the scene, and make no use of depth information.
The technical solution of the embodiments of the present invention estimates the relative importance of video segments by iterative reweighting, based on a user attention model, the semantic information of the video, and the depth information of the video frames, and thereby produces a dynamic video summary.
Fig. 1 is a schematic flowchart of the video processing method according to Embodiment 1 of the present invention. As shown in Fig. 1, the video processing method comprises the following steps:
Step 101: extract a first feature set from video frames, the first feature set comprising: a color moment feature, a wavelet texture feature, a motion feature, and a local key-point feature.
Referring to Fig. 3, the first feature set, i.e. the low-level feature set, is first extracted from the video frames. It comprises four low-level features: the color moment feature, the wavelet texture feature, the motion feature and the local key-point feature.
The four low-level features of the first feature set are described in detail below.
(1) Color moment feature
A video frame is spatially divided into 5 × 5 (25 in total) non-overlapping pixel blocks, and for each of the three channels of the Lab color space the first-order moment and the second- and third-order central moments are computed within each block. The color moments of the 25 blocks of the frame together form the color moment feature vector f_cm(i) of the frame.
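For illustration, the following Python sketch computes the per-block Lab color moments described above; the 5 × 5 grid and the three moment orders follow the text, while the use of OpenCV and NumPy is an assumption (the patent names no libraries).

    import cv2
    import numpy as np

    def color_moment_feature(frame_bgr, grid=(5, 5)):
        # Convert to Lab and split the frame into non-overlapping blocks.
        lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB).astype(np.float64)
        h, w, _ = lab.shape
        bh, bw = h // grid[0], w // grid[1]
        moments = []
        for r in range(grid[0]):
            for c in range(grid[1]):
                block = lab[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw].reshape(-1, 3)
                mean = block.mean(axis=0)                  # first-order moment
                centered = block - mean
                var = (centered ** 2).mean(axis=0)         # second-order central moment
                skew = (centered ** 3).mean(axis=0)        # third-order central moment
                moments.extend(np.concatenate([mean, var, skew]))
        return np.asarray(moments)   # 25 blocks x 3 channels x 3 moments = 225 values

The resulting vector corresponds to f_cm(i) for frame i.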
(2) Wavelet texture feature
Similarly, a video frame is divided into 3 × 3 (9 in total) non-overlapping pixel blocks, a three-level Haar wavelet decomposition is applied to the luminance component of each block, and the variances of the wavelet coefficients in the horizontal, vertical and diagonal directions are computed for each level. All of the wavelet coefficient variances of the frame together form its wavelet texture feature vector f_wt(i).
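A minimal Python sketch of this computation, assuming the PyWavelets library (not named in the patent) for the three-level Haar decomposition:

    import cv2
    import numpy as np
    import pywt

    def wavelet_texture_feature(frame_bgr, grid=(3, 3), levels=3):
        # Work on the luminance channel only.
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64)
        h, w = gray.shape
        bh, bw = h // grid[0], w // grid[1]
        variances = []
        for r in range(grid[0]):
            for c in range(grid[1]):
                block = gray[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
                # wavedec2 returns [cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)]
                coeffs = pywt.wavedec2(block, 'haar', level=levels)
                for detail in coeffs[1:]:
                    for band in detail:        # horizontal, vertical, diagonal
                        variances.append(np.var(band))
        return np.asarray(variances)           # 9 blocks x 3 levels x 3 bands = 81 values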
(3) Motion feature
The human eye is sensitive to changes in visual content. Based on this principle, a video frame is divided into M × N non-overlapping pixel blocks of 16 × 16 pixels each, and a motion vector v(i, m, n) is computed for each block by a motion estimation algorithm. The M × N motion vectors together form the motion feature f_mv(i) of the frame.
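The patent does not prescribe a particular motion estimation algorithm; the sketch below uses exhaustive block matching over 16 × 16 blocks as one illustrative choice.

    import numpy as np

    def block_motion_vectors(prev_gray, cur_gray, block=16, search=8):
        # Exhaustive block matching between two consecutive grayscale frames.
        h, w = cur_gray.shape
        M, N = h // block, w // block
        vectors = np.zeros((M, N, 2))
        for m in range(M):
            for n in range(N):
                y0, x0 = m * block, n * block
                ref = cur_gray[y0:y0 + block, x0:x0 + block].astype(np.float64)
                best, best_dv = np.inf, (0, 0)
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        y1, x1 = y0 + dy, x0 + dx
                        if y1 < 0 or x1 < 0 or y1 + block > h or x1 + block > w:
                            continue
                        cand = prev_gray[y1:y1 + block, x1:x1 + block].astype(np.float64)
                        cost = np.abs(ref - cand).sum()   # sum of absolute differences
                        if cost < best:
                            best, best_dv = cost, (dx, dy)
                vectors[m, n] = best_dv
        return vectors      # one motion vector v(i, m, n) per block of frame i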
(4) Local key-point feature
In semantic-level video analysis, a bag of features (BoF) built on local key points is a strong complement to features computed from global information. A soft-weighted local key-point feature is therefore used to capture salient regions; it is defined on the importance of the key points with respect to a vocabulary of 500 visual words. Specifically, the key points of the i-th video frame are obtained with a Difference of Gaussians (DoG) detector, represented with Scale-Invariant Feature Transform (SIFT) descriptors, and clustered into the 500 visual words. The key-point feature vector f_kp(i) is defined as the weighted similarity between the key points and the visual words over the four nearest neighbours.
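A sketch of the soft-weighted bag-of-features computation. The SIFT detector comes from OpenCV; the exact soft-weighting scheme and the offline construction of the 500-word vocabulary are assumptions, since the text does not spell them out.

    import cv2
    import numpy as np

    def soft_bof_feature(gray, vocabulary, neighbours=4):
        # 'vocabulary' is a (500, 128) array of visual words obtained offline,
        # e.g. by clustering SIFT descriptors of training frames (assumption).
        sift = cv2.SIFT_create()
        _, descriptors = sift.detectAndCompute(gray, None)
        hist = np.zeros(len(vocabulary))
        if descriptors is None:
            return hist
        for d in descriptors:
            dist = np.linalg.norm(vocabulary - d, axis=1)
            nearest = np.argsort(dist)[:neighbours]
            # Soft weighting over the four nearest visual words: closer words
            # receive larger contributions (the exact weights are illustrative).
            weights = 1.0 / (dist[nearest] + 1e-8)
            hist[nearest] += weights / weights.sum()
        return hist / max(hist.sum(), 1e-8)     # 500-dimensional f_kp(i)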
Step 102: compute a second feature set on the basis of the first feature set, the second feature set comprising: a motion attention feature, a depth-based face attention feature, and a semantic indicator feature of a video segment.
Next, high-level visual and semantic features, referred to as the second feature set, are computed from the low-level features; they comprise the motion attention feature, the depth-based face attention feature, and the semantic indicator feature of a video segment.
These high-level features are computed for each given video segment χ_s, which starts at frame i_1(s) and ends at frame i_2(s). The segmentation of the video is obtained by shot-cut detection.
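The patent only states that the segments are obtained by shot-cut detection; the following histogram-difference detector is one simple, illustrative way to produce the segment boundaries i_1(s).

    import cv2
    import numpy as np

    def shot_boundaries(frames, threshold=0.5):
        # frames: iterable of BGR frames. A Bhattacharyya distance between
        # consecutive HSV histograms close to 1 signals an abrupt content change.
        cuts = [0]
        prev_hist = None
        for i, frame in enumerate(frames):
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
            hist = cv2.normalize(hist, None).flatten()
            if prev_hist is not None:
                d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
                if d > threshold:
                    cuts.append(i)
            prev_hist = hist
        return cuts     # frame indices at which new segments start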
Each feature of the second feature set is described in detail below.
(1) Motion attention feature
Research on human attention in psychology has laid an indispensable foundation for attention modeling in computer vision. The cognitive mechanism of attention is crucial to the analysis and understanding of human thinking and activity, and can therefore guide the selection of the relatively important content of the original video for the summary. This solution uses a motion attention model to compute a high-level motion attention feature suitable for semantic analysis.
For the (m, n)-th pixel block of the i-th video frame, a spatial window containing the surrounding 5 × 5 (25 in total) pixel blocks and a temporal window containing 7 pixel blocks are defined, both centred on the (m, n)-th block of the i-th frame. The phase range [0, 2π) is divided evenly into 8 bins; a spatial phase histogram H^{(s)}_{i,m,n} is accumulated within the spatial window and a temporal phase histogram H^{(t)}_{i,m,n} within the temporal window. The spatial consistency indicator C_s(i, m, n) and the temporal consistency indicator C_t(i, m, n) are then obtained as:

C_s(i, m, n) = -\sum_{\zeta} p_s(\zeta) \log p_s(\zeta)    (1a)
C_t(i, m, n) = -\sum_{\zeta} p_t(\zeta) \log p_t(\zeta)    (2a)

where p_s(\zeta) = H^{(s)}_{i,m,n}(\zeta) / \sum_{\zeta} H^{(s)}_{i,m,n}(\zeta) and p_t(\zeta) = H^{(t)}_{i,m,n}(\zeta) / \sum_{\zeta} H^{(t)}_{i,m,n}(\zeta) are the phase distributions within the spatial and temporal windows, respectively. The motion attention feature MOT(i) of the i-th frame is then defined from these consistency indicators and the block motion vectors (equation (3a)).
To suppress noise across adjacent video frames, the resulting sequence of motion attention values is smoothed with a 9-tap median filter. For the s-th video segment χ_s, the motion attention feature is obtained by averaging the filtered per-frame values:

f_M(s) = \frac{1}{i_2(s) - i_1(s) + 1} \sum_{i=i_1(s)}^{i_2(s)} MOT(i)    (4a)
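A sketch of equations (1a), (2a) and (4a). The per-frame combination MOT(i) of equation (3a) is not reproduced in this text, so only the consistency indicators and the segment-level averaging are shown; NumPy and SciPy are illustrative choices.

    import numpy as np
    from scipy.signal import medfilt

    def consistency_indicators(phase, i, m, n, spatial=2, temporal=3, bins=8):
        # phase: motion-vector phases with shape (frames, M, N), values in [0, 2*pi).
        # Spatial window: 5x5 blocks around (m, n); temporal window: 7 co-located blocks.
        edges = np.linspace(0.0, 2.0 * np.pi, bins + 1)

        def entropy(samples):
            hist, _ = np.histogram(samples, bins=edges)
            p = hist / max(hist.sum(), 1)
            p = p[p > 0]
            return -(p * np.log(p)).sum()

        sp = phase[i, max(m - spatial, 0):m + spatial + 1,
                      max(n - spatial, 0):n + spatial + 1].ravel()
        tp = phase[max(i - temporal, 0):i + temporal + 1, m, n].ravel()
        return entropy(sp), entropy(tp)     # C_s(i, m, n) and C_t(i, m, n)

    def segment_motion_attention(mot, i1, i2):
        # Equation (4a): mean of the 9-tap median-filtered per-frame values MOT(i).
        smoothed = medfilt(np.asarray(mot, dtype=float), kernel_size=9)
        return smoothed[i1:i2 + 1].mean()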
(2) Depth-based face attention feature
The appearance of faces in a video usually indicates relatively important content. This solution obtains the area A_f(j) and the position of each face (indexed by j) in every video frame by a face detection algorithm. For the j-th detected face, the depth saliency D(j) is defined from the depth image d_i corresponding to the frame and the set of pixels {x | x ∈ Λ(j)} that make up the face:

D(j) = \frac{1}{|\Lambda(j)|} \sum_{x \in \Lambda(j)} d_i(x)    (5a)

where |Λ(j)| is the number of pixels of the j-th face. A position weight w_fp(j) is also defined according to the position of the face within the frame, to approximate the relative attention the face receives from viewers (regions closer to the frame centre receive larger weights), as shown in Table 1.
Table 1
Table 1 lists the face weights assigned to the different regions of a video frame: the central region has a large weight and the border regions have small weights.
The face attention value FAC(i) of the i-th frame is then computed from the face areas, position weights and depth saliency (equation (6a)), where A_frm is the area of the video frame and D_max(i) = max_x d_i(x). To reduce the overall impact of inaccurate face detection, the resulting sequence of face attention values is smoothed with a 5-tap median filter. The face attention feature of the video segment χ_s is computed from the smoothed values {FAC(i) | i = i_1(s), ..., i_2(s)}:

f_F(s) = \frac{1}{i_2(s) - i_1(s) + 1} \sum_{i=i_1(s)}^{i_2(s)} FAC(i)    (7a)
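A sketch of the depth-based face attention computation. The face detector (an OpenCV Haar cascade), the hypothetical position_weight helper standing in for Table 1, and the exact combination used for FAC(i) (equation (6a)) are all assumptions, since the text does not reproduce them; equations (5a) and (7a) follow the definitions above.

    import cv2
    import numpy as np
    from scipy.signal import medfilt

    FACE_CASCADE = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

    def frame_face_attention(gray, depth, position_weight):
        # depth: per-pixel depth image d_i aligned with the frame.
        # position_weight(x, y, w, h, shape): hypothetical helper returning the
        # Table 1 weight for a face box (the concrete values are not given here).
        faces = FACE_CASCADE.detectMultiScale(gray, 1.1, 5)
        a_frm = gray.shape[0] * gray.shape[1]
        d_max = max(float(depth.max()), 1e-8)
        score = 0.0
        for (x, y, w, h) in faces:
            d_j = depth[y:y + h, x:x + w].mean()        # depth saliency D(j), (5a)
            w_fp = position_weight(x, y, w, h, gray.shape)
            # One plausible combination of area, position and depth terms; the
            # exact form of FAC(i) in equation (6a) is not reproduced in this text.
            score += (w * h / a_frm) * w_fp * (d_j / d_max)
        return score

    def segment_face_attention(fac, i1, i2):
        # Equation (7a): mean of the 5-tap median-filtered per-frame values FAC(i).
        return medfilt(np.asarray(fac, dtype=float), kernel_size=5)[i1:i2 + 1].mean()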
(3) Semantic indicator feature of a video segment
Referring to Fig. 4, to mine the semantic information, this solution extracts the semantic indicator feature of a video segment based on the 374 concepts of VIREO-374 and the three Support Vector Machines (SVMs) of each concept. The SVMs are trained on the color moment, wavelet texture and local key-point features introduced above, and at prediction time estimate the probability that a given video frame is closely related to a concept. The procedure for computing the semantic indicator feature of a segment is shown in Fig. 4.
For the video segment χ_s, the color moment feature f_cm(i_m(s)), the wavelet texture feature f_wt(i_m(s)) and the local key-point feature f_kp(i_m(s)) of its middle frame i_m(s) are first extracted, and the SVMs then predict the probability values {u_cm(s, j), u_wt(s, j), u_kp(s, j) | j = 1, 2, ..., 374}, from which the concept closeness is computed:

u(s, j) = \frac{u_{cm}(s, j) + u_{wt}(s, j) + u_{kp}(s, j)}{3}    (8a)

Next, the subtitle information corresponding to the video segment is processed. From the set Γ_st(s) of subtitle words and the set Γ_cp(j) of concept words, the textual semantic similarity is computed with the similarity measurement tool WordNet::Similarity of the external dictionary WordNet:

\kappa(s, j) = \max_{\gamma \in \Gamma_{st}(s)} \frac{1}{|\Gamma_{cp}(j)|} \sum_{\omega \in \Gamma_{cp}(j)} \eta(\gamma, \omega)    (9a)

where η(γ, ω) is the similarity value of the subtitle word γ and the concept word ω in WordNet::Similarity.
To reduce the influence of irrelevant concepts, the following textual relatedness is defined:

\rho(s, j) = \kappa(s, j)/Q  for u(s, j) \in (0.5, 1],  and  \rho(s, j) = 0  for u(s, j) \in [0, 0.5]    (10a)

where Q is a coefficient that ensures normalization. Since the SVMs output probabilities of a two-class classification problem, the threshold 0.5 is a natural choice in the formula above.
Finally, the semantic indicator feature f_E(s) of the video segment is defined as the weighted sum of u(s, j) with ρ(s, j) as weights:

f_E(s) = \sum_{j=1}^{374} \rho(s, j) u(s, j)    (11a)
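A sketch of equations (8a) through (11a). WordNet::Similarity is a Perl tool, so NLTK's WordNet path similarity stands in for η(γ, ω) here, and the normalization coefficient Q is assumed to be the sum of the κ values; both are illustrative assumptions.

    import numpy as np
    from nltk.corpus import wordnet as wn

    def concept_closeness(prob_cm, prob_wt, prob_kp):
        # Equation (8a): average of the three per-concept SVM probabilities
        # (each argument is a length-374 array for one segment).
        return (np.asarray(prob_cm) + np.asarray(prob_wt) + np.asarray(prob_kp)) / 3.0

    def word_similarity(a, b):
        # Stand-in for eta(gamma, omega), using NLTK path similarity for illustration.
        scores = [s.path_similarity(t) for s in wn.synsets(a) for t in wn.synsets(b)]
        scores = [x for x in scores if x is not None]
        return max(scores) if scores else 0.0

    def semantic_indicator(subtitle_words, concept_words, u):
        # concept_words: list of 374 word lists (Gamma_cp(j)); u: concept closeness (8a).
        kappa = np.array([
            max((np.mean([word_similarity(g, w) for w in words])
                 for g in subtitle_words), default=0.0)
            for words in concept_words])                           # equation (9a)
        rho = np.where(np.asarray(u) > 0.5, kappa, 0.0)            # equation (10a)
        if rho.sum() > 0:
            rho = rho / rho.sum()      # normalization by Q (assumed to be the sum)
        return float((rho * np.asarray(u)).sum())                  # equation (11a)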
Step 103: fuse the features of the second feature set with an iteratively reweighted linear model, thereby obtaining a video summary.
Finally, the three high-level features are fused with an iteratively reweighted linear model, producing a video summary of the length requested by the user.
In the embodiment of the present invention, the video summary is ultimately determined by the saliency score of each video segment; the three high-level features are therefore fused with the following linear model, whose result is the saliency score of the segment:

f_{SAL}(s) = w_M(s) f_M(s) + w_F(s) f_F(s) + w_E(s) f_E(s)    (12a)

where w_M(s), w_F(s) and w_E(s) are the feature weights. Before the linear fusion, each feature is normalized to the interval [0, 1].
The feature weights are computed by the following iterative reweighting method. In the k-th iteration, the weight w_#(s) (# ∈ {M, F, E}) is determined by the product of the macroscopic factor α_#(s) and the microscopic factor β_#(s), i.e. w_#(s) = α_#(s) β_#(s):

\alpha_{\#}(s) = 1 - \frac{r_{\#}(s)}{N_S}    (13a)

\beta_{\#}^{(k)}(s) = 1 + \frac{f_{\#}(s^{(k)}) - f_{\#}(s'^{(k-1)})}{f_{\#}(s^{(k)}) + f_{\#}(s'^{(k-1)})}    (14a)

where r_#(s) is the rank of the feature value f_#(s) when {f_#(s) | s = 1, 2, ..., N_S} is sorted in descending order, and N_S is the total number of video segments in the video. The saliency f_{SAL}(s) of each segment can then be computed and the segments sorted by it in descending order. According to the length requested by the user, segments are selected into the video summary one by one in descending order of f_{SAL}(s).
Before the first iteration, the feature weights are initialized with equal weights. The iterative process terminates after 15 iterations.
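A sketch of the iteratively reweighted fusion of equations (12a) to (14a). The reading of s^(k) and s'^(k-1) as the current segment and the top-ranked segment of the previous iteration is an assumption, since the text does not define them explicitly.

    import numpy as np

    def normalize01(x):
        # Normalize each feature to [0, 1] before fusion, as stated in the text.
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    def iterative_reweighted_fusion(f_m, f_f, f_e, iterations=15):
        feats = {'M': normalize01(f_m), 'F': normalize01(f_f), 'E': normalize01(f_e)}
        n_s = len(feats['M'])
        weights = {name: np.ones(n_s) for name in feats}   # equal-weight initialization
        prev_top = None
        for _ in range(iterations):
            sal = sum(weights[n] * feats[n] for n in feats)          # equation (12a)
            top = int(np.argmax(sal))
            for n, f in feats.items():
                rank = np.argsort(np.argsort(-f))                    # 0 = largest value
                alpha = 1.0 - rank / n_s                             # equation (13a)
                if prev_top is None:
                    beta = np.ones(n_s)
                else:
                    beta = 1.0 + (f - f[prev_top]) / np.maximum(f + f[prev_top], 1e-8)  # (14a)
                weights[n] = alpha * beta
            prev_top = top
        return sum(weights[n] * feats[n] for n in feats)    # saliency scores f_SAL(s)

Segments would then be sorted by the returned scores and selected into the summary, highest first, until the requested length is reached.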
In the technical solution of this embodiment, low-level features such as color moments, wavelet texture, motion and local key points are first extracted from the video frames. High-level visual and semantic features are then computed from them, comprising the motion attention feature, the depth-based face attention feature and the semantic indicator feature of each video segment. Finally, the three high-level features are fused with an iteratively reweighted linear model to produce a video summary of the length requested by the user.
Fig. 2 is a schematic flowchart of the video processing method according to Embodiment 2 of the present invention. As shown in Fig. 2, the video processing method comprises the following steps:
Step 201: extract a first feature set from video frames, the first feature set comprising: a color moment feature, a wavelet texture feature, a motion feature, and a local key-point feature.
Referring to Fig. 3, the first feature set, i.e. the low-level feature set, is first extracted from the video frames; it comprises four low-level features: the color moment feature, the wavelet texture feature, the motion feature and the local key-point feature.
The four low-level features of the first feature set are described in detail below.
(1) Color moment feature
A video frame is spatially divided into 5 × 5 (25 in total) non-overlapping pixel blocks, and for each of the three channels of the Lab color space the first-order moment and the second- and third-order central moments are computed within each block. The color moments of the 25 blocks of the frame together form the color moment feature vector f_cm(i) of the frame.
(2) Wavelet texture feature
Similarly, a video frame is divided into 3 × 3 (9 in total) non-overlapping pixel blocks, a three-level Haar wavelet decomposition is applied to the luminance component of each block, and the variances of the wavelet coefficients in the horizontal, vertical and diagonal directions are computed for each level. All of the wavelet coefficient variances of the frame together form its wavelet texture feature vector f_wt(i).
(3) Motion feature
The human eye is sensitive to changes in visual content. Based on this principle, a video frame is divided into M × N non-overlapping pixel blocks of 16 × 16 pixels each, and a motion vector v(i, m, n) is computed for each block by a motion estimation algorithm. The M × N motion vectors together form the motion feature f_mv(i) of the frame.
(4) Local key-point feature
In semantic-level video analysis, a bag of features (BoF) built on local key points is a strong complement to features computed from global information. A soft-weighted local key-point feature is therefore used to capture salient regions; it is defined on the importance of the key points with respect to a vocabulary of 500 visual words. Specifically, the key points of the i-th video frame are obtained with a Difference of Gaussians (DoG) detector, represented with Scale-Invariant Feature Transform (SIFT) descriptors, and clustered into the 500 visual words. The key-point feature vector f_kp(i) is defined as the weighted similarity between the key points and the visual words over the four nearest neighbours.
Step 202: compute a motion attention feature from the motion feature of the first feature set.
Next, high-level visual and semantic features, referred to as the second feature set, are computed from the low-level features; they comprise the motion attention feature, the depth-based face attention feature, and the semantic indicator feature of a video segment.
These high-level features are computed for each given video segment χ_s, which starts at frame i_1(s) and ends at frame i_2(s). The segmentation of the video is obtained by shot-cut detection.
Research on human attention in psychology has laid an indispensable foundation for attention modeling in computer vision. The cognitive mechanism of attention is crucial to the analysis and understanding of human thinking and activity, and can therefore guide the selection of the relatively important content of the original video for the summary. This solution uses a motion attention model to compute a high-level motion attention feature suitable for semantic analysis.
For the (m, n)-th pixel block of the i-th video frame, a spatial window containing the surrounding 5 × 5 (25 in total) pixel blocks and a temporal window containing 7 pixel blocks are defined, both centred on the (m, n)-th block of the i-th frame. The phase range [0, 2π) is divided evenly into 8 bins; a spatial phase histogram H^{(s)}_{i,m,n} is accumulated within the spatial window and a temporal phase histogram H^{(t)}_{i,m,n} within the temporal window. The spatial consistency indicator C_s(i, m, n) and the temporal consistency indicator C_t(i, m, n) are then obtained as:

C_s(i, m, n) = -\sum_{\zeta} p_s(\zeta) \log p_s(\zeta)    (1b)
C_t(i, m, n) = -\sum_{\zeta} p_t(\zeta) \log p_t(\zeta)    (2b)

where p_s(\zeta) = H^{(s)}_{i,m,n}(\zeta) / \sum_{\zeta} H^{(s)}_{i,m,n}(\zeta) and p_t(\zeta) = H^{(t)}_{i,m,n}(\zeta) / \sum_{\zeta} H^{(t)}_{i,m,n}(\zeta) are the phase distributions within the spatial and temporal windows, respectively. The motion attention feature MOT(i) of the i-th frame is then defined from these consistency indicators and the block motion vectors (equation (3b)).
To suppress noise across adjacent video frames, the resulting sequence of motion attention values is smoothed with a 9-tap median filter. For the s-th video segment χ_s, the motion attention feature is obtained by averaging the filtered per-frame values:

f_M(s) = \frac{1}{i_2(s) - i_1(s) + 1} \sum_{i=i_1(s)}^{i_2(s)} MOT(i)    (4b)
Step 203: obtain the area and position of each face in every video frame by a face detection algorithm, and compute a depth-based face attention feature from the depth image corresponding to the frame and the set of pixels that make up each face.
The appearance of faces in a video usually indicates relatively important content. This solution obtains the area A_f(j) and the position of each face (indexed by j) in every video frame by a face detection algorithm. For the j-th detected face, the depth saliency D(j) is defined from the depth image d_i corresponding to the frame and the set of pixels {x | x ∈ Λ(j)} that make up the face:

D(j) = \frac{1}{|\Lambda(j)|} \sum_{x \in \Lambda(j)} d_i(x)    (5b)

where |Λ(j)| is the number of pixels of the j-th face. A position weight w_fp(j) is also defined according to the position of the face within the frame, to approximate the relative attention the face receives from viewers (regions closer to the frame centre receive larger weights), as shown in Table 1.
Table 1
Table 1 lists the face weights assigned to the different regions of a video frame: the central region has a large weight and the border regions have small weights.
The face attention value FAC(i) of the i-th frame is then computed from the face areas, position weights and depth saliency (equation (6b)), where A_frm is the area of the video frame and D_max(i) = max_x d_i(x). To reduce the overall impact of inaccurate face detection, the resulting sequence of face attention values is smoothed with a 5-tap median filter. The face attention feature of the video segment χ_s is computed from the smoothed values {FAC(i) | i = i_1(s), ..., i_2(s)}:

f_F(s) = \frac{1}{i_2(s) - i_1(s) + 1} \sum_{i=i_1(s)}^{i_2(s)} FAC(i)    (7b)
Step 204: perform semantic concept detection on the color moment feature, the wavelet texture feature and the local key-point feature with the Support Vector Machines, obtaining a concept closeness.
In the embodiment of the present invention, the Support Vector Machines are trained on the color moment feature, the wavelet texture feature and the local key-point feature. The LibSVM package is used: a radial basis function (RBF) kernel is adopted for the color moment and wavelet texture features, and a Chi-square kernel is adopted for the local key-point feature.
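A sketch of the per-concept SVM training and prediction. Scikit-learn is used here in place of the LibSVM package named in the text, purely for illustration; the assumption that class 1 is the positive class is marked in the comments.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.metrics.pairwise import chi2_kernel

    def train_concept_svms(x_cm, x_wt, x_kp, labels):
        # One SVM per feature type for a single concept: RBF kernels for the color
        # moment and wavelet texture features, Chi-square kernel for the BoF feature.
        svm_cm = SVC(kernel='rbf', probability=True).fit(x_cm, labels)
        svm_wt = SVC(kernel='rbf', probability=True).fit(x_wt, labels)
        gram_kp = chi2_kernel(x_kp, x_kp)          # precomputed Chi-square Gram matrix
        svm_kp = SVC(kernel='precomputed', probability=True).fit(gram_kp, labels)
        return svm_cm, svm_wt, svm_kp

    def predict_concept_probabilities(svms, x_cm, x_wt, x_kp, x_kp_train):
        svm_cm, svm_wt, svm_kp = svms
        # [:, 1] assumes the positive class was labelled 1 during training.
        u_cm = svm_cm.predict_proba(x_cm)[:, 1]
        u_wt = svm_wt.predict_proba(x_wt)[:, 1]
        u_kp = svm_kp.predict_proba(chi2_kernel(x_kp, x_kp_train))[:, 1]
        return (u_cm + u_wt + u_kp) / 3.0          # concept closeness u(s, j), (8b)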
Referring to Fig. 4, to mine the semantic information, this solution extracts the semantic indicator feature of a video segment based on the 374 semantic concepts of VIREO-374 and the three Support Vector Machines (SVMs) of each concept. The SVMs are trained on the color moment, wavelet texture and local key-point features introduced above, and at prediction time estimate the probability that a given video frame is closely related to a concept. The procedure for computing the semantic indicator feature of a segment is shown in Fig. 4.
For the video segment χ_s, the color moment feature f_cm(i_m(s)), the wavelet texture feature f_wt(i_m(s)) and the local key-point feature f_kp(i_m(s)) of its middle frame i_m(s) are first extracted, and the SVMs then predict the probability values {u_cm(s, j), u_wt(s, j), u_kp(s, j) | j = 1, 2, ..., 374}, from which the concept closeness is computed:

u(s, j) = \frac{u_{cm}(s, j) + u_{wt}(s, j) + u_{kp}(s, j)}{3}    (8b)
In the embodiment of the present invention, textual information related to the video content is obtained from the audio signal of the video frames by speech recognition; or,
the textual information related to the video content is obtained from the subtitles of the video frames.
Step 205: compute a textual semantic similarity on the basis of the textual information and the concept vocabulary information.
Next, the subtitle information corresponding to the video segment is processed. From the set Γ_st(s) of subtitle words and the set Γ_cp(j) of concept words, the textual semantic similarity is computed with the similarity measurement tool WordNet::Similarity of the external dictionary WordNet:

\kappa(s, j) = \max_{\gamma \in \Gamma_{st}(s)} \frac{1}{|\Gamma_{cp}(j)|} \sum_{\omega \in \Gamma_{cp}(j)} \eta(\gamma, \omega)    (9b)

where η(γ, ω) is the similarity value of the subtitle word γ and the concept word ω in WordNet::Similarity.
To reduce the influence of irrelevant concepts, the following textual relatedness is defined:

\rho(s, j) = \kappa(s, j)/Q  for u(s, j) \in (0.5, 1],  and  \rho(s, j) = 0  for u(s, j) \in [0, 0.5]    (10b)

where Q is a coefficient that ensures normalization. Since the SVMs output probabilities of a two-class classification problem, the threshold 0.5 is a natural choice in the formula above.
Step 206: compute the semantic indicator feature on the basis of the textual semantic similarity and the concept closeness.
Referring to Fig. 4, the concept closeness u(s, j) of the segment χ_s is obtained from the SVM predictions on its middle frame (equation (8b)), and the textual relatedness ρ(s, j) is obtained from the subtitle information via the textual semantic similarity (equations (9b) and (10b)), as described above.
Finally, the semantic indicator feature f_E(s) of the video segment is defined as the weighted sum of u(s, j) with ρ(s, j) as weights:

f_E(s) = \sum_{j=1}^{374} \rho(s, j) u(s, j)    (11b)
Step 207: linearly superpose the features of the second feature set according to their feature weights to obtain the saliency score of each video segment.
Finally, the three high-level features are fused with an iteratively reweighted linear model, producing a video summary of the length requested by the user.
In the embodiment of the present invention, the video summary is ultimately determined by the saliency score of each video segment; the three high-level features are therefore fused with the following linear model, whose result is the saliency score of the segment:

f_{SAL}(s) = w_M(s) f_M(s) + w_F(s) f_F(s) + w_E(s) f_E(s)    (12b)

where w_M(s), w_F(s) and w_E(s) are the feature weights. Before the linear fusion, each feature is normalized to the interval [0, 1].
The feature weights are computed by the following iterative reweighting method. In the k-th iteration, the weight w_#(s) (# ∈ {M, F, E}) is determined by the product of the macroscopic factor α_#(s) and the microscopic factor β_#(s), i.e. w_#(s) = α_#(s) β_#(s):

\alpha_{\#}(s) = 1 - \frac{r_{\#}(s)}{N_S}    (13b)

\beta_{\#}^{(k)}(s) = 1 + \frac{f_{\#}(s^{(k)}) - f_{\#}(s'^{(k-1)})}{f_{\#}(s^{(k)}) + f_{\#}(s'^{(k-1)})}    (14b)

where r_#(s) is the rank of the feature value f_#(s) when {f_#(s) | s = 1, 2, ..., N_S} is sorted in descending order, and N_S is the total number of video segments in the video. The saliency f_{SAL}(s) of each segment can then be computed and the segments sorted by it in descending order. According to the length requested by the user, segments can be selected into the video summary one by one in descending order of f_{SAL}(s).
Before the first iteration, the feature weights are initialized with equal weights. The iterative process terminates after 15 iterations.
In the technical solution of this embodiment, low-level features such as color moments, wavelet texture, motion and local key points are first extracted from the video frames. High-level visual and semantic features are then computed from them, comprising the motion attention feature, the depth-based face attention feature and the semantic indicator feature of each video segment. Finally, the three high-level features are fused with an iteratively reweighted linear model to produce a video summary of the length requested by the user.
Fig. 5 is a schematic structural diagram of the electronic device according to Embodiment 1 of the present invention. As shown in Fig. 5, the electronic device comprises:
an extraction unit 51, configured to extract a first feature set from video frames, the first feature set comprising: a color moment feature, a wavelet texture feature, a motion feature, and a local key-point feature;
a first processing unit 52, configured to compute a second feature set on the basis of the first feature set, the second feature set comprising: a motion attention feature, a depth-based face attention feature, and a semantic indicator feature of a video segment;
a second processing unit 53, configured to fuse the features of the second feature set with an iteratively reweighted linear model, thereby obtaining a video summary.
Those skilled in the art will understand that the functions of the units of the electronic device shown in Fig. 5 can be understood with reference to the related description of the video processing method above. The functions of these units can be realized by a program running on a processor or by specific logic circuits.
Fig. 6 is a schematic structural diagram of the electronic device according to Embodiment 2 of the present invention. As shown in Fig. 6, the electronic device comprises:
an extraction unit 61, configured to extract a first feature set from video frames, the first feature set comprising: a color moment feature, a wavelet texture feature, a motion feature, and a local key-point feature;
a first processing unit 62, configured to compute a second feature set on the basis of the first feature set, the second feature set comprising: a motion attention feature, a depth-based face attention feature, and a semantic indicator feature of a video segment;
a second processing unit 63, configured to fuse the features of the second feature set with an iteratively reweighted linear model, thereby obtaining a video summary.
The first processing unit 62 comprises:
a motion attention feature subunit 621, configured to compute the motion attention feature from the motion feature of the first feature set;
a face attention feature subunit 622, configured to obtain the area and position of each face in every video frame by a face detection algorithm, and to compute the depth-based face attention feature from the depth image corresponding to the frame and the set of pixels that make up each face.
The electronic device further comprises:
a training unit 64, configured to train Support Vector Machines on the color moment feature, the wavelet texture feature and the local key-point feature.
The electronic device further comprises:
a text acquisition unit 65, configured to obtain textual information related to the video content from the audio signal of the video frames by speech recognition, or to obtain the textual information related to the video content from the subtitles of the video frames.
The first processing unit 62 further comprises:
a semantic indicator feature subunit 623, configured to perform semantic concept detection on the color moment feature, the wavelet texture feature and the local key-point feature with the Support Vector Machines to obtain a concept closeness; to compute a textual semantic similarity on the basis of the textual information and the concept vocabulary information; and to compute the semantic indicator feature on the basis of the textual semantic similarity and the concept closeness.
The second processing unit 63 comprises:
a linear superposition subunit 631, configured to linearly superpose the features of the second feature set according to their feature weights to obtain the saliency score of each video segment;
a video summary subunit 632, configured to select video segments into the video summary one by one, in descending order of their saliency scores, according to a preset summary length.
Those skilled in the art will understand that the functions of the units of the electronic device shown in Fig. 6 can be understood with reference to the related description of the video processing method above. The functions of these units can be realized by a program running on a processor or by specific logic circuits.
The technical solutions described in the embodiments of the present invention can be combined in any manner as long as they do not conflict.
In the several embodiments provided by the present invention, it should be understood that the disclosed method and smart device can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A video processing method, comprising:
extracting a first feature set from video frames, the first feature set comprising: a color moment feature, a wavelet texture feature, a motion feature, and a local key-point feature;
computing a second feature set on the basis of the first feature set, the second feature set comprising: a motion attention feature, a depth-based face attention feature, and a semantic indicator feature of a video segment;
fusing the features of the second feature set with an iteratively reweighted linear model, thereby obtaining a video summary.
2. The video processing method according to claim 1, wherein computing the second feature set on the basis of the first feature set comprises:
computing the motion attention feature from the motion feature of the first feature set;
obtaining the area and position of each face in every video frame by a face detection algorithm, and computing the depth-based face attention feature from the depth image corresponding to the frame and the set of pixels that make up each face.
3. The video processing method according to claim 1, further comprising:
obtaining textual information related to the video content from the audio signal of the video frames by speech recognition; or
obtaining the textual information related to the video content from the subtitles of the video frames.
4. The video processing method according to claim 3, further comprising:
training Support Vector Machines on the color moment feature, the wavelet texture feature and the local key-point feature;
wherein computing the second feature set on the basis of the first feature set comprises:
performing semantic concept detection on the color moment feature, the wavelet texture feature and the local key-point feature with the Support Vector Machines to obtain a concept closeness;
computing a textual semantic similarity on the basis of the textual information and the concept vocabulary information;
computing the semantic indicator feature on the basis of the textual semantic similarity and the concept closeness.
5. The video processing method according to claim 1, wherein fusing the features of the second feature set with the iteratively reweighted linear model to obtain the video summary comprises:
linearly superposing the features of the second feature set according to their feature weights to obtain a saliency score of each video segment;
selecting video segments into the video summary one by one, in descending order of their saliency scores, according to a preset summary length.
6. An electronic device, comprising:
an extraction unit, configured to extract a first feature set from video frames, the first feature set comprising: a color moment feature, a wavelet texture feature, a motion feature, and a local key-point feature;
a first processing unit, configured to compute a second feature set on the basis of the first feature set, the second feature set comprising: a motion attention feature, a depth-based face attention feature, and a semantic indicator feature of a video segment;
a second processing unit, configured to fuse the features of the second feature set with an iteratively reweighted linear model, thereby obtaining a video summary.
7. The electronic device according to claim 6, wherein the first processing unit comprises:
a motion attention feature subunit, configured to compute the motion attention feature from the motion feature of the first feature set;
a face attention feature subunit, configured to obtain the area and position of each face in every video frame by a face detection algorithm, and to compute the depth-based face attention feature from the depth image corresponding to the frame and the set of pixels that make up each face.
8. The electronic device according to claim 6, further comprising:
a text acquisition unit, configured to obtain textual information related to the video content from the audio signal of the video frames by speech recognition, or to obtain the textual information related to the video content from the subtitles of the video frames.
9. The electronic device according to claim 6, further comprising:
a training unit, configured to train Support Vector Machines on the color moment feature, the wavelet texture feature and the local key-point feature;
wherein the first processing unit comprises:
a semantic indicator feature subunit, configured to perform semantic concept detection on the color moment feature, the wavelet texture feature and the local key-point feature with the Support Vector Machines to obtain a concept closeness; to compute a textual semantic similarity on the basis of the textual information and the concept vocabulary information; and to compute the semantic indicator feature on the basis of the textual semantic similarity and the concept closeness.
10. The electronic device according to claim 9, wherein the second processing unit comprises:
a linear superposition subunit, configured to linearly superpose the features of the second feature set according to their feature weights to obtain a saliency score of each video segment;
a video summary subunit, configured to select video segments into the video summary one by one, in descending order of their saliency scores, according to a preset summary length.
CN201510535580.9A 2015-08-27 2015-08-27 Video processing method and electronic device Active CN105228033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510535580.9A CN105228033B (en) 2015-08-27 2015-08-27 Video processing method and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510535580.9A CN105228033B (en) 2015-08-27 2015-08-27 Video processing method and electronic device

Publications (2)

Publication Number Publication Date
CN105228033A true CN105228033A (en) 2016-01-06
CN105228033B CN105228033B (en) 2018-11-09

Family

ID=54996666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510535580.9A Active CN105228033B (en) 2015-08-27 2015-08-27 Video processing method and electronic device

Country Status (1)

Country Link
CN (1) CN105228033B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355171A (en) * 2016-11-24 2017-01-25 深圳凯达通光电科技有限公司 Video monitoring internetworking system
CN106934397A (en) * 2017-03-13 2017-07-07 北京市商汤科技开发有限公司 Image processing method, device and electronic equipment
CN107222795A (en) * 2017-06-23 2017-09-29 南京理工大学 A kind of video abstraction generating method of multiple features fusion
CN107979764A (en) * 2017-12-06 2018-05-01 中国石油大学(华东) Video caption generation method based on semantic segmentation and multilayer notice frame
CN109413510A (en) * 2018-10-19 2019-03-01 深圳市商汤科技有限公司 Video abstraction generating method and device, electronic equipment, computer storage medium
CN109565614A (en) * 2016-06-28 2019-04-02 英特尔公司 Multiple streams are adjusted
CN109932617A (en) * 2019-04-11 2019-06-25 东南大学 A kind of adaptive electric network failure diagnosis method based on deep learning
CN110225368A (en) * 2019-06-27 2019-09-10 腾讯科技(深圳)有限公司 A kind of video locating method, device and electronic equipment
CN110347870A (en) * 2019-06-19 2019-10-18 西安理工大学 The video frequency abstract generation method of view-based access control model conspicuousness detection and hierarchical clustering method
WO2020119187A1 (en) * 2018-12-14 2020-06-18 北京沃东天骏信息技术有限公司 Method and device for segmenting video
CN111984820A (en) * 2019-12-19 2020-11-24 重庆大学 Video abstraction method based on double-self-attention capsule network
CN113158720A (en) * 2020-12-15 2021-07-23 嘉兴学院 Video abstraction method and device based on dual-mode feature and attention mechanism

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1685344A (en) * 2002-11-01 2005-10-19 三菱电机株式会社 Method for summarizing unknown content of video
US20050249412A1 (en) * 2004-05-07 2005-11-10 Regunathan Radhakrishnan Multimedia event detection and summarization
WO2007099496A1 (en) * 2006-03-03 2007-09-07 Koninklijke Philips Electronics N.V. Method and device for automatic generation of summary of a plurality of images
CN101743596A (en) * 2007-06-15 2010-06-16 皇家飞利浦电子股份有限公司 Method and apparatus for automatically generating summaries of a multimedia file
US20120099793A1 (en) * 2010-10-20 2012-04-26 Mrityunjay Kumar Video summarization using sparse basis function combination
CN102880866A (en) * 2012-09-29 2013-01-16 宁波大学 Method for extracting face features
KR20130061058A (en) * 2011-11-30 2013-06-10 고려대학교 산학협력단 Video summary method and system using visual features in the video
CN103200463A (en) * 2013-03-27 2013-07-10 天脉聚源(北京)传媒科技有限公司 Method and device for generating video summary
CN103210651A (en) * 2010-11-15 2013-07-17 华为技术有限公司 Method and system for video summarization
CN104508682A (en) * 2012-08-03 2015-04-08 柯达阿拉里斯股份有限公司 Identifying key frames using group sparsity analysis

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1685344A (en) * 2002-11-01 2005-10-19 三菱电机株式会社 Method for summarizing unknown content of video
US20050249412A1 (en) * 2004-05-07 2005-11-10 Regunathan Radhakrishnan Multimedia event detection and summarization
WO2007099496A1 (en) * 2006-03-03 2007-09-07 Koninklijke Philips Electronics N.V. Method and device for automatic generation of summary of a plurality of images
CN101743596A (en) * 2007-06-15 2010-06-16 皇家飞利浦电子股份有限公司 Method and apparatus for automatically generating summaries of a multimedia file
US20120099793A1 (en) * 2010-10-20 2012-04-26 Mrityunjay Kumar Video summarization using sparse basis function combination
CN103210651A (en) * 2010-11-15 2013-07-17 华为技术有限公司 Method and system for video summarization
KR20130061058A (en) * 2011-11-30 2013-06-10 고려대학교 산학협력단 Video summary method and system using visual features in the video
CN104508682A (en) * 2012-08-03 2015-04-08 柯达阿拉里斯股份有限公司 Identifying key frames using group sparsity analysis
CN102880866A (en) * 2012-09-29 2013-01-16 宁波大学 Method for extracting face features
CN103200463A (en) * 2013-03-27 2013-07-10 天脉聚源(北京)传媒科技有限公司 Method and device for generating video summary

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NAVEED EJAZ ET AL.: "Multi-scale information maximization based visual attention modeling for video summarization", 2012 6TH INTERNATIONAL CONFERENCE ON NEXT GENERATION MOBILE APPLICATIONS, SERVICES AND TECHNOLOGIES *
YU KONG ET AL.: "Hierarchical 3D kernel descriptors for action recognition using depth sequences", 2015 11TH IEEE INTERNATIONAL CONFERENCE AND WORKSHOPS ON AUTOMATIC FACE AND GESTURE RECOGNITION *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109565614B (en) * 2016-06-28 2021-08-20 英特尔公司 Multiple flow regulation
CN109565614A (en) * 2016-06-28 2019-04-02 英特尔公司 Multiple streams are adjusted
CN106355171A (en) * 2016-11-24 2017-01-25 深圳凯达通光电科技有限公司 Video monitoring internetworking system
CN106934397A (en) * 2017-03-13 2017-07-07 北京市商汤科技开发有限公司 Image processing method, device and electronic equipment
WO2018166438A1 (en) * 2017-03-13 2018-09-20 北京市商汤科技开发有限公司 Image processing method and device and electronic device
US10943145B2 (en) 2017-03-13 2021-03-09 Beijing Sensetime Technology Development Co., Ltd. Image processing methods and apparatus, and electronic devices
CN106934397B (en) * 2017-03-13 2020-09-01 北京市商汤科技开发有限公司 Image processing method and device and electronic equipment
CN107222795A (en) * 2017-06-23 2017-09-29 南京理工大学 A kind of video abstraction generating method of multiple features fusion
CN107979764B (en) * 2017-12-06 2020-03-31 中国石油大学(华东) Video subtitle generating method based on semantic segmentation and multi-layer attention framework
CN107979764A (en) * 2017-12-06 2018-05-01 中国石油大学(华东) Video caption generation method based on semantic segmentation and multilayer notice frame
CN109413510A (en) * 2018-10-19 2019-03-01 深圳市商汤科技有限公司 Video abstraction generating method and device, electronic equipment, computer storage medium
JP2021503123A (en) * 2018-10-19 2021-02-04 深▲せん▼市商▲湯▼科技有限公司Shenzhen Sensetime Technology Co., Ltd. Video summary generation methods and devices, electronic devices and computer storage media
CN109413510B (en) * 2018-10-19 2021-05-18 深圳市商汤科技有限公司 Video abstract generation method and device, electronic equipment and computer storage medium
JP7150840B2 (en) 2018-10-19 2022-10-11 深▲セン▼市商▲湯▼科技有限公司 Video summary generation method and apparatus, electronic equipment and computer storage medium
WO2020119187A1 (en) * 2018-12-14 2020-06-18 北京沃东天骏信息技术有限公司 Method and device for segmenting video
US11275950B2 (en) 2018-12-14 2022-03-15 Beijing Wodong Tianjun Information Technology Co., Ltd. Method and apparatus for segmenting video
CN109932617A (en) * 2019-04-11 2019-06-25 东南大学 A kind of adaptive electric network failure diagnosis method based on deep learning
CN110347870A (en) * 2019-06-19 2019-10-18 西安理工大学 The video frequency abstract generation method of view-based access control model conspicuousness detection and hierarchical clustering method
CN110225368A (en) * 2019-06-27 2019-09-10 腾讯科技(深圳)有限公司 A kind of video locating method, device and electronic equipment
CN111984820A (en) * 2019-12-19 2020-11-24 重庆大学 Video abstraction method based on double-self-attention capsule network
CN111984820B (en) * 2019-12-19 2023-10-27 重庆大学 Video abstraction method based on double self-attention capsule network
CN113158720A (en) * 2020-12-15 2021-07-23 嘉兴学院 Video abstraction method and device based on dual-mode feature and attention mechanism

Also Published As

Publication number Publication date
CN105228033B (en) 2018-11-09

Similar Documents

Publication Publication Date Title
CN105228033A (en) A kind of method for processing video frequency and electronic equipment
Ihianle et al. A deep learning approach for human activities recognition from multimodal sensing devices
Sun et al. Lattice long short-term memory for human action recognition
Luo et al. Object-based analysis and interpretation of human motion in sports video sequences by dynamic Bayesian networks
Wang et al. Dense trajectories and motion boundary descriptors for action recognition
US20230030419A1 (en) Machine Learning Model Training Method and Device and Electronic Equipment
CN110147699B (en) Image recognition method and device and related equipment
Arivazhagan et al. Human action recognition from RGB-D data using complete local binary pattern
Chanti et al. Improving bag-of-visual-words towards effective facial expressive image classification
CN113963445A (en) Pedestrian falling action recognition method and device based on attitude estimation
CN111539290A (en) Video motion recognition method and device, electronic equipment and storage medium
CN109684969A (en) Stare location estimation method, computer equipment and storage medium
Zhang et al. Retargeting semantically-rich photos
Abebe et al. A long short-term memory convolutional neural network for first-person vision activity recognition
Fan et al. Expectation propagation learning of a Dirichlet process mixture of Beta-Liouville distributions for proportional data clustering
Abebe et al. Inertial-vision: cross-domain knowledge transfer for wearable sensors
CN113569607A (en) Motion recognition method, motion recognition device, motion recognition equipment and storage medium
Pang et al. Analysis of computer vision applied in martial arts
CN109934852B (en) Video description method based on object attribute relation graph
Sun et al. Learning spatio-temporal co-occurrence correlograms for efficient human action classification
Yan et al. Unsupervised video categorization based on multivariate information bottleneck method
Fu et al. A novel approach for anomaly event detection in videos based on autoencoders and SE networks
Liu [Retracted] Sports Deep Learning Method Based on Cognitive Human Behavior Recognition
Iosifidis et al. Human action recognition based on bag of features and multi-view neural networks
CN108038451A (en) Anomaly detection method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant