CN107516084B - Internet video author identity identification method based on multi-feature fusion - Google Patents

Internet video author identity identification method based on multi-feature fusion

Info

Publication number
CN107516084B
CN107516084B (application CN201710762954.XA)
Authority
CN
China
Prior art keywords
video
image
motion
camera motion
image block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710762954.XA
Other languages
Chinese (zh)
Other versions
CN107516084A (en)
Inventor
郭金林 (Guo Jinlin)
陈立栋 (Chen Lidong)
白亮 (Bai Liang)
老松杨 (Lao Songyang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201710762954.XA priority Critical patent/CN107516084B/en
Publication of CN107516084A publication Critical patent/CN107516084A/en
Application granted granted Critical
Publication of CN107516084B publication Critical patent/CN107516084B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows

Abstract

The invention discloses an internet video author identity identification method based on multi-feature fusion, comprising the following steps: input a video and uniformly downsample its frames; for each extracted frame and its preceding frame, extract the background camera motion vector and from it compute the camera motion features of the video; on the basis of video shot segmentation, compute the average shot length and the abrupt-cut shot ratio to obtain the video structure features; and, fusing these features, use a support vector machine classifier to learn and identify whether the video producer is a professional or an amateur. By fully considering the production characteristics of professional and amateur videos, fusing the camera motion features and structure features of internet videos, and using a support vector machine classifier, the invention can accurately learn and identify whether an internet video producer is a professional or an amateur video producer.

Description

Internet video author identity identification method based on multi-feature fusion
Technical Field
The invention relates to the technical field of multimedia communication and the Internet, and in particular to an internet video author identity identification method based on multi-feature fusion.
Background
Traditionally, video content was produced by professional video production companies such as television stations. Such videos are recorded by professionals with professional camera equipment, post-processed according to established conventions, and finally pushed to viewers through stable distribution channels; their quality is generally high.
The popularity of camera devices (e.g., smartphones with cameras) and the declining price of mass storage have produced a huge amount of personal video content. The development and popularization of internet technology are reshaping video consumption, and the rise of video sharing websites such as YouTube and Youku lets internet users conveniently upload, manage, and share videos. As a result, internet users are no longer only consumers of video but also participants and producers.
Hundreds of millions of ordinary internet users are both publishers and consumers of internet videos, which has led to an explosive increase in the number of internet videos. Recent statistics show that on the video sharing website YouTube alone, users upload as much as 120 hours of video every minute, i.e., more than a year's worth of video every day. Videos uploaded by users can generally be classified as professionally produced videos (PPV) or amateur-produced videos (APV) according to the identity of the original producer (the uploader of a video is not necessarily its producer).
Amateur videos are recorded with personal camera equipment (such as mobile phones) by amateurs with little video production experience, undergo little post-production, and are uploaded to the network by users. In contrast, professional videos are recorded by professionals with professional camera equipment and edited according to certain rules; news and sports videos are typical examples. Note that many network videos are clips extracted from professional sources such as television programs and movies and uploaded to the network (sometimes with subtitles, background music, and the like added); these are still considered professional videos.
Comparing professional and amateur videos reveals the following differences:
(1) The number of amateur videos on the internet is growing at an explosive rate. Since no elaborate post-production is needed, anyone can easily record an amateur video with a camera or mobile phone and upload it to an internet video sharing website, so a large number of amateur videos have appeared on the internet;
(2) because shooting environments are uncontrolled and shooting equipment less refined, amateur video quality is generally poorer than professional video quality, with more irregular camera motion, blurred backgrounds, and the like;
(3) compared with professional videos such as news videos, amateur videos are poorly structured. News video, for example, can be structurally divided into shots, scenes, and stories; sports video (taking tennis as an example) can be divided into sets, games, and so on. The structure of amateur video is generally less clear owing to the lack of post-production;
(4) compared with professional videos, the audio accompanying amateur videos is more varied. The accompanying audio of professional video is relatively uniform: in news video the audio is mainly announcer commentary, and in sports video it mainly comprises commentators' speech and live cheering.
Disclosure of Invention
With the development of internet technology and the popularization of camera equipment, video content is no longer produced only by professional video production companies (such as television stations); any amateur can quickly record and publish video. The numbers of professionally produced videos (professional videos for short) and amateur-produced videos (amateur videos for short) on the internet are growing at an explosive rate. In view of this, the present invention provides an internet video author identity identification method based on multi-feature fusion, which fuses the background camera motion features and video structure information features of a video and uses a support vector machine classifier to learn and identify whether the video producer is a professional or an amateur video producer.
Therefore, the invention adopts the following technical scheme:
An internet video author identity identification method based on multi-feature fusion comprises the following steps:
S1: extract the camera motion features of the video images.
S11: uniformly downsample the input video frames; the whole input video is uniformly downsampled to obtain M image frames.
S12: extract the background camera motion vectors between adjacent image frames among the M frames extracted in S11.
S13: from the background camera motion vectors between any three consecutive image frames, calculate the camera motion acceleration set and the motion direction change angle set of the video: obtain the camera motion vector set between any two consecutive image frames over the whole video, V = {v_{1,2}, v_{2,3}, ..., v_{k,k+1}, ..., v_{M−1,M}}, and from the motion vectors between any three consecutive image frames calculate the camera motion acceleration set A and the motion direction change angle set Θ.
S14: from the video's camera motion vector set V, the acceleration set A, and the motion direction change angle set Θ, calculate the mean and the second-, third-, and fourth-order central moments of each of V, A, and Θ, and take them as the camera motion features.
S15: fuse the camera motion feature statistics.
All the statistics of the motion vectors (i.e., the mean and the second-, third-, and fourth-order central moments of the motion vector set V calculated in S14) are concatenated to form an 8-dimensional motion vector feature description. All the statistics of the camera motion acceleration (the mean and the second-, third-, and fourth-order central moments of the acceleration set A calculated in S14) are likewise concatenated to form an 8-dimensional camera motion acceleration feature description. All the statistics of the motion direction change angles (the mean and the second-, third-, and fourth-order central moments of the angle set Θ calculated in S14) are concatenated to form a 4-dimensional camera motion direction change feature description.
Finally, the 8-dimensional motion vector feature description, the 8-dimensional camera motion acceleration feature description, and the 4-dimensional camera motion direction change feature description are concatenated horizontally to obtain a 20-dimensional vector describing the camera motion features of the video.
S2: extract the video structure information features.
S3: video feature fusion.
The camera motion features of the video image obtained in S1 and the video structure information features extracted in S2 are concatenated to obtain a 24-dimensional vector describing the video; this feature vector is input to a classifier to identify whether the video producer is a professional or an amateur video producer.
S4: identify the video producer identity using a classifier.
In the invention, a support vector machine (SVM) with a Gaussian radial basis function kernel is used as the classifier to classify and identify professional and amateur videos.
The support vector machine, proposed in 1995 by the research group led by Vapnik at AT&T Bell Laboratories, is a pattern recognition method based on statistical learning theory with great potential. The SVM has particular advantages for small-sample, nonlinear, and high-dimensional pattern recognition problems and generalizes to other machine learning problems such as classification and recognition. After more than two decades of intensive research, support vector machine techniques are now very mature.
In S11 of the present invention, the uniform downsampling of the input video frames is implemented as follows: 5 image frames are uniformly extracted from each second of video.
In S12 of the present invention, the background camera motion vectors between adjacent image frames among the M frames extracted in S11 are calculated using a block matching algorithm. Specifically, the implementation is as follows:
S121: uniformly divide the current image frame K and its preceding image frame K−1 into image blocks of equal size; let the block size be S × S, with S = 10 pixels.
S122: select any image block in the current frame K as the test image block, and, with the test block's center point (x_t, y_t) as the center, select a 3S × 3S adjacent search region in the preceding frame K−1.
For a test image block at the edge of the current frame K, the 3S × 3S square region closest to the test block's position in the preceding frame K−1 is selected as the adjacent search region, and selection priority may be given to regions offset in the horizontal direction.
S123: within the adjacent search region, construct sliding image blocks of size S × S with a sliding step of 5 pixels, and search among them for the image region similar to the test image block by computing the maximum matching pixel count (MPC) criterion, calculated as follows:
MPC(d_x, d_y) = Σ_{(x,y)} δ(P_c(x, y), Q_c(x + d_x, y + d_y))   (1)
δ(P_c, Q_c) = 1 if |P_c − Q_c| ≤ T, and 0 otherwise   (2)
where (x_t, y_t) is the center position of the test image block and (x_p, y_p) is the center position of the sliding image block (i.e., the searched image block); P_c is the color luminance value of a pixel in the test block and Q_c is the color luminance value of the corresponding pixel in the sliding block; (x, y) ranges over the pixel coordinates of the test block, whose lower-left corner anchors the summation; the offset between the two center positions is d_x = x_p − x_t, d_y = y_p − y_t; and T is the matching threshold. The sliding block that maximizes the MPC is taken as the similar image block.
S124: for any test image block centered at (x_t, y_t), let its similar image block be centered at (x'_p, y'_p). The motion vector v = (v_x, v_y) of the test image block is calculated from the positions of the test block and its similar block as follows:
v_x = x'_p − x_t   (3)
v_y = y'_p − y_t   (4)
where v_x and v_y are the components of the motion vector v in the X and Y directions, respectively.
Preferably, when detecting similar image blocks, the detected motion vector v of a test image block may be unreliable because image frames often contain large regions of uniform texture. The invention therefore checks the number of similar image blocks detected in the adjacent search region: if it exceeds D blocks (D = 4), the adjacent search region is considered a uniform-texture region, the detected motion vector v has low reliability, and the motion vector of the test image block is set to v = 0.
S125: steps S122-S124 are repeated until motion vectors v of all test image blocks in the current image frame K are detected.
S126: count the direction distribution histogram of the motion vectors v of all test image blocks in the current frame K. The motion vectors of all test image blocks in the current frame K are assigned by direction to eight intervals, namely [0, π/4), [π/4, π/2), [π/2, 3π/4), [3π/4, π), [π, 5π/4), [5π/4, 3π/2), [3π/2, 7π/4), [7π/4, 2π). The test image blocks from S125 whose motion vector directions fall in the interval represented by the largest histogram bin are selected as the background region B.
S127: calculate the background camera motion vector between the current frame K and the preceding frame K−1 as follows:
v_{K−1,K} = (1 / N_B) Σ_{t ∈ B} v_t   (5)
where N_B is the number of image blocks in the background region B and v_t is the motion vector of image block t in the background region.
In S13 of the invention, the camera motion acceleration and the motion direction change angle are calculated from the background camera motion vectors between three consecutive image frames as follows:
a_{k−1,k,k+1} = (v_{k,k+1} − v_{k−1,k}) / Δt   (6)
θ_{k−1,k,k+1} = arccos( (v_{k−1,k} · v_{k,k+1}) / (|v_{k−1,k}| |v_{k,k+1}|) )   (7)
where Δt is the time interval between two consecutive extracted image frames; since the invention extracts frames uniformly (5 frames per second), Δt is a constant. The acceleration is the change of the background camera motion vector between two successive frame pairs divided by Δt.
Through the above calculation, the camera motion vector set between any two consecutive image frames over the whole video is obtained, V = {v_{1,2}, v_{2,3}, ..., v_{k,k+1}, ..., v_{M−1,M}}, and from the motion vectors between any three consecutive image frames the camera motion acceleration set A = {a_{1,2,3}, a_{2,3,4}, ..., a_{k−1,k,k+1}, ..., a_{M−2,M−1,M}} and the motion direction change angle set Θ = {θ_{1,2,3}, θ_{2,3,4}, ..., θ_{k−1,k,k+1}, ..., θ_{M−2,M−1,M}} are calculated.
In S14 of the invention, from the video's camera motion vector set V, the acceleration set A, and the motion direction change angle set Θ, the mean and the second-, third-, and fourth-order central moments of each set are calculated and taken as the camera motion features.
For the motion vector set V, the mean is calculated as follows:
μ_V = (1 / (M − 1)) Σ_{k=1}^{M−1} v_{k,k+1}   (8)
For the motion vector set V, the t-order central moment is calculated as follows:
m_t(V) = (1 / (M − 1)) Σ_{k=1}^{M−1} (v_{k,k+1} − μ_V)^t   (9)
where t ∈ {2, 3, 4}; for the vector-valued sets, the statistics are computed per component.
For the acceleration set A and the motion direction change angle set Θ, the same statistics are calculated. The mean of A and the mean of Θ are calculated as follows:
μ_A = (1 / (M − 2)) Σ_k a_{k−1,k,k+1}   (10)
μ_Θ = (1 / (M − 2)) Σ_k θ_{k−1,k,k+1}   (11)
For the acceleration set A and the motion direction change angle set Θ, the t-order central moments are calculated as follows:
m_t(A) = (1 / (M − 2)) Σ_k (a_{k−1,k,k+1} − μ_A)^t   (12)
m_t(Θ) = (1 / (M − 2)) Σ_k (θ_{k−1,k,k+1} − μ_Θ)^t   (13)
The video structure information features are extracted in S2 of the invention as follows:
A video shot detection algorithm is used to extract the number of shots N_shot; then, on the basis of video shot detection, the following structure information features are extracted: the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot.
The average shot length L_shot is calculated as follows:
L_shot = L / N_shot   (14)
where L is the total length of the video.
The abrupt-cut shot ratio R_cut_shot is calculated as follows:
R_cut_shot = N_cut_shot / N_shot   (15)
where N_cut_shot is the number of abrupt-cut shots.
The gradual-transition shot ratio is calculated as follows:
R_gradual_shot = 1 − R_cut_shot   (16)
All the above structure information extracted from the video (i.e., the number of shots N_shot, the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot) is concatenated horizontally to form a 4-dimensional video structure information feature description.
By fully considering the production characteristics of professional and amateur videos, fusing the camera motion features and structure features of internet videos, and using a support vector machine classifier, the invention can accurately learn and identify whether an internet video producer is a professional or an amateur video producer.
Experiments demonstrate that the multi-feature fusion based internet video author identity identification method effectively distinguishes professional from amateur video producers. In particular, when the professional videos are news, sports, television advertisement, and music videos, and the amateur videos are internet users' self-made videos, the implementation provided by the invention accurately distinguishes whether the video producer is a professional or an amateur.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of extracting the camera motion features from video images according to the present invention;
FIG. 3 is a schematic diagram of the image block search and matching method of the present invention;
FIG. 4 is a diagram illustrating the direction distribution of image block motion vectors in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Referring to FIG. 1, the flow of the internet video author identity identification method based on multi-feature fusion according to the present invention includes the following steps:
S1: extract the camera motion features of the video images; FIG. 2 is a flowchart of extracting the camera motion features. In this embodiment, a block matching algorithm is used to calculate the background camera motion vector between two adjacent extracted frames: a frame is first divided into image blocks of equal size, and the position of each block of the current frame is then searched for in the preceding frame. FIG. 3 illustrates the image block search and matching method of the present invention.
In this embodiment, each downsampled image frame is divided into image blocks of 10 × 10 pixels, and the background camera motion vectors are computed in units of image blocks, which improves matching efficiency while maintaining adequate accuracy.
S11: uniformly downsample the input video frames; the whole input video is uniformly downsampled to obtain M image frames.
Uniform downsampling of the input video reduces the number of image frames to be processed and thus effectively reduces computation. Specifically, 5 frames are uniformly extracted from each second of video, yielding M image frames for the whole input video.
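As a minimal illustrative sketch (not part of the patent text), the uniform 5-frames-per-second sampling of S11 could be implemented with OpenCV as follows; the helper name sample_frames and the fallback frame rate are assumptions of this example.

```python
import cv2

def sample_frames(video_path, rate=5):
    """Uniformly extract about `rate` frames per second of video (S11)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if FPS is unknown
    step = max(int(round(fps / rate)), 1)     # keep every `step`-th frame
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            # grayscale suffices: block matching compares luminance values
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        idx += 1
    cap.release()
    return frames  # the M image frames
```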
S12: extract the background camera motion vectors between adjacent image frames among the M frames extracted in S11.
S121: uniformly divide the current image frame K and its preceding image frame K−1 into image blocks of equal size; let the block size be S × S, with S = 10 pixels.
S122: select any image block in the current frame K as the test image block, and, with the test block's center point (x_t, y_t) as the center, select a 3S × 3S adjacent search region in the preceding frame K−1.
For a test image block at the edge of the current frame K, the 3S × 3S square region closest to the test block's position in the preceding frame K−1 is selected as the adjacent search region, and selection priority may be given to regions offset in the horizontal direction.
S123: within the adjacent search region, construct sliding image blocks of size S × S with a sliding step of 5 pixels, and search among them for the image region similar to the test image block by computing the maximum matching pixel count (MPC) criterion:
MPC(d_x, d_y) = Σ_{(x,y)} δ(P_c(x, y), Q_c(x + d_x, y + d_y))   (1)
δ(P_c, Q_c) = 1 if |P_c − Q_c| ≤ T, and 0 otherwise   (2)
where (x_t, y_t) is the center position of the test image block and (x_p, y_p) is the center position of the sliding image block (i.e., the searched image block); P_c is the color luminance value of a pixel in the test block and Q_c is the color luminance value of the corresponding pixel in the sliding block; (x, y) ranges over the pixel coordinates of the test block, whose lower-left corner anchors the summation; the offset between the two center positions is d_x = x_p − x_t, d_y = y_p − y_t; and T is the matching threshold. The sliding block that maximizes the MPC is taken as the similar image block.
S124: calculate the motion vectors of the test image blocks.
For any test image block centered at (x_t, y_t), let its similar image block be centered at (x'_p, y'_p). The motion vector v = (v_x, v_y) of the test image block is calculated from the positions of the test block and its similar block as follows:
v_x = x'_p − x_t   (3)
v_y = y'_p − y_t   (4)
where v_x and v_y are the components of the motion vector v in the X and Y directions, respectively.
Preferably, when detecting similar image blocks, the detected motion vector v of a test image block may be unreliable because image frames often contain large regions of uniform texture. The invention therefore checks the number of similar image blocks detected in the adjacent search region: if it exceeds D blocks (D = 4), the adjacent search region is considered a uniform-texture region, the detected motion vector v has low reliability, and the motion vector of the test image block is set to v = 0.
S125: steps S122-S124 are repeated until motion vectors v of all test image blocks in the current image frame K are detected.
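A minimal sketch of S122–S125 (illustration only, not the patent's reference implementation): the constants follow the text (S = 10, sliding step 5, D = 4), while the matching threshold T = 10 and the tie-counting interpretation of "similar image blocks" are assumptions of this example.

```python
import numpy as np

S, STEP, T, D = 10, 5, 10, 4  # block size, sliding step, match threshold (assumed), texture check

def block_motion_vector(prev, curr, bx, by):
    """Motion vector of the S x S test block whose top-left corner is (bx, by)
    in frame `curr`, searched in a 3S x 3S region of frame `prev` using the
    maximum matching pixel count criterion, eqs. (1)-(2)."""
    test = curr[by:by + S, bx:bx + S].astype(np.int16)
    # clip the 3S x 3S search region to the frame borders (edge handling, S122)
    y0, x0 = max(by - S, 0), max(bx - S, 0)
    y1, x1 = min(by + 2 * S, prev.shape[0]), min(bx + 2 * S, prev.shape[1])
    best, best_off, n_best = -1, (0, 0), 0
    for sy in range(y0, y1 - S + 1, STEP):
        for sx in range(x0, x1 - S + 1, STEP):
            cand = prev[sy:sy + S, sx:sx + S].astype(np.int16)
            mpc = int((np.abs(test - cand) <= T).sum())  # eqs. (1)-(2)
            if mpc > best:
                best, best_off, n_best = mpc, (sx - bx, sy - by), 1
            elif mpc == best:
                n_best += 1  # another equally similar block
    if n_best > D:           # uniform-texture region: match is unreliable
        return (0, 0)        # v = 0
    return best_off          # v = (v_x, v_y), eqs. (3)-(4)
```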
S126: count the direction distribution histogram of the motion vectors v of all test image blocks in the current frame K.
Specifically, the motion vectors of all test image blocks in the current frame K are assigned by direction to eight intervals, namely [0, π/4), [π/4, π/2), [π/2, 3π/4), [3π/4, π), [π, 5π/4), [5π/4, 3π/2), [3π/2, 7π/4), [7π/4, 2π); see FIG. 4. The test image blocks from S125 whose motion vector directions fall in the interval represented by the largest histogram bin are selected as the background region B.
S127: calculate the background camera motion vector between the current frame K and the preceding frame K−1 as follows:
v_{K−1,K} = (1 / N_B) Σ_{t ∈ B} v_t   (5)
where N_B is the number of image blocks in the background region B and v_t is the motion vector of image block t in the background region.
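Continuing the sketch above (and reusing its block_motion_vector and S), S125–S127 could look as follows; treating v = 0 blocks as unreliable and excluding them from the histogram is an assumption of this example.

```python
import numpy as np

def frame_camera_motion(prev, curr):
    """Background camera motion vector between two frames (S125-S127):
    per-block motion vectors, dominant-direction background selection,
    then the average of the background vectors, eq. (5)."""
    h, w = curr.shape
    vecs = np.array([block_motion_vector(prev, curr, bx, by)   # S122-S125
                     for by in range(0, h - S + 1, S)
                     for bx in range(0, w - S + 1, S)], dtype=float)
    moving = vecs[(vecs ** 2).sum(axis=1) > 0]      # drop unreliable v = 0 blocks
    if len(moving) == 0:
        return np.zeros(2)                          # static or unreliable frame
    angles = np.arctan2(moving[:, 1], moving[:, 0]) % (2 * np.pi)
    bins = (angles // (np.pi / 4)).astype(int)      # the eight pi/4 intervals (S126)
    background = moving[bins == np.bincount(bins, minlength=8).argmax()]
    return background.mean(axis=0)                  # eq. (5)
```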
S13: calculate the camera motion acceleration and the motion direction change angle between any three consecutive image frames.
From the background camera motion vectors between any three consecutive image frames, the camera motion acceleration and the motion direction change angle of the video are calculated as follows:
a_{k−1,k,k+1} = (v_{k,k+1} − v_{k−1,k}) / Δt   (6)
θ_{k−1,k,k+1} = arccos( (v_{k−1,k} · v_{k,k+1}) / (|v_{k−1,k}| |v_{k,k+1}|) )   (7)
where Δt is the time interval between two consecutive extracted image frames; since the invention extracts frames uniformly (5 frames per second), Δt is a constant. The acceleration is the change of the background camera motion vector between two successive frame pairs divided by Δt.
Through the above calculation, the camera motion vector set between any two consecutive image frames over the whole video is obtained, V = {v_{1,2}, v_{2,3}, ..., v_{k,k+1}, ..., v_{M−1,M}}, and from the motion vectors between any three consecutive image frames the camera motion acceleration set A = {a_{1,2,3}, a_{2,3,4}, ..., a_{k−1,k,k+1}, ..., a_{M−2,M−1,M}} and the motion direction change angle set Θ = {θ_{1,2,3}, θ_{2,3,4}, ..., θ_{k−1,k,k+1}, ..., θ_{M−2,M−1,M}} are calculated.
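A small sketch of eqs. (6)–(7) under the assumptions above (the original acceleration and angle formulas are rendered as images in the source; the vector-difference and arccos forms used here are reconstructions consistent with the surrounding text):

```python
import numpy as np

def acceleration_and_angle(v_prev, v_next, dt=0.2):
    """Camera acceleration and direction-change angle for three consecutive
    frames, from the two motion vectors v_prev = v_{k-1,k} and
    v_next = v_{k,k+1}; dt = 1/5 s for 5 extracted frames per second."""
    v_prev, v_next = np.asarray(v_prev, float), np.asarray(v_next, float)
    a = (v_next - v_prev) / dt                                   # eq. (6)
    n = np.linalg.norm(v_prev) * np.linalg.norm(v_next)
    cos_t = np.dot(v_prev, v_next) / n if n > 0 else 1.0         # zero angle if static
    theta = float(np.arccos(np.clip(cos_t, -1.0, 1.0)))          # eq. (7)
    return a, theta
```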
S14: from the video's camera motion vector set V, the camera motion acceleration set A, and the motion direction change angle set Θ, calculate the mean and the second-, third-, and fourth-order central moments of each set, and take them as the camera motion features.
For the motion vector set V, the mean is calculated as follows:
μ_V = (1 / (M − 1)) Σ_{k=1}^{M−1} v_{k,k+1}   (8)
For the motion vector set V, the t-order central moment is calculated as follows:
m_t(V) = (1 / (M − 1)) Σ_{k=1}^{M−1} (v_{k,k+1} − μ_V)^t   (9)
where t ∈ {2, 3, 4}; for the vector-valued sets, the statistics are computed per component.
For the acceleration set A and the motion direction change angle set Θ, the same statistics are calculated. The mean of A and the mean of Θ are calculated as follows:
μ_A = (1 / (M − 2)) Σ_k a_{k−1,k,k+1}   (10)
μ_Θ = (1 / (M − 2)) Σ_k θ_{k−1,k,k+1}   (11)
For the acceleration set A and the motion direction change angle set Θ, the t-order central moments are calculated as follows:
m_t(A) = (1 / (M − 2)) Σ_k (a_{k−1,k,k+1} − μ_A)^t   (12)
m_t(Θ) = (1 / (M − 2)) Σ_k (θ_{k−1,k,k+1} − μ_Θ)^t   (13)
s15 fuses the camera motion feature statistics.
All the statistics of the motion vectors (i.e., the mean and the second-, third-, and fourth-order central moments of the motion vector set V calculated in S14) are concatenated to form an 8-dimensional motion vector feature description. All the statistics of the camera motion acceleration (the mean and the second-, third-, and fourth-order central moments of the acceleration set A calculated in S14) are likewise concatenated to form an 8-dimensional camera motion acceleration feature description. All the statistics of the motion direction change angles (the mean and the second-, third-, and fourth-order central moments of the angle set Θ calculated in S14) are concatenated to form a 4-dimensional camera motion direction change feature description.
Finally, the 8-dimensional motion vector feature description, the 8-dimensional camera motion acceleration feature description, and the 4-dimensional camera motion direction change feature description are concatenated horizontally to obtain a 20-dimensional vector describing the camera motion features of the video.
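As an illustrative sketch of S14–S15 (the helper names stats and camera_motion_feature are assumptions of this example), the 20-dimensional camera motion feature could be computed as follows; the statistics of the 2-component sets V and A contribute 8 dimensions each and the scalar angle set contributes 4:

```python
import numpy as np

def stats(x):
    """Mean and 2nd-4th order central moments of a set, eqs. (8)-(13);
    computed per component for vector-valued sets."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                   # treat scalars as 1-component vectors
    mu = x.mean(axis=0)
    moments = [((x - mu) ** t).mean(axis=0) for t in (2, 3, 4)]
    return np.concatenate([mu] + moments)

def camera_motion_feature(V, A, Theta):
    """20-dim camera motion feature (S15): 8 dims from the motion vector
    set V, 8 from the acceleration set A, 4 from the angle set Theta."""
    return np.concatenate([stats(V), stats(A), stats(Theta)])
```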
S2 extracts video structure information features.
Compared with professional videos, amateur videos are poorly structured. Professional video has strong structural and temporal regularity: for example, the visual rhythm of advertisement and music videos is fast and the average shot duration is short, and advertisement videos often contain more gradual-transition shots. Video structure information can therefore effectively distinguish amateur from professional videos.
Video shot detection is a mature, well-studied technology; this embodiment can adopt any mature shot detection algorithm, whose detection accuracy can exceed 90%.
In this embodiment, a video shot detection algorithm is used to extract the number of shots N_shot; then, on the basis of video shot detection, the following structure information features are extracted: the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot.
The average shot length L_shot is calculated as follows:
L_shot = L / N_shot   (14)
where L is the total length of the video.
The abrupt-cut shot ratio R_cut_shot is calculated as follows:
R_cut_shot = N_cut_shot / N_shot   (15)
where N_cut_shot is the number of abrupt-cut shots.
The gradual-transition shot ratio is calculated as follows:
R_gradual_shot = 1 − R_cut_shot   (16)
All the above structure information extracted from the video (i.e., the number of shots N_shot, the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot) is concatenated to form a 4-dimensional video structure information feature description.
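A minimal sketch of eqs. (14)–(16) (illustration only; the function name and the use of precomputed shot counts are assumptions, with the counts expected from any off-the-shelf shot detection algorithm):

```python
import numpy as np

def structure_feature(total_length, n_cut_shots, n_gradual_shots):
    """4-dim structure feature (S2): shot count, average shot length,
    abrupt-cut ratio, and gradual-transition ratio, eqs. (14)-(16)."""
    n_shot = max(n_cut_shots + n_gradual_shots, 1)  # guard against empty input
    avg_len = total_length / n_shot                 # eq. (14)
    r_cut = n_cut_shots / n_shot                    # eq. (15)
    return np.array([n_shot, avg_len, r_cut, 1.0 - r_cut])  # eq. (16)
```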
S3: video feature fusion.
The camera motion features of the video image obtained in S1 and the video structure information features extracted in S2 are concatenated to obtain a 24-dimensional vector describing the video; this feature vector is input to a classifier to identify whether the video producer is a professional or an amateur video producer.
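Putting the sketches above together, the whole S1–S3 feature extraction could be composed as follows (again an illustration: video_feature and the shot counts passed in are assumptions of this example, the latter coming from an external shot detector):

```python
import numpy as np

def video_feature(video_path, n_cut_shots, n_gradual_shots, total_length):
    """24-dim fused video feature (S3) from the helper sketches above."""
    frames = sample_frames(video_path)                                   # S11
    V = [frame_camera_motion(p, c) for p, c in zip(frames, frames[1:])]  # S12
    pairs = [acceleration_and_angle(u, w) for u, w in zip(V, V[1:])]     # S13
    A = [a for a, _ in pairs]
    Theta = [t for _, t in pairs]
    motion = camera_motion_feature(V, A, Theta)                # S14-S15, 20 dims
    structure = structure_feature(total_length, n_cut_shots,
                                  n_gradual_shots)             # S2, 4 dims
    return np.concatenate([motion, structure])                 # S3, 24 dims
```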
S4: identify the video producer identity using the classifier.
In the invention, a support vector machine (SVM) with a Gaussian radial basis function kernel is used as the classifier to classify and identify professional and amateur videos.
The support vector machine, proposed in 1995 by the research group led by Vapnik at AT&T Bell Laboratories, is a pattern recognition method based on statistical learning theory with great potential. The SVM has particular advantages for small-sample, nonlinear, and high-dimensional pattern recognition problems and generalizes to other machine learning problems such as classification and recognition. After more than two decades of intensive research, support vector machine techniques are now very mature.
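As a final sketch (not the patent's implementation), training the RBF-kernel SVM of S4 with scikit-learn could look like this; X_train, y_train, and the sample counts are placeholder data for illustration only:

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder training data: each row is a 24-dim fused video feature
# (20 camera motion dims + 4 structure dims); label 1 = professional,
# 0 = amateur. Real features would come from video_feature above.
rng = np.random.default_rng(0)
X_train = rng.random((100, 24))
y_train = rng.integers(0, 2, 100)

clf = SVC(kernel="rbf")          # Gaussian radial basis kernel (S4)
clf.fit(X_train, y_train)

x_new = rng.random((1, 24))      # fused feature of a new video
print("professional" if clf.predict(x_new)[0] == 1 else "amateur")
```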
Experiments demonstrate that the embodiment provided by the invention effectively distinguishes whether a video producer is a professional or an amateur. In particular, when the professional videos are news, sports, television advertisement, and music videos, and the amateur videos are internet users' self-made videos, the implementation provided by the invention accurately distinguishes whether the video producer is a professional or an amateur.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications should also be regarded as falling within the protection scope of the invention.

Claims (8)

1. An internet video author identity recognition method based on multi-feature fusion, characterized by comprising the following steps:
S1: extracting camera motion features of the video images;
S11: carrying out uniform downsampling on the input video frames, the whole input video being uniformly downsampled to obtain M image frames;
S12: extracting background camera motion vectors between adjacent image frames among the M image frames extracted in S11;
S13: calculating a camera motion acceleration set and a motion direction change angle set of the video from the background camera motion vectors between any three consecutive image frames: obtaining the camera motion vector set between any two consecutive image frames over the whole video, V = {v_{1,2}, v_{2,3}, ..., v_{k,k+1}, ..., v_{M−1,M}}, and calculating from the motion vectors between any three consecutive image frames the camera motion acceleration set A and the motion direction change angle set Θ;
S14: from the video's camera motion vector set V, the acceleration set A, and the motion direction change angle set Θ, calculating the mean and the second-, third-, and fourth-order central moments of each of V, A, and Θ, and taking them as the camera motion features;
S15: fusing the camera motion feature statistics;
the statistics of the motion vector set V calculated in S14 being concatenated to form an 8-dimensional motion vector feature description; the statistics of the camera motion acceleration set A calculated in S14 being concatenated to form an 8-dimensional camera motion acceleration feature description; the statistics of the motion direction change angle set Θ calculated in S14 being concatenated to form a 4-dimensional camera motion direction change feature description;
finally, the 8-dimensional motion vector feature description, the 8-dimensional camera motion acceleration feature description, and the 4-dimensional camera motion direction change feature description being concatenated horizontally to obtain a 20-dimensional vector describing the camera motion features of the video image;
S2: extracting video structure information features;
S3: video feature fusion;
the camera motion features of the video image obtained in S1 and the video structure information features extracted in S2 being concatenated to obtain a 24-dimensional vector describing the video features, the features being input into a classifier for identifying whether the video producer identity is a professional video producer or an amateur video producer;
S4: identifying the video producer identity using a classifier.
2. The internet video authorship recognition method based on multi-feature fusion as claimed in claim 1, wherein in S11, the uniform down-sampling of the input video image frames is implemented by: 5 image frames are uniformly extracted from each second of video image.
3. The internet video authorship recognition method based on multi-feature fusion as claimed in claim 1, wherein the block matching algorithm is used in S12 to calculate the background camera motion vector between adjacent image frames in the M image frames extracted in S11.
4. The internet video author identity recognition method based on multi-feature fusion as claimed in claim 1, wherein S12 is implemented as follows:
S121: uniformly dividing the current image frame K and its preceding image frame K−1 into image blocks of equal size, the block size being S × S, with S = 10 pixels;
S122: selecting any image block in the current frame K as the test image block, and, with the test block's center point (x_t, y_t) as the center, selecting a 3S × 3S adjacent search region in the preceding frame K−1;
for a test image block at the edge of the current frame K, selecting the 3S × 3S square region closest to the test block's position in the preceding frame K−1 as the adjacent search region, with selection priority given to regions offset in the horizontal direction;
S123: within the adjacent search region, constructing sliding image blocks of size S × S with a sliding step of 5 pixels, and searching among them for the image region similar to the test image block by computing the maximum matching pixel count criterion:
MPC(d_x, d_y) = Σ_{(x,y)} δ(P_c(x, y), Q_c(x + d_x, y + d_y))   (1)
δ(P_c, Q_c) = 1 if |P_c − Q_c| ≤ T, and 0 otherwise   (2)
wherein (x_t, y_t) is the center position of the test image block, (x_p, y_p) is the center position of the sliding image block, P_c is the color luminance value of a pixel in the test image block, Q_c is the color luminance value of the corresponding pixel in the sliding image block, (x, y) ranges over the pixel coordinates of the test image block from its lower-left corner, the offset between the center positions of the test image block and the sliding image block is d_x = x_p − x_t, d_y = y_p − y_t, and T is the matching threshold;
S124: for any test image block centered at (x_t, y_t), its similar image block being centered at (x'_p, y'_p), calculating the motion vector v = (v_x, v_y) of the test image block from the positions of the test block and its similar block as:
v_x = x'_p − x_t   (3)
v_y = y'_p − y_t   (4)
wherein v_x and v_y are the components of the motion vector v in the X and Y directions, respectively;
S125: repeating steps S122–S124 until the motion vectors v of all test image blocks in the current frame K are detected;
S126: counting the direction distribution histogram of the motion vectors v of all test image blocks in the current frame K; assigning the motion vectors of all test image blocks in the current frame K by direction to eight intervals, namely [0, π/4), [π/4, π/2), [π/2, 3π/4), [3π/4, π), [π, 5π/4), [5π/4, 3π/2), [3π/2, 7π/4), [7π/4, 2π); and selecting the test image blocks from S125 whose motion vector directions fall in the interval represented by the largest histogram bin as the background region B;
S127: calculating the background camera motion vector between the current frame K and the preceding frame K−1 as:
v_{K−1,K} = (1 / N_B) Σ_{t ∈ B} v_t   (5)
wherein N_B is the number of image blocks in the background region B and v_t is the motion vector of image block t in the background region.
5. The internet video author identity recognition method based on multi-feature fusion as claimed in claim 4, wherein step S124 further comprises checking the number of similar image blocks detected in the adjacent search region; if the number of similar image blocks detected in the adjacent search region exceeds D blocks, with D = 4, the adjacent search region is a uniform-texture region, the detected motion vector v has low reliability, and the motion vector of the test image block is set to v = 0.
6. The internet video author identity recognition method based on multi-feature fusion as claimed in claim 4 or 5, wherein in step S13 the camera motion acceleration and the motion direction change angle are calculated from the background camera motion vectors between three consecutive image frames as follows:
a_{k−1,k,k+1} = (v_{k,k+1} − v_{k−1,k}) / Δt   (6)
θ_{k−1,k,k+1} = arccos( (v_{k−1,k} · v_{k,k+1}) / (|v_{k−1,k}| |v_{k,k+1}|) )   (7)
wherein Δt is the time interval between two consecutive extracted image frames and is a constant, the acceleration being the change of the background camera motion vector between two successive frame pairs divided by Δt;
through the above calculation, the camera motion vector set between any two consecutive image frames over the whole video is obtained, V = {v_{1,2}, v_{2,3}, ..., v_{k,k+1}, ..., v_{M−1,M}}, and from the motion vectors between any three consecutive image frames the camera motion acceleration set A = {a_{1,2,3}, a_{2,3,4}, ..., a_{k−1,k,k+1}, ..., a_{M−2,M−1,M}} and the motion direction change angle set Θ = {θ_{1,2,3}, θ_{2,3,4}, ..., θ_{k−1,k,k+1}, ..., θ_{M−2,M−1,M}} are calculated.
7. The internet video author identity recognition method based on multi-feature fusion as claimed in claim 6, wherein in S14 the mean and the second-, third-, and fourth-order central moments of the camera motion vector set V, the acceleration set A, and the direction change angle set Θ are calculated and taken as the camera motion features, as follows:
for the motion vector set V, the mean is calculated as:
μ_V = (1 / (M − 1)) Σ_{k=1}^{M−1} v_{k,k+1}   (8)
for the motion vector set V, the t-order central moment is calculated as:
m_t(V) = (1 / (M − 1)) Σ_{k=1}^{M−1} (v_{k,k+1} − μ_V)^t   (9)
wherein t ∈ {2, 3, 4};
for the acceleration set A and the motion direction change angle set Θ, the means are calculated as:
μ_A = (1 / (M − 2)) Σ_k a_{k−1,k,k+1}   (10)
μ_Θ = (1 / (M − 2)) Σ_k θ_{k−1,k,k+1}   (11)
and for the acceleration set A and the motion direction change angle set Θ, the t-order central moments are calculated as:
m_t(A) = (1 / (M − 2)) Σ_k (a_{k−1,k,k+1} − μ_A)^t   (12)
m_t(Θ) = (1 / (M − 2)) Σ_k (θ_{k−1,k,k+1} − μ_Θ)^t   (13)
8. The internet video author identity recognition method based on multi-feature fusion as claimed in claim 1, wherein the video structure information features are extracted in S2 as follows:
firstly, extracting the number of shots N_shot with a video shot detection algorithm; then extracting the following structure information features: the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot;
wherein the average shot length L_shot is calculated as:
L_shot = L / N_shot   (14)
L being the total length of the video;
the abrupt-cut shot ratio R_cut_shot is calculated as:
R_cut_shot = N_cut_shot / N_shot   (15)
N_cut_shot being the number of abrupt-cut shots;
the gradual-transition shot ratio is calculated as:
R_gradual_shot = 1 − R_cut_shot   (16)
and the number of shots N_shot, the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot extracted from the video are concatenated to form a 4-dimensional video structure information feature description.
CN201710762954.XA 2017-08-30 2017-08-30 Internet video author identity identification method based on multi-feature fusion Active CN107516084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710762954.XA CN107516084B (en) 2017-08-30 2017-08-30 Internet video author identity identification method based on multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710762954.XA CN107516084B (en) 2017-08-30 2017-08-30 Internet video author identity identification method based on multi-feature fusion

Publications (2)

Publication Number Publication Date
CN107516084A CN107516084A (en) 2017-12-26
CN107516084B true CN107516084B (en) 2020-01-17

Family

ID=60723582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710762954.XA Active CN107516084B (en) 2017-08-30 2017-08-30 Internet video author identity identification method based on multi-feature fusion

Country Status (1)

Country Link
CN (1) CN107516084B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657100B (en) * 2019-01-25 2021-10-29 深圳市商汤科技有限公司 Video collection generation method and device, electronic equipment and storage medium
CN110717470B (en) * 2019-10-16 2023-09-26 山东瑞瀚网络科技有限公司 Scene recognition method and device, computer equipment and storage medium
WO2022081127A1 (en) * 2020-10-12 2022-04-21 Hewlett-Packard Development Company, L.P. Document language prediction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106469314A (en) * 2016-08-31 2017-03-01 深圳市唯特视科技有限公司 A kind of video image classifier method based on space-time symbiosis binary-flow network
CN106779073A (en) * 2016-12-27 2017-05-31 西安石油大学 Media information sorting technique and device based on deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10229324B2 (en) * 2015-12-24 2019-03-12 Intel Corporation Video summarization using semantic information

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106469314A (en) * 2016-08-31 2017-03-01 深圳市唯特视科技有限公司 A kind of video image classifier method based on space-time symbiosis binary-flow network
CN106779073A (en) * 2016-12-27 2017-05-31 西安石油大学 Media information sorting technique and device based on deep neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"A New Method for Camera Motion Estimation in Video";Lin Liu;《The 9th International Conference for Young Computer Scientists》;20081231;全文 *
"Detecting Complex Events in User-Generated Video Using Concept Classifiers";Jinlin Guo 等;《2012 10th International Workshop on Content-Based Multimedia Indexing(CBMI)》;20121231;全文 *
"基于运动矢量的摄像机运动定性分类方法";朱兴全 等;《计算机研究与发展》;20010430;第38卷(第4期);全文 *

Also Published As

Publication number Publication date
CN107516084A (en) 2017-12-26

Similar Documents

Publication Publication Date Title
Choi et al. Unsupervised and semi-supervised domain adaptation for action recognition from drones
Guan et al. Keypoint-based keyframe selection
CN105100894B (en) Face automatic labeling method and system
CN103210651B (en) Method and system for video summary
CN108537134B (en) Video semantic scene segmentation and labeling method
CN104994426B (en) Program video identification method and system
US20050228849A1 (en) Intelligent key-frame extraction from a video
Karpenko et al. Tiny videos: a large data set for nonparametric video retrieval and frame classification
WO2010000163A1 (en) Method, system and device for extracting video abstraction
CN101137986A (en) Summarization of audio and/or visual data
CN107516084B (en) Internet video author identity identification method based on multi-feature fusion
WO2013056311A1 (en) Keypoint based keyframe selection
Mademlis et al. Multimodal stereoscopic movie summarization conforming to narrative characteristics
JP5116017B2 (en) Video search method and system
Damnjanovic et al. Event detection and clustering for surveillance video summarization
Jin et al. Network video summarization based on key frame extraction via superpixel segmentation
Tsao et al. Thumbnail image selection for VOD services
Ramya et al. Visual saliency based video summarization: A case study for preview video generation
CN107748761B (en) Method for extracting key frame of video abstract
Widiarto et al. Video summarization using a key frame selection based on shot segmentation
Khan et al. RICAPS: residual inception and cascaded capsule network for broadcast sports video classification
Cricri et al. Multimodal semantics extraction from user-generated videos
Dong et al. Automatic and fast temporal segmentation for personalized news consuming
CN108804981B (en) Moving object detection method based on long-time video sequence background modeling frame
Jiang et al. A scene change detection framework based on deep learning and image matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant