CN107516084B - Internet video author identity identification method based on multi-feature fusion - Google Patents

Internet video author identity identification method based on multi-feature fusion

Info

Publication number
CN107516084B
CN107516084B (application CN201710762954.XA)
Authority
CN
China
Prior art keywords
video
image
motion
camera motion
image block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710762954.XA
Other languages
Chinese (zh)
Other versions
CN107516084A (en)
Inventor
郭金林 (Guo Jinlin)
陈立栋 (Chen Lidong)
白亮 (Bai Liang)
老松杨 (Lao Songyang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201710762954.XA priority Critical patent/CN107516084B/en
Publication of CN107516084A publication Critical patent/CN107516084A/en
Application granted granted Critical
Publication of CN107516084B publication Critical patent/CN107516084B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows

Abstract

The invention discloses an internet video author identity identification method based on multi-feature fusion, comprising the following steps: input a video and uniformly downsample its frames; for each extracted frame and its preceding frame, extract the background camera motion vector and from it compute the camera motion features of the video; on the basis of video shot segmentation, compute the average shot length and the abrupt-cut shot ratio to obtain the video structure features; and, fusing these features, use a support vector machine classifier to learn and identify whether the video producer is a professional or an amateur. By fully considering the production characteristics of professional and amateur videos, fusing the camera motion features and structure features of internet videos, and using a support vector machine classifier, the invention can accurately learn and identify whether an internet video producer is a professional or an amateur video producer.

Description

Internet video author identity identification method based on multi-feature fusion
Technical Field
The invention relates to the technical field of multimedia communication and the Internet, and in particular to an internet video author identity identification method based on multi-feature fusion.
Background
Traditionally, video content was produced by professional video production companies such as television stations. Such videos are recorded by professionals with professional camera equipment, post-processed according to established conventions, and finally pushed to viewers through stable distribution channels; their quality is generally high.
The popularity of camera devices (e.g., smartphones with cameras) and the declining price of mass storage have produced a huge amount of personal video content. The development and popularization of internet technology are reshaping video consumption, and the rise of video sharing websites such as YouTube and Youku lets internet users conveniently upload, manage, and share videos. As a result, internet users are no longer only consumers of video but also participants and producers.
Hundreds of millions of ordinary internet users are both publishers and consumers of internet videos, which has led to an explosive increase in the number of internet videos. Recent statistics show that on the video sharing website YouTube alone, users upload as much as 120 hours of video every minute, i.e., more than a year's worth of video every day. Videos uploaded by users can generally be classified as professionally produced videos (PPV) or amateur-produced videos (APV) according to the identity of the original producer (the uploader of a video is not necessarily its producer).
Amateur videos are recorded with personal camera equipment (such as mobile phones) by amateurs with little video production experience, undergo little post-production, and are uploaded to the network by users. In contrast, professional videos are recorded by professionals with professional camera equipment and edited according to certain rules; news and sports videos are typical examples. Note that many network videos are clips extracted from professional sources such as television programs and movies and uploaded to the network (sometimes with subtitles, background music, and the like added); these are still considered professional videos.
Comparing professional and amateur videos reveals the following differences:
(1) The number of amateur videos on the internet is growing at an explosive rate. Since no elaborate post-production is needed, anyone can easily record an amateur video with a camera or mobile phone and upload it to an internet video sharing website, so a large number of amateur videos have appeared on the internet;
(2) because shooting environments are uncontrolled and shooting equipment less refined, amateur video quality is generally poorer than professional video quality, with more irregular camera motion, blurred backgrounds, and the like;
(3) compared with professional videos such as news videos, amateur videos are poorly structured. News video, for example, can be structurally divided into shots, scenes, and stories; sports video (taking tennis as an example) can be divided into sets, games, and so on. The structure of amateur video is generally less clear owing to the lack of post-production;
(4) compared with professional videos, the audio accompanying amateur videos is more varied. The accompanying audio of professional video is relatively uniform: in news video the audio is mainly announcer commentary, and in sports video it mainly comprises commentators' speech and live cheering.
Disclosure of Invention
With the development of internet technology and the popularization of camera equipment, video content is no longer produced only by professional video production companies (such as television stations); any amateur can quickly record and publish video. The numbers of professionally produced videos (professional videos for short) and amateur-produced videos (amateur videos for short) on the internet are growing at an explosive rate. In view of this, the present invention provides an internet video author identity identification method based on multi-feature fusion, which fuses the background camera motion features and video structure information features of a video and uses a support vector machine classifier to learn and identify whether the video producer is a professional or an amateur video producer.
Therefore, the invention adopts the following technical scheme:
An internet video author identity identification method based on multi-feature fusion comprises the following steps:
S1: extract the camera motion features of the video images.
S11: uniformly downsample the input video frames; the whole input video is uniformly downsampled to obtain M image frames.
S12: extract the background camera motion vectors between adjacent image frames among the M frames extracted in S11.
S13: from the background camera motion vectors between any three consecutive image frames, calculate the camera motion acceleration set and the motion direction change angle set of the video: obtain the camera motion vector set between any two consecutive image frames over the whole video, V = {v_{1,2}, v_{2,3}, ..., v_{k,k+1}, ..., v_{M−1,M}}, and from the motion vectors between any three consecutive image frames calculate the camera motion acceleration set A and the motion direction change angle set Θ.
S14: from the video's camera motion vector set V, the acceleration set A, and the motion direction change angle set Θ, calculate the mean and the second-, third-, and fourth-order central moments of each of V, A, and Θ, and take them as the camera motion features.
S15: fuse the camera motion feature statistics.
All the statistics of the motion vectors (i.e., the mean and the second-, third-, and fourth-order central moments of the motion vector set V calculated in S14) are concatenated to form an 8-dimensional motion vector feature description. All the statistics of the camera motion acceleration (the mean and the second-, third-, and fourth-order central moments of the acceleration set A calculated in S14) are likewise concatenated to form an 8-dimensional camera motion acceleration feature description. All the statistics of the motion direction change angles (the mean and the second-, third-, and fourth-order central moments of the angle set Θ calculated in S14) are concatenated to form a 4-dimensional camera motion direction change feature description.
Finally, the 8-dimensional motion vector feature description, the 8-dimensional camera motion acceleration feature description, and the 4-dimensional camera motion direction change feature description are concatenated horizontally to obtain a 20-dimensional vector describing the camera motion features of the video.
S2: extract the video structure information features.
S3: video feature fusion.
The camera motion features of the video image obtained in S1 and the video structure information features extracted in S2 are concatenated to obtain a 24-dimensional vector describing the video; this feature vector is input to a classifier to identify whether the video producer is a professional or an amateur video producer.
S4: identify the video producer identity using a classifier.
In the invention, a support vector machine (SVM) with a Gaussian radial basis function kernel is used as the classifier to classify and identify professional and amateur videos.
The support vector machine, proposed in 1995 by the research group led by Vapnik at AT&T Bell Laboratories, is a pattern recognition method based on statistical learning theory with great potential. The SVM has particular advantages for small-sample, nonlinear, and high-dimensional pattern recognition problems and generalizes to other machine learning problems such as classification and recognition. After more than two decades of intensive research, support vector machine techniques are now very mature.
In S11 of the present invention, the uniform downsampling of the input video frames is implemented as follows: 5 image frames are uniformly extracted from each second of video.
In S12 of the present invention, the background camera motion vectors between adjacent image frames among the M frames extracted in S11 are calculated using a block matching algorithm. Specifically, the implementation is as follows:
S121: uniformly divide the current image frame K and its preceding image frame K−1 into image blocks of equal size; let the block size be S × S, with S = 10 pixels.
S122: select any image block in the current frame K as the test image block, and, with the test block's center point (x_t, y_t) as the center, select a 3S × 3S adjacent search region in the preceding frame K−1.
For a test image block at the edge of the current frame K, the 3S × 3S square region closest to the test block's position in the preceding frame K−1 is selected as the adjacent search region, and selection priority may be given to regions offset in the horizontal direction.
S123: within the adjacent search region, construct sliding image blocks of size S × S with a sliding step of 5 pixels, and search among them for the image region similar to the test image block by computing the maximum matching pixel count (MPC) criterion, calculated as follows:
MPC(d_x, d_y) = Σ_{(x,y)} δ(P_c(x, y), Q_c(x + d_x, y + d_y))   (1)
δ(P_c, Q_c) = 1 if |P_c − Q_c| ≤ T, and 0 otherwise   (2)
where (x_t, y_t) is the center position of the test image block and (x_p, y_p) is the center position of the sliding image block (i.e., the searched image block); P_c is the color luminance value of a pixel in the test block and Q_c is the color luminance value of the corresponding pixel in the sliding block; (x, y) ranges over the pixel coordinates of the test block, whose lower-left corner anchors the summation; the offset between the two center positions is d_x = x_p − x_t, d_y = y_p − y_t; and T is the matching threshold. The sliding block that maximizes the MPC is taken as the similar image block.
S124: for any test image block centered at (x_t, y_t), let its similar image block be centered at (x'_p, y'_p). The motion vector v = (v_x, v_y) of the test image block is calculated from the positions of the test block and its similar block as follows:
v_x = x'_p − x_t   (3)
v_y = y'_p − y_t   (4)
where v_x and v_y are the components of the motion vector v in the X and Y directions, respectively.
Preferably, when detecting similar image blocks, the detected motion vector v of a test image block may be unreliable because image frames often contain large regions of uniform texture. The invention therefore checks the number of similar image blocks detected in the adjacent search region: if it exceeds D blocks (D = 4), the adjacent search region is considered a uniform-texture region, the detected motion vector v has low reliability, and the motion vector of the test image block is set to v = 0.
S125: steps S122-S124 are repeated until motion vectors v of all test image blocks in the current image frame K are detected.
S126: count the direction distribution histogram of the motion vectors v of all test image blocks in the current frame K. The motion vectors of all test image blocks in the current frame K are assigned by direction to eight intervals, namely [0, π/4), [π/4, π/2), [π/2, 3π/4), [3π/4, π), [π, 5π/4), [5π/4, 3π/2), [3π/2, 7π/4), [7π/4, 2π). The test image blocks from S125 whose motion vector directions fall in the interval represented by the largest histogram bin are selected as the background region B.
S127: calculate the background camera motion vector between the current frame K and the preceding frame K−1 as follows:
v_{K−1,K} = (1 / N_B) Σ_{t ∈ B} v_t   (5)
where N_B is the number of image blocks in the background region B and v_t is the motion vector of image block t in the background region.
In S13 of the invention, the camera motion acceleration and the motion direction change angle are calculated from the background camera motion vectors between three consecutive image frames as follows:
a_{k−1,k,k+1} = (v_{k,k+1} − v_{k−1,k}) / Δt   (6)
θ_{k−1,k,k+1} = arccos( (v_{k−1,k} · v_{k,k+1}) / (|v_{k−1,k}| |v_{k,k+1}|) )   (7)
where Δt is the time interval between two consecutive extracted image frames; since the invention extracts frames uniformly (5 frames per second), Δt is a constant. The acceleration is the change of the background camera motion vector between two successive frame pairs divided by Δt.
Through the above calculation, the camera motion vector set between any two consecutive image frames over the whole video is obtained, V = {v_{1,2}, v_{2,3}, ..., v_{k,k+1}, ..., v_{M−1,M}}, and from the motion vectors between any three consecutive image frames the camera motion acceleration set A = {a_{1,2,3}, a_{2,3,4}, ..., a_{k−1,k,k+1}, ..., a_{M−2,M−1,M}} and the motion direction change angle set Θ = {θ_{1,2,3}, θ_{2,3,4}, ..., θ_{k−1,k,k+1}, ..., θ_{M−2,M−1,M}} are calculated.
In S14 of the invention, from the video's camera motion vector set V, the acceleration set A, and the motion direction change angle set Θ, the mean and the second-, third-, and fourth-order central moments of each set are calculated and taken as the camera motion features.
For the motion vector set V, the mean is calculated as follows:
μ_V = (1 / (M − 1)) Σ_{k=1}^{M−1} v_{k,k+1}   (8)
For the motion vector set V, the t-order central moment is calculated as follows:
m_t(V) = (1 / (M − 1)) Σ_{k=1}^{M−1} (v_{k,k+1} − μ_V)^t   (9)
where t ∈ {2, 3, 4}; for the vector-valued sets, the statistics are computed per component.
For the acceleration set A and the motion direction change angle set Θ, the same statistics are calculated. The mean of A and the mean of Θ are calculated as follows:
μ_A = (1 / (M − 2)) Σ_k a_{k−1,k,k+1}   (10)
μ_Θ = (1 / (M − 2)) Σ_k θ_{k−1,k,k+1}   (11)
For the acceleration set A and the motion direction change angle set Θ, the t-order central moments are calculated as follows:
m_t(A) = (1 / (M − 2)) Σ_k (a_{k−1,k,k+1} − μ_A)^t   (12)
m_t(Θ) = (1 / (M − 2)) Σ_k (θ_{k−1,k,k+1} − μ_Θ)^t   (13)
The video structure information features are extracted in S2 of the invention as follows:
A video shot detection algorithm is used to extract the number of shots N_shot; then, on the basis of video shot detection, the following structure information features are extracted: the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot.
The average shot length L_shot is calculated as follows:
L_shot = L / N_shot   (14)
where L is the total length of the video.
The abrupt-cut shot ratio R_cut_shot is calculated as follows:
R_cut_shot = N_cut_shot / N_shot   (15)
where N_cut_shot is the number of abrupt-cut shots.
The gradual-transition shot ratio is calculated as follows:
R_gradual_shot = 1 − R_cut_shot   (16)
All the above structure information extracted from the video (i.e., the number of shots N_shot, the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot) is concatenated horizontally to form a 4-dimensional video structure information feature description.
By fully considering the production characteristics of professional and amateur videos, fusing the camera motion features and structure features of internet videos, and using a support vector machine classifier, the invention can accurately learn and identify whether an internet video producer is a professional or an amateur video producer.
Experiments demonstrate that the multi-feature fusion based internet video author identity identification method effectively distinguishes professional from amateur video producers. In particular, when the professional videos are news, sports, television advertisement, and music videos, and the amateur videos are internet users' self-made videos, the implementation provided by the invention accurately distinguishes whether the video producer is a professional or an amateur.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of extracting the camera motion features from video images according to the present invention;
FIG. 3 is a schematic diagram of the image block search and matching method of the present invention;
FIG. 4 is a diagram illustrating the direction distribution of image block motion vectors in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Referring to FIG. 1, the flow of the internet video author identity identification method based on multi-feature fusion according to the present invention includes the following steps:
S1: extract the camera motion features of the video images; FIG. 2 is a flowchart of extracting the camera motion features. In this embodiment, a block matching algorithm is used to calculate the background camera motion vector between two adjacent extracted frames: a frame is first divided into image blocks of equal size, and the position of each block of the current frame is then searched for in the preceding frame. FIG. 3 illustrates the image block search and matching method of the present invention.
In this embodiment, each downsampled image frame is divided into image blocks of 10 × 10 pixels, and the background camera motion vectors are computed in units of image blocks, which improves matching efficiency while maintaining adequate accuracy.
S11: uniformly downsample the input video frames; the whole input video is uniformly downsampled to obtain M image frames.
Uniform downsampling of the input video reduces the number of image frames to be processed and thus effectively reduces computation. Specifically, 5 frames are uniformly extracted from each second of video, yielding M image frames for the whole input video.
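As a minimal illustrative sketch (not part of the patent text), the uniform 5-frames-per-second sampling of S11 could be implemented with OpenCV as follows; the helper name sample_frames and the fallback frame rate are assumptions of this example.

```python
import cv2

def sample_frames(video_path, rate=5):
    """Uniformly extract about `rate` frames per second of video (S11)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if FPS is unknown
    step = max(int(round(fps / rate)), 1)     # keep every `step`-th frame
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            # grayscale suffices: block matching compares luminance values
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        idx += 1
    cap.release()
    return frames  # the M image frames
```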
S12: extract the background camera motion vectors between adjacent image frames among the M frames extracted in S11.
S121: uniformly divide the current image frame K and its preceding image frame K−1 into image blocks of equal size; let the block size be S × S, with S = 10 pixels.
S122: select any image block in the current frame K as the test image block, and, with the test block's center point (x_t, y_t) as the center, select a 3S × 3S adjacent search region in the preceding frame K−1.
For a test image block at the edge of the current frame K, the 3S × 3S square region closest to the test block's position in the preceding frame K−1 is selected as the adjacent search region, and selection priority may be given to regions offset in the horizontal direction.
S123: within the adjacent search region, construct sliding image blocks of size S × S with a sliding step of 5 pixels, and search among them for the image region similar to the test image block by computing the maximum matching pixel count (MPC) criterion:
MPC(d_x, d_y) = Σ_{(x,y)} δ(P_c(x, y), Q_c(x + d_x, y + d_y))   (1)
δ(P_c, Q_c) = 1 if |P_c − Q_c| ≤ T, and 0 otherwise   (2)
where (x_t, y_t) is the center position of the test image block and (x_p, y_p) is the center position of the sliding image block (i.e., the searched image block); P_c is the color luminance value of a pixel in the test block and Q_c is the color luminance value of the corresponding pixel in the sliding block; (x, y) ranges over the pixel coordinates of the test block, whose lower-left corner anchors the summation; the offset between the two center positions is d_x = x_p − x_t, d_y = y_p − y_t; and T is the matching threshold. The sliding block that maximizes the MPC is taken as the similar image block.
S124: calculate the motion vectors of the test image blocks.
For any test image block centered at (x_t, y_t), let its similar image block be centered at (x'_p, y'_p). The motion vector v = (v_x, v_y) of the test image block is calculated from the positions of the test block and its similar block as follows:
v_x = x'_p − x_t   (3)
v_y = y'_p − y_t   (4)
where v_x and v_y are the components of the motion vector v in the X and Y directions, respectively.
Preferably, when detecting similar image blocks, the detected motion vector v of a test image block may be unreliable because image frames often contain large regions of uniform texture. The invention therefore checks the number of similar image blocks detected in the adjacent search region: if it exceeds D blocks (D = 4), the adjacent search region is considered a uniform-texture region, the detected motion vector v has low reliability, and the motion vector of the test image block is set to v = 0.
S125: steps S122-S124 are repeated until motion vectors v of all test image blocks in the current image frame K are detected.
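A minimal sketch of S122–S125 (illustration only, not the patent's reference implementation): the constants follow the text (S = 10, sliding step 5, D = 4), while the matching threshold T = 10 and the tie-counting interpretation of "similar image blocks" are assumptions of this example.

```python
import numpy as np

S, STEP, T, D = 10, 5, 10, 4  # block size, sliding step, match threshold (assumed), texture check

def block_motion_vector(prev, curr, bx, by):
    """Motion vector of the S x S test block whose top-left corner is (bx, by)
    in frame `curr`, searched in a 3S x 3S region of frame `prev` using the
    maximum matching pixel count criterion, eqs. (1)-(2)."""
    test = curr[by:by + S, bx:bx + S].astype(np.int16)
    # clip the 3S x 3S search region to the frame borders (edge handling, S122)
    y0, x0 = max(by - S, 0), max(bx - S, 0)
    y1, x1 = min(by + 2 * S, prev.shape[0]), min(bx + 2 * S, prev.shape[1])
    best, best_off, n_best = -1, (0, 0), 0
    for sy in range(y0, y1 - S + 1, STEP):
        for sx in range(x0, x1 - S + 1, STEP):
            cand = prev[sy:sy + S, sx:sx + S].astype(np.int16)
            mpc = int((np.abs(test - cand) <= T).sum())  # eqs. (1)-(2)
            if mpc > best:
                best, best_off, n_best = mpc, (sx - bx, sy - by), 1
            elif mpc == best:
                n_best += 1  # another equally similar block
    if n_best > D:           # uniform-texture region: match is unreliable
        return (0, 0)        # v = 0
    return best_off          # v = (v_x, v_y), eqs. (3)-(4)
```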
S126: count the direction distribution histogram of the motion vectors v of all test image blocks in the current frame K.
Specifically, the motion vectors of all test image blocks in the current frame K are assigned by direction to eight intervals, namely [0, π/4), [π/4, π/2), [π/2, 3π/4), [3π/4, π), [π, 5π/4), [5π/4, 3π/2), [3π/2, 7π/4), [7π/4, 2π); see FIG. 4. The test image blocks from S125 whose motion vector directions fall in the interval represented by the largest histogram bin are selected as the background region B.
S127: calculate the background camera motion vector between the current frame K and the preceding frame K−1 as follows:
v_{K−1,K} = (1 / N_B) Σ_{t ∈ B} v_t   (5)
where N_B is the number of image blocks in the background region B and v_t is the motion vector of image block t in the background region.
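Continuing the sketch above (and reusing its block_motion_vector and S), S125–S127 could look as follows; treating v = 0 blocks as unreliable and excluding them from the histogram is an assumption of this example.

```python
import numpy as np

def frame_camera_motion(prev, curr):
    """Background camera motion vector between two frames (S125-S127):
    per-block motion vectors, dominant-direction background selection,
    then the average of the background vectors, eq. (5)."""
    h, w = curr.shape
    vecs = np.array([block_motion_vector(prev, curr, bx, by)   # S122-S125
                     for by in range(0, h - S + 1, S)
                     for bx in range(0, w - S + 1, S)], dtype=float)
    moving = vecs[(vecs ** 2).sum(axis=1) > 0]      # drop unreliable v = 0 blocks
    if len(moving) == 0:
        return np.zeros(2)                          # static or unreliable frame
    angles = np.arctan2(moving[:, 1], moving[:, 0]) % (2 * np.pi)
    bins = (angles // (np.pi / 4)).astype(int)      # the eight pi/4 intervals (S126)
    background = moving[bins == np.bincount(bins, minlength=8).argmax()]
    return background.mean(axis=0)                  # eq. (5)
```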
S13: calculate the camera motion acceleration and the motion direction change angle between any three consecutive image frames.
From the background camera motion vectors between any three consecutive image frames, the camera motion acceleration and the motion direction change angle of the video are calculated as follows:
a_{k−1,k,k+1} = (v_{k,k+1} − v_{k−1,k}) / Δt   (6)
θ_{k−1,k,k+1} = arccos( (v_{k−1,k} · v_{k,k+1}) / (|v_{k−1,k}| |v_{k,k+1}|) )   (7)
where Δt is the time interval between two consecutive extracted image frames; since the invention extracts frames uniformly (5 frames per second), Δt is a constant. The acceleration is the change of the background camera motion vector between two successive frame pairs divided by Δt.
Through the above calculation, the camera motion vector set between any two consecutive image frames over the whole video is obtained, V = {v_{1,2}, v_{2,3}, ..., v_{k,k+1}, ..., v_{M−1,M}}, and from the motion vectors between any three consecutive image frames the camera motion acceleration set A = {a_{1,2,3}, a_{2,3,4}, ..., a_{k−1,k,k+1}, ..., a_{M−2,M−1,M}} and the motion direction change angle set Θ = {θ_{1,2,3}, θ_{2,3,4}, ..., θ_{k−1,k,k+1}, ..., θ_{M−2,M−1,M}} are calculated.
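A small sketch of eqs. (6)–(7) under the assumptions above (the original acceleration and angle formulas are rendered as images in the source; the vector-difference and arccos forms used here are reconstructions consistent with the surrounding text):

```python
import numpy as np

def acceleration_and_angle(v_prev, v_next, dt=0.2):
    """Camera acceleration and direction-change angle for three consecutive
    frames, from the two motion vectors v_prev = v_{k-1,k} and
    v_next = v_{k,k+1}; dt = 1/5 s for 5 extracted frames per second."""
    v_prev, v_next = np.asarray(v_prev, float), np.asarray(v_next, float)
    a = (v_next - v_prev) / dt                                   # eq. (6)
    n = np.linalg.norm(v_prev) * np.linalg.norm(v_next)
    cos_t = np.dot(v_prev, v_next) / n if n > 0 else 1.0         # zero angle if static
    theta = float(np.arccos(np.clip(cos_t, -1.0, 1.0)))          # eq. (7)
    return a, theta
```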
S14: from the video's camera motion vector set V, the camera motion acceleration set A, and the motion direction change angle set Θ, calculate the mean and the second-, third-, and fourth-order central moments of each set, and take them as the camera motion features.
For the motion vector set V, the mean is calculated as follows:
μ_V = (1 / (M − 1)) Σ_{k=1}^{M−1} v_{k,k+1}   (8)
For the motion vector set V, the t-order central moment is calculated as follows:
m_t(V) = (1 / (M − 1)) Σ_{k=1}^{M−1} (v_{k,k+1} − μ_V)^t   (9)
where t ∈ {2, 3, 4}; for the vector-valued sets, the statistics are computed per component.
For the acceleration set A and the motion direction change angle set Θ, the same statistics are calculated. The mean of A and the mean of Θ are calculated as follows:
μ_A = (1 / (M − 2)) Σ_k a_{k−1,k,k+1}   (10)
μ_Θ = (1 / (M − 2)) Σ_k θ_{k−1,k,k+1}   (11)
For the acceleration set A and the motion direction change angle set Θ, the t-order central moments are calculated as follows:
m_t(A) = (1 / (M − 2)) Σ_k (a_{k−1,k,k+1} − μ_A)^t   (12)
m_t(Θ) = (1 / (M − 2)) Σ_k (θ_{k−1,k,k+1} − μ_Θ)^t   (13)
s15 fuses the camera motion feature statistics.
All the statistics of the motion vectors (i.e., the mean and the second-, third-, and fourth-order central moments of the motion vector set V calculated in S14) are concatenated to form an 8-dimensional motion vector feature description. All the statistics of the camera motion acceleration (the mean and the second-, third-, and fourth-order central moments of the acceleration set A calculated in S14) are likewise concatenated to form an 8-dimensional camera motion acceleration feature description. All the statistics of the motion direction change angles (the mean and the second-, third-, and fourth-order central moments of the angle set Θ calculated in S14) are concatenated to form a 4-dimensional camera motion direction change feature description.
Finally, the 8-dimensional motion vector feature description, the 8-dimensional camera motion acceleration feature description, and the 4-dimensional camera motion direction change feature description are concatenated horizontally to obtain a 20-dimensional vector describing the camera motion features of the video.
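As an illustrative sketch of S14–S15 (the helper names stats and camera_motion_feature are assumptions of this example), the 20-dimensional camera motion feature could be computed as follows; the statistics of the 2-component sets V and A contribute 8 dimensions each and the scalar angle set contributes 4:

```python
import numpy as np

def stats(x):
    """Mean and 2nd-4th order central moments of a set, eqs. (8)-(13);
    computed per component for vector-valued sets."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                   # treat scalars as 1-component vectors
    mu = x.mean(axis=0)
    moments = [((x - mu) ** t).mean(axis=0) for t in (2, 3, 4)]
    return np.concatenate([mu] + moments)

def camera_motion_feature(V, A, Theta):
    """20-dim camera motion feature (S15): 8 dims from the motion vector
    set V, 8 from the acceleration set A, 4 from the angle set Theta."""
    return np.concatenate([stats(V), stats(A), stats(Theta)])
```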
S2 extracts video structure information features.
Compared with professional videos, amateur videos are poorly structured. Professional video has strong structural and temporal regularity: for example, the visual rhythm of advertisement and music videos is fast and the average shot duration is short, and advertisement videos often contain more gradual-transition shots. Video structure information can therefore effectively distinguish amateur from professional videos.
Video shot detection is a mature, well-studied technology; this embodiment can adopt any mature shot detection algorithm, whose detection accuracy can exceed 90%.
In this embodiment, a video shot detection algorithm is used to extract the number of shots N_shot; then, on the basis of video shot detection, the following structure information features are extracted: the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot.
The average shot length L_shot is calculated as follows:
L_shot = L / N_shot   (14)
where L is the total length of the video.
The abrupt-cut shot ratio R_cut_shot is calculated as follows:
R_cut_shot = N_cut_shot / N_shot   (15)
where N_cut_shot is the number of abrupt-cut shots.
The gradual-transition shot ratio is calculated as follows:
R_gradual_shot = 1 − R_cut_shot   (16)
All the above structure information extracted from the video (i.e., the number of shots N_shot, the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot) is concatenated to form a 4-dimensional video structure information feature description.
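A minimal sketch of eqs. (14)–(16) (illustration only; the function name and the use of precomputed shot counts are assumptions, with the counts expected from any off-the-shelf shot detection algorithm):

```python
import numpy as np

def structure_feature(total_length, n_cut_shots, n_gradual_shots):
    """4-dim structure feature (S2): shot count, average shot length,
    abrupt-cut ratio, and gradual-transition ratio, eqs. (14)-(16)."""
    n_shot = max(n_cut_shots + n_gradual_shots, 1)  # guard against empty input
    avg_len = total_length / n_shot                 # eq. (14)
    r_cut = n_cut_shots / n_shot                    # eq. (15)
    return np.array([n_shot, avg_len, r_cut, 1.0 - r_cut])  # eq. (16)
```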
S3: video feature fusion.
The camera motion features of the video image obtained in S1 and the video structure information features extracted in S2 are concatenated to obtain a 24-dimensional vector describing the video; this feature vector is input to a classifier to identify whether the video producer is a professional or an amateur video producer.
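Putting the sketches above together, the whole S1–S3 feature extraction could be composed as follows (again an illustration: video_feature and the shot counts passed in are assumptions of this example, the latter coming from an external shot detector):

```python
import numpy as np

def video_feature(video_path, n_cut_shots, n_gradual_shots, total_length):
    """24-dim fused video feature (S3) from the helper sketches above."""
    frames = sample_frames(video_path)                                   # S11
    V = [frame_camera_motion(p, c) for p, c in zip(frames, frames[1:])]  # S12
    pairs = [acceleration_and_angle(u, w) for u, w in zip(V, V[1:])]     # S13
    A = [a for a, _ in pairs]
    Theta = [t for _, t in pairs]
    motion = camera_motion_feature(V, A, Theta)                # S14-S15, 20 dims
    structure = structure_feature(total_length, n_cut_shots,
                                  n_gradual_shots)             # S2, 4 dims
    return np.concatenate([motion, structure])                 # S3, 24 dims
```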
S4: identify the video producer identity using the classifier.
In the invention, a support vector machine (SVM) with a Gaussian radial basis function kernel is used as the classifier to classify and identify professional and amateur videos.
The support vector machine, proposed in 1995 by the research group led by Vapnik at AT&T Bell Laboratories, is a pattern recognition method based on statistical learning theory with great potential. The SVM has particular advantages for small-sample, nonlinear, and high-dimensional pattern recognition problems and generalizes to other machine learning problems such as classification and recognition. After more than two decades of intensive research, support vector machine techniques are now very mature.
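As a final sketch (not the patent's implementation), training the RBF-kernel SVM of S4 with scikit-learn could look like this; X_train, y_train, and the sample counts are placeholder data for illustration only:

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder training data: each row is a 24-dim fused video feature
# (20 camera motion dims + 4 structure dims); label 1 = professional,
# 0 = amateur. Real features would come from video_feature above.
rng = np.random.default_rng(0)
X_train = rng.random((100, 24))
y_train = rng.integers(0, 2, 100)

clf = SVC(kernel="rbf")          # Gaussian radial basis kernel (S4)
clf.fit(X_train, y_train)

x_new = rng.random((1, 24))      # fused feature of a new video
print("professional" if clf.predict(x_new)[0] == 1 else "amateur")
```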
Experiments demonstrate that the embodiment provided by the invention effectively distinguishes whether a video producer is a professional or an amateur. In particular, when the professional videos are news, sports, television advertisement, and music videos, and the amateur videos are internet users' self-made videos, the implementation provided by the invention accurately distinguishes whether the video producer is a professional or an amateur.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications should also be regarded as falling within the protection scope of the invention.

Claims (8)

1. An internet video author identity recognition method based on multi-feature fusion, characterized by comprising the following steps:
S1: extracting camera motion features of the video images;
S11: carrying out uniform downsampling on the input video frames, the whole input video being uniformly downsampled to obtain M image frames;
S12: extracting background camera motion vectors between adjacent image frames among the M image frames extracted in S11;
S13: calculating a camera motion acceleration set and a motion direction change angle set of the video from the background camera motion vectors between any three consecutive image frames: obtaining the camera motion vector set between any two consecutive image frames over the whole video, V = {v_{1,2}, v_{2,3}, ..., v_{k,k+1}, ..., v_{M−1,M}}, and calculating from the motion vectors between any three consecutive image frames the camera motion acceleration set A and the motion direction change angle set Θ;
S14: from the video's camera motion vector set V, the acceleration set A, and the motion direction change angle set Θ, calculating the mean and the second-, third-, and fourth-order central moments of each of V, A, and Θ, and taking them as the camera motion features;
S15: fusing the camera motion feature statistics;
the statistics of the motion vector set V calculated in S14 being concatenated to form an 8-dimensional motion vector feature description; the statistics of the camera motion acceleration set A calculated in S14 being concatenated to form an 8-dimensional camera motion acceleration feature description; the statistics of the motion direction change angle set Θ calculated in S14 being concatenated to form a 4-dimensional camera motion direction change feature description;
finally, the 8-dimensional motion vector feature description, the 8-dimensional camera motion acceleration feature description, and the 4-dimensional camera motion direction change feature description being concatenated horizontally to obtain a 20-dimensional vector describing the camera motion features of the video image;
S2: extracting video structure information features;
S3: video feature fusion;
the camera motion features of the video image obtained in S1 and the video structure information features extracted in S2 being concatenated to obtain a 24-dimensional vector describing the video features, the features being input into a classifier for identifying whether the video producer identity is a professional video producer or an amateur video producer;
S4: identifying the video producer identity using a classifier.
2. The internet video authorship recognition method based on multi-feature fusion as claimed in claim 1, wherein in S11, the uniform down-sampling of the input video image frames is implemented by: 5 image frames are uniformly extracted from each second of video image.
3. The internet video authorship recognition method based on multi-feature fusion as claimed in claim 1, wherein the block matching algorithm is used in S12 to calculate the background camera motion vector between adjacent image frames in the M image frames extracted in S11.
4. The internet video author identity recognition method based on multi-feature fusion as claimed in claim 1, wherein S12 is implemented as follows:
S121: uniformly dividing the current image frame K and its preceding image frame K−1 into image blocks of equal size, the block size being S × S, with S = 10 pixels;
S122: selecting any image block in the current frame K as the test image block, and, with the test block's center point (x_t, y_t) as the center, selecting a 3S × 3S adjacent search region in the preceding frame K−1;
for a test image block at the edge of the current frame K, selecting the 3S × 3S square region closest to the test block's position in the preceding frame K−1 as the adjacent search region, with selection priority given to regions offset in the horizontal direction;
S123: within the adjacent search region, constructing sliding image blocks of size S × S with a sliding step of 5 pixels, and searching among them for the image region similar to the test image block by computing the maximum matching pixel count criterion:
MPC(d_x, d_y) = Σ_{(x,y)} δ(P_c(x, y), Q_c(x + d_x, y + d_y))   (1)
δ(P_c, Q_c) = 1 if |P_c − Q_c| ≤ T, and 0 otherwise   (2)
wherein (x_t, y_t) is the center position of the test image block, (x_p, y_p) is the center position of the sliding image block, P_c is the color luminance value of a pixel in the test image block, Q_c is the color luminance value of the corresponding pixel in the sliding image block, (x, y) ranges over the pixel coordinates of the test image block from its lower-left corner, the offset between the center positions of the test image block and the sliding image block is d_x = x_p − x_t, d_y = y_p − y_t, and T is the matching threshold;
S124: for any test image block centered at (x_t, y_t), its similar image block being centered at (x'_p, y'_p), calculating the motion vector v = (v_x, v_y) of the test image block from the positions of the test block and its similar block as:
v_x = x'_p − x_t   (3)
v_y = y'_p − y_t   (4)
wherein v_x and v_y are the components of the motion vector v in the X and Y directions, respectively;
S125: repeating steps S122–S124 until the motion vectors v of all test image blocks in the current frame K are detected;
S126: counting the direction distribution histogram of the motion vectors v of all test image blocks in the current frame K; assigning the motion vectors of all test image blocks in the current frame K by direction to eight intervals, namely [0, π/4), [π/4, π/2), [π/2, 3π/4), [3π/4, π), [π, 5π/4), [5π/4, 3π/2), [3π/2, 7π/4), [7π/4, 2π); and selecting the test image blocks from S125 whose motion vector directions fall in the interval represented by the largest histogram bin as the background region B;
S127: calculating the background camera motion vector between the current frame K and the preceding frame K−1 as:
v_{K−1,K} = (1 / N_B) Σ_{t ∈ B} v_t   (5)
wherein N_B is the number of image blocks in the background region B and v_t is the motion vector of image block t in the background region.
5. The internet video author identity recognition method based on multi-feature fusion as claimed in claim 4, wherein step S124 further comprises checking the number of similar image blocks detected in the adjacent search region; if the number of similar image blocks detected in the adjacent search region exceeds D blocks, with D = 4, the adjacent search region is a uniform-texture region, the detected motion vector v has low reliability, and the motion vector of the test image block is set to v = 0.
6. The internet video author identity recognition method based on multi-feature fusion as claimed in claim 4 or 5, wherein in step S13 the camera motion acceleration and the motion direction change angle are calculated from the background camera motion vectors between three consecutive image frames as follows:
a_{k−1,k,k+1} = (v_{k,k+1} − v_{k−1,k}) / Δt   (6)
θ_{k−1,k,k+1} = arccos( (v_{k−1,k} · v_{k,k+1}) / (|v_{k−1,k}| |v_{k,k+1}|) )   (7)
wherein Δt is the time interval between two consecutive extracted image frames and is a constant, the acceleration being the change of the background camera motion vector between two successive frame pairs divided by Δt;
through the above calculation, the camera motion vector set between any two consecutive image frames over the whole video is obtained, V = {v_{1,2}, v_{2,3}, ..., v_{k,k+1}, ..., v_{M−1,M}}, and from the motion vectors between any three consecutive image frames the camera motion acceleration set A = {a_{1,2,3}, a_{2,3,4}, ..., a_{k−1,k,k+1}, ..., a_{M−2,M−1,M}} and the motion direction change angle set Θ = {θ_{1,2,3}, θ_{2,3,4}, ..., θ_{k−1,k,k+1}, ..., θ_{M−2,M−1,M}} are calculated.
7. The internet video author identity recognition method based on multi-feature fusion as claimed in claim 6, wherein in S14 the mean and the second-, third-, and fourth-order central moments of the camera motion vector set V, the acceleration set A, and the direction change angle set Θ are calculated and taken as the camera motion features, as follows:
for the motion vector set V, the mean is calculated as:
μ_V = (1 / (M − 1)) Σ_{k=1}^{M−1} v_{k,k+1}   (8)
for the motion vector set V, the t-order central moment is calculated as:
m_t(V) = (1 / (M − 1)) Σ_{k=1}^{M−1} (v_{k,k+1} − μ_V)^t   (9)
wherein t ∈ {2, 3, 4};
for the acceleration set A and the motion direction change angle set Θ, the means are calculated as:
μ_A = (1 / (M − 2)) Σ_k a_{k−1,k,k+1}   (10)
μ_Θ = (1 / (M − 2)) Σ_k θ_{k−1,k,k+1}   (11)
and for the acceleration set A and the motion direction change angle set Θ, the t-order central moments are calculated as:
m_t(A) = (1 / (M − 2)) Σ_k (a_{k−1,k,k+1} − μ_A)^t   (12)
m_t(Θ) = (1 / (M − 2)) Σ_k (θ_{k−1,k,k+1} − μ_Θ)^t   (13)
8. The internet video author identity recognition method based on multi-feature fusion as claimed in claim 1, wherein the video structure information features are extracted in S2 as follows:
firstly, extracting the number of shots N_shot with a video shot detection algorithm; then extracting the following structure information features: the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot;
wherein the average shot length L_shot is calculated as:
L_shot = L / N_shot   (14)
L being the total length of the video;
the abrupt-cut shot ratio R_cut_shot is calculated as:
R_cut_shot = N_cut_shot / N_shot   (15)
N_cut_shot being the number of abrupt-cut shots;
the gradual-transition shot ratio is calculated as:
R_gradual_shot = 1 − R_cut_shot   (16)
and the number of shots N_shot, the average shot length L_shot, the abrupt-cut shot ratio R_cut_shot, and the gradual-transition shot ratio R_gradual_shot extracted from the video are concatenated to form a 4-dimensional video structure information feature description.
CN201710762954.XA 2017-08-30 2017-08-30 Internet video author identity identification method based on multi-feature fusion Active CN107516084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710762954.XA CN107516084B (en) 2017-08-30 2017-08-30 Internet video author identity identification method based on multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710762954.XA CN107516084B (en) 2017-08-30 2017-08-30 Internet video author identity identification method based on multi-feature fusion

Publications (2)

Publication Number Publication Date
CN107516084A CN107516084A (en) 2017-12-26
CN107516084B true CN107516084B (en) 2020-01-17

Family

ID=60723582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710762954.XA Active CN107516084B (en) 2017-08-30 2017-08-30 Internet video author identity identification method based on multi-feature fusion

Country Status (1)

Country Link
CN (1) CN107516084B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657100B (en) * 2019-01-25 2021-10-29 深圳市商汤科技有限公司 Video collection generation method and device, electronic equipment and storage medium
CN110717470B (en) * 2019-10-16 2023-09-26 山东瑞瀚网络科技有限公司 Scene recognition method and device, computer equipment and storage medium
WO2022081127A1 (en) * 2020-10-12 2022-04-21 Hewlett-Packard Development Company, L.P. Document language prediction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106469314A (en) * 2016-08-31 2017-03-01 深圳市唯特视科技有限公司 A kind of video image classifier method based on space-time symbiosis binary-flow network
CN106779073A (en) * 2016-12-27 2017-05-31 西安石油大学 Media information sorting technique and device based on deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10229324B2 (en) * 2015-12-24 2019-03-12 Intel Corporation Video summarization using semantic information

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106469314A (en) * 2016-08-31 2017-03-01 深圳市唯特视科技有限公司 A kind of video image classifier method based on space-time symbiosis binary-flow network
CN106779073A (en) * 2016-12-27 2017-05-31 西安石油大学 Media information sorting technique and device based on deep neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"A New Method for Camera Motion Estimation in Video";Lin Liu;《The 9th International Conference for Young Computer Scientists》;20081231;全文 *
"Detecting Complex Events in User-Generated Video Using Concept Classifiers";Jinlin Guo 等;《2012 10th International Workshop on Content-Based Multimedia Indexing(CBMI)》;20121231;全文 *
"基于运动矢量的摄像机运动定性分类方法";朱兴全 等;《计算机研究与发展》;20010430;第38卷(第4期);全文 *

Also Published As

Publication number Publication date
CN107516084A (en) 2017-12-26

Similar Documents

Publication Publication Date Title
Choi et al. Unsupervised and semi-supervised domain adaptation for action recognition from drones
Guan et al. Keypoint-based keyframe selection
CN105100894B (en) Face automatic labeling method and system
CN103210651B (en) Method and system for video summary
CN108537134B (en) Video semantic scene segmentation and labeling method
CN104994426B (en) Program video identification method and system
US20050228849A1 (en) Intelligent key-frame extraction from a video
Karpenko et al. Tiny videos: a large data set for nonparametric video retrieval and frame classification
WO2010000163A1 (en) Method, system and device for extracting video abstraction
CN101137986A (en) Summarization of audio and/or visual data
CN107516084B (en) Internet video author identity identification method based on multi-feature fusion
WO2013056311A1 (en) Keypoint based keyframe selection
Mademlis et al. Multimodal stereoscopic movie summarization conforming to narrative characteristics
JP5116017B2 (en) Video search method and system
Damnjanovic et al. Event detection and clustering for surveillance video summarization
Jin et al. Network video summarization based on key frame extraction via superpixel segmentation
Tsao et al. Thumbnail image selection for VOD services
Ramya et al. Visual saliency based video summarization: A case study for preview video generation
CN107748761B (en) Method for extracting key frame of video abstract
Widiarto et al. Video summarization using a key frame selection based on shot segmentation
Khan et al. RICAPS: residual inception and cascaded capsule network for broadcast sports video classification
Cricri et al. Multimodal semantics extraction from user-generated videos
Dong et al. Automatic and fast temporal segmentation for personalized news consuming
CN108804981B (en) Moving object detection method based on long-time video sequence background modeling frame
Jiang et al. A scene change detection framework based on deep learning and image matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant