WO2019184522A1 - Method and apparatus for determining duplicate videos (一种重复视频的判断方法及装置) - Google Patents

Method and apparatus for determining duplicate videos (一种重复视频的判断方法及装置)

Info

Publication number
WO2019184522A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
features
determining
feature
image
Prior art date
Application number
PCT/CN2018/125500
Other languages
English (en)
French (fr)
Inventor
何轶
李磊
杨成
李根
李亦锬
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Priority to JP2019572032A (JP7000468B2)
Priority to US16/958,513 (US11265598B2)
Priority to SG11201914063RA
Publication of WO2019184522A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval characterised by using metadata automatically derived from the content
    • G06F16/7847 - Retrieval characterised by using metadata automatically derived from the content, using low-level visual features of the video content
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 - Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 - Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Definitions

  • The present disclosure relates to the field of video processing technologies, and in particular, to a method and an apparatus for determining duplicate videos.
  • The problem videos mainly include: videos that duplicate existing videos in the platform's video database, videos that duplicate videos in a copyright database (for example, videos that require royalties), and some videos that are inappropriate or prohibited. It is therefore necessary to quickly compare and filter the massive volume of videos uploaded by users.
  • The method for determining duplicate videos according to the present disclosure includes the following steps: acquiring a plurality of video features of the to-be-checked video; performing sequence alignment against a plurality of existing videos according to the plurality of video features of the to-be-checked video, to obtain sequence alignment results; performing a first ranking of the plurality of existing videos according to the sequence alignment results, and taking the first n existing videos as first candidate videos according to the result of the first ranking, where n is a positive integer; and determining the repetition status of the to-be-checked video according to the sequence alignment results of the first candidate videos.
  • In the foregoing method, acquiring the plurality of video features of the to-be-checked video comprises: performing frame extraction on the to-be-checked video to obtain a plurality of frame images of the to-be-checked video; extracting a plurality of image features of each frame image as first image features; and determining, according to the first image features of the same type across the plurality of frame images, a video feature of the to-be-checked video as a first video feature, so as to obtain a plurality of first video features.
  • In the foregoing method, extracting the plurality of image features of a frame image comprises: acquiring one or more detection vectors for each frame image; using each detection vector, taking arbitrary pixels in the frame image as start points and determining the end points to which the detection vector points; and determining an image feature of the frame image, as a fence feature, according to the overall situation of the differences between each start point and the corresponding end point.
  • In the foregoing method, extracting the plurality of image features of a frame image comprises: performing, for each frame image, multiple types of pooling stage by stage to obtain an image feature of the frame image as a pooling feature, wherein the multiple types of pooling include maximum pooling, minimum pooling, and average pooling.
  • In the foregoing method, performing sequence alignment on the plurality of existing videos according to the plurality of video features of the to-be-checked video to obtain sequence alignment results includes: acquiring a plurality of video features of an existing video as second video features, each second video feature comprising a plurality of second image features; determining the unit similarity between each second image feature and each first image feature of the same type, to obtain a plurality of unit similarities; determining the average or minimum of the plurality of unit similarities, and determining a similarity matrix of the existing video according to that average or minimum; and determining a sequence alignment score according to the similarity matrix, wherein the sequence alignment score indicates the degree of similarity between the existing video and the to-be-checked video.
  • In the foregoing method, determining the sequence alignment score according to the similarity matrix comprises: determining the sequence alignment score according to a straight line in the similarity matrix.
  • In the foregoing method, performing sequence alignment on the plurality of existing videos according to the plurality of video features of the to-be-checked video to obtain sequence alignment results further includes: determining, according to the similarity matrix, the repeated video segments of the existing video and the to-be-checked video.
  • In the foregoing method, performing sequence alignment on the plurality of existing videos according to the plurality of video features of the to-be-checked video to obtain sequence alignment results includes: performing, according to each individual first image feature of at least one first video feature, a second ranking of the plurality of existing videos; taking the first k existing videos as second candidate videos according to the result of the second ranking, where k is a positive integer; and performing sequence alignment on each of the second candidate videos respectively, to obtain the sequence alignment results.
  • In the foregoing method, performing the second ranking of the plurality of existing videos according to each individual first image feature of the at least one first video feature comprises: using each individual first image feature of the at least one first video feature as an index request to perform term frequency-inverse document frequency (TF-IDF) ranking on the plurality of existing videos.
  • In the foregoing method, determining, according to the first image features of the plurality of frame images of the to-be-checked video, the plurality of video features of the to-be-checked video as first video features includes: performing binarization processing on the first image features; and determining the first video features according to the binarized first image features of the plurality of frame images.
  • The apparatus for determining duplicate videos according to the present disclosure includes: a video feature acquiring module, configured to acquire multiple types of video features of the to-be-checked video; a sequence alignment module, configured to perform sequence alignment on a plurality of existing videos according to the plurality of video features of the to-be-checked video, to obtain sequence alignment results; a first ranking module, configured to perform a first ranking of the plurality of existing videos according to the sequence alignment results and, according to the result of the first ranking, take the first n existing videos as first candidate videos, where n is a positive integer; and a duplicate checking module, configured to determine the repetition status of the to-be-checked video according to the sequence alignment results of the first candidate videos.
  • The object of the present disclosure can also be further achieved by the following technical measures.
  • The foregoing apparatus for determining duplicate videos further includes means for performing the steps of any of the foregoing methods for determining duplicate videos.
  • A hardware device for determining duplicate videos comprises: a memory for storing non-transitory computer-readable instructions; and a processor for executing the computer-readable instructions such that, when the instructions are executed by the processor, any of the foregoing methods for determining duplicate videos is implemented.
  • A computer-readable storage medium stores non-transitory computer-readable instructions that, when executed by a computer, cause the computer to perform any of the foregoing methods for determining duplicate videos.
  • A terminal device includes any of the foregoing apparatuses for determining duplicate videos.
  • FIG. 1 is a flow block diagram of a method for determining duplicate videos according to an embodiment of the present disclosure.
  • FIG. 2 is a flow chart of acquiring video features of a to-be-checked video according to an embodiment of the present disclosure.
  • FIG. 3 is a block diagram of a process for extracting fence features according to an embodiment of the present disclosure.
  • FIG. 4 is a block diagram of extracting pooling features according to an embodiment of the present disclosure.
  • FIG. 5 is a block diagram of a process for binarizing image features by using a random projection method according to an embodiment of the present disclosure.
  • FIG. 6 is a block diagram of a sequence alignment provided by an embodiment of the present disclosure.
  • FIG. 7 is a flow chart of sequence alignment using a dynamic programming method according to an embodiment of the present disclosure.
  • FIG. 8 is a flow chart of sequence alignment using a uniform video method according to an embodiment of the present disclosure.
  • FIG. 9 is a flow chart of a second ranking provided by an embodiment of the present disclosure.
  • FIG. 10 is a structural block diagram of an apparatus for determining duplicate videos according to an embodiment of the present disclosure.
  • FIG. 11 is a structural block diagram of a video feature acquiring module according to an embodiment of the present disclosure.
  • FIG. 12 is a structural block diagram of a sequence alignment module according to an embodiment of the present disclosure.
  • FIG. 13 is a hardware block diagram of a hardware device for determining duplicate videos according to an embodiment of the present disclosure.
  • FIG. 14 is a schematic diagram of a computer readable storage medium in accordance with an embodiment of the present disclosure.
  • FIG. 15 is a structural block diagram of a terminal device according to an embodiment of the present disclosure.
  • FIG. 1 is a schematic flow chart of an embodiment of a method for determining duplicate videos according to the present disclosure.
  • A method for determining duplicate videos according to an example of the present disclosure mainly includes the following steps:
  • Step S11: Acquire multiple video features of the to-be-checked video (the query video).
  • The video mentioned here can be a video signal or a video file. The video features of the to-be-checked video may be defined as first video features. Thereafter, processing proceeds to step S12.
  • Step S12: Perform sequence alignment between the to-be-checked video and each of the plurality of existing videos according to the plurality of first video features of the to-be-checked video, to obtain a sequence alignment result for each existing video.
  • The sequence alignment result includes a sequence alignment score expressing the degree of similarity between the existing video and the to-be-checked video, and/or the video segments of the existing video that are repeated in the to-be-checked video.
  • The existing videos are videos in a video database. Thereafter, processing proceeds to step S13.
  • Step S13: Perform a first ranking of the plurality of existing videos according to the sequence alignment results, and take the first n existing videos in the first ranking result as first candidate videos, where n is a positive integer. Thereafter, processing proceeds to step S14.
  • Step S14: Determine the repetition status of the to-be-checked video according to the sequence alignment results of the first candidate videos. For example: determine whether the to-be-checked video is a duplicate video (this can be decided by manual comparison, or by presetting a threshold for the sequence alignment score and checking whether the scores of the first candidate videos exceed the threshold); determine which one or more existing videos it duplicates; and determine the specific repeated video segments, so that duplicate videos can be filtered out.
  • In this way, the accuracy and efficiency of determining duplicate videos can be greatly improved.
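  • As an illustrative aid, the following minimal Python sketch shows how steps S13 and S14 could operate on precomputed sequence alignment results; the function name, the tuple layout, and the threshold-based decision are hypothetical illustrations, not part of the disclosure.

```python
def first_rank_and_check(alignment_results, n, score_threshold):
    """Steps S13-S14 sketch: alignment_results is a hypothetical list of
    (existing_video_id, alignment_score, repeated_segment) tuples from step S12."""
    # Step S13: first ranking by sequence alignment score, keep the top n.
    ranked = sorted(alignment_results, key=lambda r: r[1], reverse=True)
    first_candidates = ranked[:n]
    # Step S14: e.g. judge repetition against a preset score threshold.
    return [r for r in first_candidates if r[1] >= score_threshold]
```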
  • FIG. 2 is a schematic block diagram of acquiring video features of a to-be-checked video according to an embodiment of the present disclosure.
  • Step S11 in the example of the present disclosure includes the following steps:
  • Step S21: Perform sampling/frame extraction on the to-be-checked video, to obtain a plurality of frame images of the to-be-checked video.
  • The plurality of frame images, in order, form an image sequence.
  • The specific number of extracted frame images may be set; for example, two frame images may be extracted from the video per second, or one frame image may be extracted per second.
  • The frame extraction can be performed uniformly, that is, with a consistent time interval between two adjacent frame images.
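  • A minimal sketch of the uniform frame extraction of step S21 is given below, assuming the OpenCV library is available; the one-frame-per-second default follows the example above, and the function and parameter names are illustrative only.

```python
import cv2  # OpenCV, assumed available

def extract_frames(video_path, frames_per_second=1.0):
    """Uniformly sample frame images from a video file (step S21 sketch)."""
    capture = cv2.VideoCapture(video_path)
    native_fps = capture.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if unknown
    step = max(int(round(native_fps / frames_per_second)), 1)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:  # keep every step-th frame: uniform time interval
            frames.append(frame)
        index += 1
    capture.release()
    return frames  # the image sequence, in playback order
```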
  • Step S22: Extract a plurality of image features of each frame image; the image features of the to-be-checked video are defined as first image features.
  • Step S23: Determine, according to the first image features of the same type across the plurality of frame images of the to-be-checked video, a first video feature of the to-be-checked video, thereby obtaining a plurality of first video features.
  • Specifically, the plurality of first image features may be arranged according to the order of the corresponding frame images in the video (that is, their order in the image sequence) to obtain the first video feature.
  • The method for extracting image features in step S22 and the type of the obtained first image features are not limited; various extraction methods may be used to obtain the first image features.
  • The extracted first image features may be floating-point features or binarized features.
  • The video features of the existing videos are recorded in the video database (the video feature of an existing video may be defined as a second video feature, each second video feature being composed of a plurality of second image features).
  • The video database includes second video features of the same types as the first video features, extracted using the same methods, so that first and second video features of the same type can be compared during the video feature alignment process.
  • The video features of the existing videos can be obtained in the manner described above.
  • A fence feature (which may also be referred to as a Fence feature or a Recall feature) is included among the plurality of image features extracted in step S22.
  • The method for extracting the fence feature of a frame image is: for each frame image, acquire one or more detection vectors; using each detection vector, take arbitrary pixels in the frame image as start points and determine the end points to which the detection vector points; determine the difference between each start point and its end point; and determine an image feature of the frame image according to the overall situation of the differences between each pair of start and end points. Such image features are defined as fence features.
  • The so-called arbitrary pixels used as start points are as follows: in general, all pixels in the frame image can be defined as start points; alternatively, one or more preset positions in the frame image can be defined as start points, and the specific positions are arbitrary; for example, all points in the frame image that are not on the edge can be taken as start points.
  • FIG. 3 is a schematic block diagram of extracting fence features according to an embodiment of the present disclosure. Since image features can be acquired in the manner shown in FIG. 3 for any video, the description of this embodiment does not distinguish whether the video is the to-be-checked video.
  • Step S22 in the example of the present disclosure may include the following steps:
  • Step S31: Acquire one or more detection vectors. It may be assumed that the number of detection vectors acquired is N, where N is a positive integer. Specifically, the plurality of detection vectors may be preset or randomly generated, and the length and direction of each detection vector are arbitrary. The individual detection vectors are independent and need not be related in any way. It is worth noting that, for the plurality of frame images obtained by frame extraction, the same set of detection vectors is generally used to determine the image features of every frame image, although different sets of detection vectors can also be used to determine the image features of each frame image separately. Thereafter, processing proceeds to step S32.
  • Step S32: According to one detection vector, take each pixel in the frame image as a start point and determine the end-point pixel to which the detection vector points from that start point; then determine, according to the overall situation of the differences between each start-point pixel and the corresponding end-point pixel, one feature bit of the frame image for that detection vector. Thereafter, processing proceeds to step S33.
  • Step S33: Determine the feature bit corresponding to each detection vector respectively, and determine the fence feature of the frame image according to the N feature bits thus obtained.
  • Specifically, step S32 includes: allocating a counter for a detection vector; counting the brightness difference of each pair of start and end points and increasing or decreasing the value of the counter accordingly: if the brightness value of the start point is greater than the brightness value of the end point, the counter is incremented by 1; otherwise, the counter is decremented by 1; and determining whether the final value of the counter is greater than a preset setting value (for example, the setting value can be preset to 0). If the value of the counter is greater than the setting value, a feature bit with value 1 is generated; otherwise, a feature bit with value 0 is generated.
  • If the end point determined for a start point falls outside the frame image, the value of the counter may be left unchanged for that pair; alternatively, the frame image may be periodically extended, with the extension set in the same manner as the frame image, so that every detection vector always has a corresponding end-point pixel.
  • Determining the start-point and end-point pixels in the frame image according to detection vectors of arbitrary length and direction, and comparing the differences between start-point and end-point pixels to generate the feature of the frame image, can improve the accuracy and efficiency of video feature extraction and the quality of the obtained video features, so that video duplicate checking based on the fence feature has higher accuracy and efficiency.
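  • The following NumPy sketch illustrates the fence feature of steps S31-S33 under the assumptions that brightness is a single grayscale channel, that all in-image pixels serve as start points, that out-of-image end points leave the counter unchanged, and that the counter setting value is 0; all names are illustrative.

```python
import numpy as np

def fence_feature(gray, detection_vectors):
    """N-bit fence feature of one frame image (steps S31-S33 sketch).
    gray: 2-D array of pixel brightness; detection_vectors: list of (dy, dx)."""
    h, w = gray.shape
    bits = []
    for dy, dx in detection_vectors:
        # Start points whose end point (y+dy, x+dx) still falls inside the
        # image; out-of-image pairs simply leave the counter unchanged.
        start = gray[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)].astype(int)
        end = gray[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)].astype(int)
        # +1 when the start point is brighter than its end point, else -1.
        counter = int(np.sum(start > end)) - int(np.sum(start <= end))
        bits.append(1 if counter > 0 else 0)  # compare with setting value 0
    return bits  # one feature bit per detection vector
```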
  • A pooling feature (which may also be referred to as a Pooling feature or a Reranking feature) is also included among the plurality of image features extracted in step S22.
  • The method for extracting the pooling feature of a frame image is as follows: for each frame image, multiple types of pooling are performed stage by stage to obtain an image feature of the frame image, and such image features are defined as pooling features.
  • Pooling is a dimensionality-reduction method from the field of convolutional neural networks; the so-called multiple types of pooling include maximum pooling, minimum pooling, and average pooling.
  • Specifically, the multiple types of pooling may be performed stage by stage on the plurality of color channels of the frame image, so as to obtain image features based on the multiple color channels of the frame image.
  • Performing multiple types of pooling on the frame image stage by stage includes: determining a matrix according to the frame image, and using the multiple types of pooling to generate progressively smaller matrices stage by stage until the matrix is reduced to one containing only one point (the "points" in a matrix may alternatively be called the "elements" of the matrix); the pooling feature of the frame image is then determined from the matrix containing only one point.
  • FIG. 4 is a schematic block diagram of extracting pooling features according to an embodiment of the present disclosure. Since image features can be acquired in the manner shown in FIG. 4 for any video, the description of this embodiment does not distinguish whether the video is the to-be-checked video. Referring to FIG. 4, in an embodiment of the present disclosure, step S22 may include the following steps:
  • Step S41: Determine, according to a frame image, a first matrix having a first matrix dimension and a second matrix dimension (or a length direction and a width direction). It may be assumed that the length of the frame image is x pixels and the width is y pixels, where x and y are positive integers.
  • A point in the first matrix (the points in a matrix can also be referred to as its elements, but to distinguish them from the elements of a vector they are referred to as "points" below) corresponds to a pixel in the frame image, so that the length of the first matrix dimension is x and the length of the second matrix dimension is y (i.e., an x*y matrix). Here, the length of the first/second matrix dimension denotes the number of points the matrix contains along that dimension.
  • The value of each point in the first matrix is a 3-dimensional vector, defined as a first vector, which represents the brightness of the three color channels of the corresponding pixel in the frame image.
  • In one embodiment, the color mode of the video object is the red-green-blue (RGB) mode.
  • It should be noted that the three channels need not be red, green, and blue; for example, they can be selected according to the color mode used by the video object. The number of selected color channels also need not be three; for example, two of the three red, green, and blue channels may be selected. Thereafter, processing proceeds to step S42.
  • Step S42: Set a plurality of first blocks on the first matrix (in effect, each block is a pooling window, so a first block may also be referred to as a first pooling window); for example, x1*y1 first blocks may be set, where x1 and y1 are positive integers, and each first block contains a plurality of points of the first matrix (in other words, a plurality of first vectors).
  • The number of first blocks along the first matrix dimension is less than the length of the first matrix dimension of the first matrix (i.e., less than the number of points the first matrix contains along that dimension), and the number of first blocks along the second matrix dimension is less than the length of the second matrix dimension of the first matrix; that is, x1 is less than x, and y1 is less than y.
  • For each first block, the maximum, minimum, and average of each dimension of the plurality of first vectors contained in the block are calculated, yielding a 9-dimensional vector corresponding to the first block; this 9-dimensional vector is defined as a second vector.
  • The first blocks may partially overlap one another; that is, they may or may not overlap. Thereafter, processing proceeds to step S43.
  • Specifically, the first matrix dimension of the first matrix may be uniformly divided into x1 segments of equal length, with adjacent segments allowed to contain the same points (partial overlap); similarly, the second matrix dimension is divided into y1 segments, and combining the x1 segments with the y1 segments yields the x1*y1 first blocks of the first matrix.
  • The first blocks thus set all have the same size and the same pitch (two adjacent first blocks may overlap).
  • In this way, the foregoing plurality of first blocks are disposed on the first matrix.
  • The process of setting the first blocks and calculating the second vector of each first block is in effect equivalent to scanning (or traversing) the entire first matrix with a pooling window at a certain stride, and computing, at each scan position, the second vector of the area covered by the window.
  • Step S43: Determine a second matrix according to the x1*y1 first blocks and the second vector corresponding to each first block. A point in the second matrix corresponds to a first block, so the second matrix has length x1 along the first matrix dimension and length y1 along the second matrix dimension (i.e., an x1*y1 matrix), and the value of each point in the second matrix is the second vector of the corresponding first block.
  • The points in the second matrix may be arranged in the order of the positions of the corresponding first blocks in the first matrix.
  • Step S44: Repeat step S42 and step S43: from the second matrix, which contains x1*y1 points whose values are 9-dimensional vectors, obtain a third matrix containing x2*y2 points whose values are 27-dimensional vectors (where x2 is a positive integer less than x1, and y2 is a positive integer less than y1); then, from the third matrix, obtain a fourth matrix containing x3*y3 points whose values are 81-dimensional vectors (where x3 is a positive integer less than x2, and y3 is a positive integer less than y2); and so on, until the first matrix (i.e., the frame image) is reduced to a 1*1 N-th matrix (in effect, the matrix is reduced to a single point), where N is a positive integer. The N-th matrix contains only one point, and the value of that point is a 3^N-dimensional vector, which is taken as the pooling feature of the frame image.
  • It should be noted that, in each repetition of the block-setting in step S44, the blocks should be set according to the size of the current matrix in a corresponding manner, so that the first matrix dimension and the second matrix dimension of the matrix are reduced stage by stage.
  • In this way, the accuracy and efficiency of video feature extraction can be improved, the quality and robustness of the obtained video features can be improved, and video duplicate checking based on the pooling feature therefore has higher accuracy and efficiency.
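  • The following sketch illustrates the stage-by-stage pooling of steps S41-S44 under the simplifying assumption of non-overlapping 2x2 pooling windows (the disclosure also allows overlapping blocks and other block counts); the names and the window layout are illustrative.

```python
import numpy as np

def pooling_feature(frame):
    """Stage-by-stage max/min/average pooling down to one point (steps S41-S44).
    frame: H x W x 3 array of RGB brightness values (the first matrix of
    3-dimensional first vectors). Each stage triples the vector length,
    so the final point carries a 3**N-dimensional pooling feature."""
    m = frame.astype(np.float64)
    while m.shape[0] > 1 or m.shape[1] > 1:
        # Split each matrix dimension into at most two segments: 2x2 blocks.
        row_parts = np.array_split(m, min(2, m.shape[0]), axis=0)
        new_rows = []
        for part in row_parts:
            cells = []
            for block in np.array_split(part, min(2, m.shape[1]), axis=1):
                flat = block.reshape(-1, block.shape[-1])
                # Maximum, minimum, and average pooling of every dimension.
                cells.append(np.concatenate([flat.max(0), flat.min(0), flat.mean(0)]))
            new_rows.append(cells)
        m = np.array(new_rows)  # the next, smaller matrix of longer vectors
    return m[0, 0]
```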
  • The present disclosure may further include the following step: perform binarization processing on the image features determined in step S22 to obtain binarized image features, each binarized image feature being a bit string composed of 0s and 1s. The video features of the video object are then determined based on the obtained binarized image features.
  • FIG. 5 is a schematic block diagram of binarizing image features by using a random projection method according to an embodiment of the present disclosure.
  • The method for determining duplicate videos of the example of the present disclosure may further include the following steps of binarizing image features using a random projection method:
  • Step S51: To generate a binarized image feature of length h, generate 2h groups from one image feature, each group containing a plurality of the elements of the image feature (that is, the values of multiple dimensions of the image feature), where h is a positive integer. Thereafter, processing proceeds to step S52.
  • Which elements each group contains is arbitrary, and two different groups may contain some of the same elements. However, to facilitate video matching, the specific elements contained in each group can be preset, or the groups can be generated in the same way for multiple videos.
  • In an embodiment, each group contains the same number of elements, although it should be noted that the number of elements contained in each group may also differ.
  • Step S52: Sum the plurality of elements contained in each group respectively, to obtain the sum value of each group. Thereafter, processing proceeds to step S53.
  • Step S53: Pair the 2h groups two by two to obtain h group pairs. Thereafter, processing proceeds to step S54.
  • Specifically, the 2h groups can be numbered in advance (or sorted), and each two adjacent groups can be paired.
  • Step S54: For each group pair, compare the sum values of the two groups in the pair and generate one binarized image feature bit according to the comparison result. For example, if the sum of the first group in a pair is greater than that of the second, a feature bit with value 1 is generated; otherwise, a feature bit with value 0 is generated. Thereafter, processing proceeds to step S55.
  • The specific rule for generating the binarized image feature bits is not limited; for example, a feature bit with value 1 may instead be generated when the sum of the lower-numbered group is smaller than that of the higher-numbered group.
  • Step S55: Form a binarized image feature of length h from the h binarized image feature bits of the h group pairs.
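  • A minimal sketch of the random projection binarization of steps S51-S55 follows; the fixed random seed stands in for the requirement that the groups be generated the same way for every video, and the group size is an arbitrary illustrative choice.

```python
import numpy as np

def binarize_feature(feature, h, seed=0):
    """Binarize one floating-point image feature into h bits (steps S51-S55)."""
    rng = np.random.default_rng(seed)  # same groups for every video
    feature = np.asarray(feature, dtype=float)
    group_size = max(feature.size // 4, 1)  # illustrative group size
    # Step S51: 2h groups of element indices (groups may share elements).
    groups = [rng.choice(feature.size, size=group_size, replace=False)
              for _ in range(2 * h)]
    # Step S52: sum the elements contained in each group.
    sums = [float(feature[g].sum()) for g in groups]
    # Steps S53-S54: pair adjacent groups; each comparison yields one bit.
    bits = [1 if sums[2 * i] > sums[2 * i + 1] else 0 for i in range(h)]
    return bits  # step S55: the h bits form the binarized image feature
```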
  • Step S12 and step S13 are described in detail below.
  • FIG. 6 is a schematic flow chart of sequence alignment provided by an embodiment of the present disclosure.
  • Step S12 in the example of the present disclosure may include the following steps:
  • Step S61: Acquire a plurality of video features of an existing video. The video features of an existing video may be defined as second video features, each second video feature comprising a plurality of second image features. Thereafter, processing proceeds to step S62.
  • Specifically, the aforementioned fence features and pooling features of the to-be-checked video and of the existing video may be acquired simultaneously, and/or the aforementioned floating-point features and binarized features may be acquired simultaneously.
  • Step S62: For each second video feature and the first video feature of the same type, determine the unit similarity between each second image feature and each first image feature, to obtain a plurality of unit similarities.
  • Each unit similarity indicates the degree of similarity between one first image feature and one second image feature; specifically, the greater the unit similarity, the more similar the two are. Thereafter, processing proceeds to step S63.
  • Assume the length of the first video feature of the to-be-checked video is M1 and the length of the second video feature of the existing video is M2, where M1 and M2 are positive integers; that is, the first video feature includes M1 first image features and the second video feature includes M2 second image features. Then M1*M2 unit similarities can be obtained between a first video feature and a second video feature of the same type.
  • Specifically, a distance or metric capable of determining the degree of similarity between the first and second image features may be selected as the unit similarity, according to the type of the image features.
  • For example, the unit similarity may be determined according to the cosine distance (also called cosine similarity) between the first image feature and the second image feature; specifically, the cosine distance can be directly taken as the unit similarity.
  • As another example, the unit similarity may be determined according to the Hamming distance between the first image feature and the second image feature: first calculate the Hamming distance between the first and second image features, then compute the difference between the feature length and the Hamming distance, and take the ratio of that difference to the feature length as the unit similarity, which represents the proportion of identical bits in the binarized first and second image features.
  • The Hamming distance is a commonly used metric in the field of information theory: the Hamming distance between two equal-length strings is the number of positions at which the corresponding characters differ. It should be noted that image features extracted by the same method generally have the same length.
  • Step S63: In one example, determine the average of the plurality of unit similarities and determine the similarity matrix of the existing video according to that average; in another example, determine the minimum of the plurality of unit similarities and determine the similarity matrix of the existing video according to that minimum.
  • Each point in the similarity matrix corresponds to a unit similarity, so the similarity matrix records the unit similarity between each second image feature of the existing video and each first image feature of the to-be-checked video.
  • The points of the similarity matrix are arranged along one dimension in the order of the first image features within the first video feature, and along the other dimension in the order of the second image features within the second video feature: the point in row i, column j represents the unit similarity between the i-th frame of the to-be-checked video and the j-th frame of the existing video, and the similarity matrix is an M1*M2 matrix.
  • In an embodiment, the various types of unit similarity have a consistent range of values; for example, the range of all types of unit similarity can be set in advance to 0 to 1.
  • Both of the foregoing examples, the unit similarity determined from the cosine distance and the unit similarity determined from the Hamming distance, set the range of the unit similarity to 0 to 1.
  • In an embodiment, it is not necessary to first perform the unit similarity calculation of step S62 and then the similarity matrix determination of step S63; instead, the similarity matrix may be determined directly, with the corresponding unit similarity calculated in the course of determining each point of the matrix.
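  • The following sketch computes a similarity matrix as in steps S62-S63, using the Hamming-distance unit similarity described above and averaging over the feature types (taking the minimum instead would be the other example given); the dict-of-lists input layout is an assumption made for illustration.

```python
import numpy as np

def hamming_similarity(a, b):
    """Unit similarity of two equal-length binarized image features:
    (feature length - Hamming distance) / feature length, in [0, 1]."""
    a, b = np.asarray(a), np.asarray(b)
    return 1.0 - np.count_nonzero(a != b) / a.size

def similarity_matrix(first_feats, second_feats):
    """M1 x M2 similarity matrix between the to-be-checked video and one
    existing video. Inputs map feature type -> per-frame feature list;
    each matrix point averages the unit similarities over all types."""
    types = list(first_feats)
    m1 = len(first_feats[types[0]])
    m2 = len(second_feats[types[0]])
    sim = np.zeros((m1, m2))
    for i in range(m1):
        for j in range(m2):
            sims = [hamming_similarity(first_feats[t][i], second_feats[t][j])
                    for t in types]
            sim[i, j] = sum(sims) / len(sims)  # or min(sims)
    return sim
```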
  • Step S64: Determine, according to the similarity matrix, the similarity situation between the existing video and the to-be-checked video.
  • The so-called determining of the similarity situation includes determining, according to the similarity matrix, the degree of similarity between the existing video and the to-be-checked video, which may be expressed by the sequence alignment score.
  • Specifically, the sequence alignment score may be a score between 0 and 1; the larger the score, the more similar the two videos.
  • The foregoing determining of the similarity situation further includes determining, according to the similarity matrix, the start and end times of the repeated video segments of the existing video and the to-be-checked video.
  • Specifically, step S64 includes determining the similarity situation of the to-be-checked video and the existing video according to a straight line in the similarity matrix.
  • Since the similarity matrix is a finite matrix, the so-called "straight line" here is a finite line segment composed of a plurality of points in the similarity matrix. The line has a slope, namely the slope of the line connecting the points it comprises. The start and end points of the line may be any points in the similarity matrix; they are not necessarily points on the edge.
  • The straight lines in the present disclosure include the diagonal of the similarity matrix and the line segments parallel to it (a line running from the upper left to the lower right of the similarity matrix has a slope of 1), but are not limited to lines of slope 1: a line with a slope of approximately 1 may be used to improve the robustness of duplicate checking; lines with slopes of 2, 3, ... or 1/2, 1/3, ... may be used to check speed-adjusted videos; and even a line with a negative slope (running from the lower left to the upper right of the similarity matrix) may be used to cope with videos processed by reverse playback.
  • Here, the diagonal is the line segment consisting of the points (1,1), (2,2), (3,3), ... (in effect, the line starting from the upper-left corner point with a slope of 1).
  • Each line in the similarity matrix is composed of a plurality of sequentially arranged unit similarities, so each line represents the similarity of a plurality of sequentially arranged image feature pairs and can thereby express the degree of similarity between a time segment of the to-be-checked video and a time segment of the existing video.
  • Each image feature pair includes one first image feature and one second image feature; that is, each line represents the degree of similarity between a plurality of sequentially arranged first image features and a plurality of sequentially arranged second image features.
  • The slope of the line and the endpoints of the line represent the lengths and positions of the two video segments.
  • For example, a line composed of (1,1), (2,3), (3,5), (4,7) pairs the first image feature with ordinal 1 against the second image feature with ordinal 1, the first image feature with ordinal 2 against the second image feature with ordinal 3, and so on; such a line of slope 2 corresponds to comparing the to-be-checked video against a speed-adjusted version.
  • In an embodiment, the similarity of the two videos can be determined according to a straight line in the similarity matrix as follows. The average situation (or overall situation) of the unit similarities included in a line may be defined as the line similarity of that line; the line similarity reflects the similarity between the corresponding plurality of first image features and plurality of second image features. The line with the highest line similarity in the similarity matrix is determined and may be referred to as the matching line. The line similarity of the matching line is determined as the degree of similarity between the to-be-checked video and the existing video, and/or the repeated video segments of the to-be-checked video and the existing video are determined according to the plurality of first image features and the plurality of second image features corresponding to the matching line.
  • The specific method for determining the repeated video segments according to a straight line in the similarity matrix may be: determine the start time of the repeated segment in the to-be-checked video according to the ordinal of the first image feature corresponding to the start point of the line (i.e., its abscissa in the similarity matrix), and determine the start time of the repeated segment in the existing video according to the ordinal of the second image feature corresponding to the start point (i.e., its ordinate in the similarity matrix); similarly, determine the end time of the repeated segment in the to-be-checked video according to the abscissa of the end point of the line, and the end time of the repeated segment in the existing video according to the ordinate of the end point.
  • In one embodiment, the line with the highest line similarity may be determined from a plurality of preset lines; for example, the preset lines may all be lines whose slope is set to a preset fixed value, such as a slope of 1. In another embodiment, a line may be fitted from the points so as to maximize the line similarity.
  • FIG. 7 is a schematic flow chart of video duplicate checking using a dynamic programming method according to an embodiment of the present disclosure. Referring to FIG. 7, in an embodiment, step S64 of the present disclosure includes the following specific steps:
  • Step S64-1a: Define the lines in the similarity matrix whose slope equals a preset slope setting value as candidate lines, and determine the line similarity of each candidate line according to the unit similarities it includes.
  • Specifically, the line similarity of a line may be set as the average of the unit similarities it includes, or as the sum of the unit similarities it includes.
  • Specifically, the slope setting value may be taken as 1; that is, the aforementioned candidate lines are the diagonal of the similarity matrix and the lines parallel to the diagonal.
  • In an embodiment, step S64-1a further includes: excluding from the candidate lines those lines that include fewer unit similarities than a preset line length setting value. In other words, a candidate line must also include at least the preset number of unit similarities. Processing then proceeds to step S64-1b.
  • Step S64-1b: From the plurality of candidate lines, determine the candidate line that maximizes the line similarity, and define it as the first matching line. Thereafter, processing proceeds to step S64-1c.
  • Step S64-1c: Determine the line similarity of the first matching line as the sequence alignment score expressing the degree of similarity between the to-be-checked video and the existing video, and determine the start and end times of the repeated segments in the two videos according to the start and end points of the first matching line.
  • It should be noted that there may be multiple preset slope setting values in step S64-1a; that is, a candidate line is any line whose slope equals one of the plurality of slope setting values. For example, the candidate lines may be lines with slopes of 1, -1, 2, 1/2, and so on; in step S64-1b, one first matching line is determined from the candidate lines whose slopes are drawn from the plurality of slope setting values.
  • By using the dynamic programming method to determine the sequence alignment score and/or determine the repeated video segments, the method for determining duplicate videos proposed by the present disclosure can improve the accuracy and speed of duplicate checking.
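  • A minimal sketch of steps S64-1a to S64-1c with a slope setting value of 1 follows: it scans the diagonal and every line parallel to it, drops candidate lines shorter than the line length setting value, and returns the first matching line; the parameter names and the default length are illustrative.

```python
import numpy as np

def best_slope1_line(sim, min_length=3):
    """Return (score, start, end) of the slope-1 candidate line with the
    highest line similarity (average unit similarity), or None."""
    m1, m2 = sim.shape
    best = None
    for offset in range(-(m1 - 1), m2):  # one candidate line per offset
        line = np.diagonal(sim, offset=offset)
        if line.size < min_length:  # line length setting value
            continue
        score = float(line.mean())  # line similarity as the average
        row0 = max(0, -offset)      # start point of this candidate line
        start = (row0, row0 + offset)
        end = (row0 + line.size - 1, row0 + offset + line.size - 1)
        if best is None or score > best[0]:
            best = (score, start, end)
    return best  # alignment score plus the repeated-segment endpoints
```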
  • FIG. 8 is a schematic flow chart of video duplicate checking using the uniform video method according to an embodiment of the present disclosure. Referring to FIG. 8, in an embodiment, step S64 of the present disclosure includes the following specific steps:
  • Step S64-2a: Select the plurality of points with the largest unit similarities in the similarity matrix and define them as similarity extreme points.
  • The specific number of similarity extreme points taken may be preset. Thereafter, processing proceeds to step S64-2b.
  • Step S64-2b: Based on the plurality of similarity extreme points, fit a straight line in the similarity matrix as the second matching line.
  • Specifically, a line whose slope equals or is close to a preset slope setting value is fitted as the second matching line based on the plurality of similarity extreme points; for example, a line with a slope close to 1 is fitted.
  • In an embodiment, a random sample consensus method (RANSAC method for short) may be used for the fitting. The RANSAC method is a commonly used method that calculates the parameters of a mathematical model from a set of sample data containing abnormal data, so as to obtain valid sample data. Thereafter, processing proceeds to step S64-2c.
  • Step S64-2c: Determine the sequence alignment score according to the plurality of unit similarities included on the second matching line, to represent the degree of similarity between the to-be-checked video and the existing video. Specifically, the average of the unit similarities on the second matching line may be determined as the sequence alignment score. In addition, the start and end times of the repeated segments in the two videos may be determined according to the start and end points of the second matching line.
  • By determining the sequence alignment score and/or the repeated video segments with the uniform video method, the method for determining duplicate videos proposed by the present disclosure can improve the accuracy and speed of duplicate checking.
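  • The following sketch illustrates steps S64-2a to S64-2c with a simple RANSAC-style loop; the number of extreme points, the trial count, and the inlier tolerance are all illustrative values, and the line is fitted freely rather than constrained to a preset slope.

```python
import numpy as np

def fit_matching_line(sim, num_points=50, trials=200, tol=1.5, seed=0):
    """Fit the second matching line through the similarity extreme points
    and return the sequence alignment score (steps S64-2a to S64-2c)."""
    rng = np.random.default_rng(seed)
    # Step S64-2a: the points with the largest unit similarities.
    flat = np.argsort(sim, axis=None)[-num_points:]
    pts = np.column_stack(np.unravel_index(flat, sim.shape)).astype(float)
    best_inliers = None
    # Step S64-2b: RANSAC loop - repeatedly fit a line through two sampled
    # points and keep the line supported by the most extreme points.
    for _ in range(trials):
        p, q = pts[rng.choice(len(pts), size=2, replace=False)]
        if q[1] == p[1]:
            continue  # skip degenerate (vertical) candidates
        slope = (q[0] - p[0]) / (q[1] - p[1])
        intercept = p[0] - slope * p[1]
        dist = np.abs(pts[:, 0] - (slope * pts[:, 1] + intercept))
        inliers = pts[dist <= tol]
        if best_inliers is None or len(inliers) > len(best_inliers):
            best_inliers = inliers
    if best_inliers is None:  # no valid candidate line was sampled
        best_inliers = pts
    # Step S64-2c: score = average unit similarity of the points on the line.
    rows, cols = best_inliers[:, 0].astype(int), best_inliers[:, 1].astype(int)
    return float(sim[rows, cols].mean())
```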
  • In an embodiment, step S64 further includes: inspecting the beginning and ending portions of the obtained first matching line or second matching line, determining whether the points (unit similarities) in the beginning and ending portions of the matching line reach a preset unit similarity setting value, and removing from the matching line the beginning/ending portions that do not reach the unit similarity setting value (i.e., where the unit similarity is not high). In this way, the accuracy of duplicate checking can be improved, and the start and end times of the repeated video segments can be obtained more precisely.
  • The specific method for removing the portions at the beginning/end of the matching line that do not reach the unit similarity setting value may be: inspect from the start/end point of the matching line toward the middle, checking whether the unit similarity setting value is reached; after finding the first point that reaches the unit similarity setting value, remove the points between that point and the start/end point.
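  • As a small illustration, the following sketch trims the low-similarity beginning and ending portions of a matching line in the way just described; the threshold parameter plays the role of the unit similarity setting value.

```python
def trim_matching_line(line_similarities, threshold):
    """Scan from both ends of the matching line toward the middle and drop
    the leading/trailing points whose unit similarity is below threshold.
    Returns (start, end) indices of the trimmed line, or None if every
    point falls below the unit similarity setting value."""
    lo, hi = 0, len(line_similarities) - 1
    while lo <= hi and line_similarities[lo] < threshold:
        lo += 1  # inspect from the start point toward the middle
    while hi >= lo and line_similarities[hi] < threshold:
        hi -= 1  # inspect from the end point toward the middle
    return (lo, hi) if lo <= hi else None
```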
  • In the present disclosure, duplicate checking is performed by taking the average or minimum of the similarities of multiple types of video features, which can reduce or eliminate the mismatches that occur when comparison relies on a single type of video feature (for example, in the similarity matrix and line similarity described above), thereby improving the accuracy of duplicate checking.
  • In an embodiment, each individual first image feature of at least one first video feature may be used as an index request to perform term frequency-inverse document frequency ranking (TF-IDF ranking for short) on the plurality of existing videos.
  • Specifically, the second video features may be indexed in advance to obtain a feature index of the plurality of existing videos, and the first image features are then matched against the feature index so as to match the plurality of existing videos.
  • The foregoing advance construction of the feature index of the existing videos further includes: obtaining in advance a forward index and an inverted index of the video features of the existing videos, to facilitate duplicate checking.
  • Specifically, the forward feature index and the inverted feature index may be pre-stored in a video database.
  • The forward feature index records the video feature of each existing video, that is, which image features the video feature of an existing video specifically comprises and the order of those image features; the inverted feature index records, for each image feature, which existing videos' video features it appears in.
  • Specifically, the forward feature index and the inverted feature index may be stored as key-value pairs. In the forward feature index, a key represents the number of a video (or the video ID), and the value corresponding to the key records which image features the video comprises and the order of those image features; the keys and values in the forward index may be called forward keys and forward values. In the inverted feature index, a key represents an image feature, and the value corresponding to the key records the numbers of the videos containing that image feature; the keys and values in the inverted feature index may be called inverted keys and inverted values, respectively.
  • The TF-IDF ranking is a technique for judging the importance of information: it ranks by weighting the term frequency and the inverse document frequency.
  • The term frequency refers to the frequency with which a word (or a piece of information) appears in an article (or a file); the higher the term frequency, the more important the word is to the article. The document frequency refers to the number of articles in an article library in which the word appears, and the inverse document frequency is the reciprocal of the document frequency (in actual calculation, the logarithm may also be taken, i.e., the inverse document frequency may be the logarithm of the reciprocal of the document frequency); the higher the inverse document frequency, the more discriminative the word.
  • The TF-IDF ranking ranks by the product of the term frequency and the inverse document frequency.
  • In the present disclosure, the video feature of a video can be treated as an article and each image feature as a word, so that the existing videos can be ranked by TF-IDF.
  • It should be noted that absolute matching (exact match) of the existing videos in the video database may be performed before the second ranking: the absolute matching selects the existing videos that contain a preset number or preset proportion of the first image features, and only these undergo the second ranking.
  • FIG. 9 is a schematic flow chart of a second ranking including an absolute matching step according to an embodiment of the present disclosure. Referring to FIG. 9, in an embodiment of the present disclosure, the following steps are performed before step S12:
  • Step S71: According to the inverted feature index, determine which first image features appear in the second video features of the existing videos, so as to match from the video database the existing videos containing a preset number of the first image features as a third candidate video set. Thereafter, processing proceeds to step S72.
  • Step S72: Determine, based on the forward feature index, the term frequency of a first image feature in the second video feature of a third candidate video.
  • Specifically, the term frequency is the proportion of a first image feature among all image features included in a second video feature.
  • Step S73: Determine the document frequency of a first image feature based on the inverted feature index.
  • Specifically, the document frequency is: among a plurality of existing videos (for example, all existing videos in the video database), the number of existing videos whose second video features include the first image feature, as a proportion of the total number of existing videos. Thereafter, processing proceeds to step S74.
  • Step S74: Determine the term frequency-inverse document frequency score of a third candidate video according to the term frequency of each first image feature in the second video feature of that third candidate video and the document frequency of each first image feature. Thereafter, processing proceeds to step S75.
  • Step S75: Rank the third candidate video set according to the obtained term frequency-inverse document frequency score of each third candidate video to obtain the result of the second ranking, and take the first k third candidate videos from the second ranking result as the second candidate video set.
  • At the same time, the second video feature (forward feature index) of each second candidate video may also be returned, so that the second candidate video set can be further processed based on the second video features in the subsequent step S12.
  • In an embodiment, an index server may be used: with the first image features of the to-be-checked video as index requests, the absolute matching and the TF-IDF ranking are performed according to the foregoing forward and inverted feature indexes, the second candidate video set is recalled, and the forward feature index of each second candidate video is returned at the same time.
  • Specifically, the above steps can be performed using the open-source Elasticsearch search engine to achieve fast retrieval.
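  • The following self-contained sketch illustrates the absolute matching and TF-IDF second ranking of steps S71-S75 over in-memory forward and inverted indexes (a production system would delegate this to an index server such as Elasticsearch, as noted above); the index layout, the logarithmic IDF, and the min_hits parameter are illustrative assumptions.

```python
import math
from collections import Counter

def second_ranking(query_features, inverted_index, forward_index, k=100, min_hits=1):
    """Steps S71-S75 sketch.
    query_features: hashable first image features of the to-be-checked video.
    inverted_index: image feature -> set of ids of videos containing it.
    forward_index: video id -> ordered list of image features (forward value)."""
    total_videos = len(forward_index)
    # Step S71: absolute matching - recall videos containing enough query features.
    hits = Counter()
    for feat in set(query_features):
        for vid in inverted_index.get(feat, ()):
            hits[vid] += 1
    third_candidates = [vid for vid, c in hits.items() if c >= min_hits]
    scores = {}
    for vid in third_candidates:
        feats = forward_index[vid]
        counts = Counter(feats)
        score = 0.0
        for feat in set(query_features):
            tf = counts[feat] / len(feats)          # step S72: term frequency
            df = len(inverted_index.get(feat, ()))  # step S73: document frequency
            if tf and df:
                score += tf * math.log(total_videos / df)  # step S74: TF * IDF
        scores[vid] = score
    # Step S75: the top-k candidates form the second candidate video set.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```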
  • In an embodiment, step S12 and step S13 in the foregoing example become: performing the foregoing sequence alignment on the plurality of second candidate videos to obtain the sequence alignment results, and performing the foregoing first ranking on the plurality of second candidate videos, so as to select the first candidate videos from the plurality of second candidate videos according to the first ranking.
By performing the second ranking, the method for determining duplicate videos proposed by the present disclosure can greatly improve the accuracy and efficiency of determining duplicate videos.

It should be noted that, before step S71, the binarization of image features described in the foregoing embodiments may be performed to facilitate the second ranking.
FIG. 10 is a schematic structural block diagram of an embodiment of a determining apparatus for duplicate videos according to the present disclosure. Referring to FIG. 10, the determining apparatus 100 for duplicate videos of the example of the present disclosure mainly includes:

The video feature acquisition module 110, configured to acquire multiple kinds of video features of the video to be checked.

The sequence alignment module 120, configured to perform sequence alignment on a plurality of existing videos according to the multiple kinds of video features of the video to be checked, to obtain sequence alignment results.

The first ranking module 130, configured to perform a first ranking of the plurality of existing videos according to the sequence alignment results, and to take the top n existing videos in the first ranking result as first candidate videos, where n is a positive integer.

The duplication checking module 140, configured to determine the duplication status of the video to be checked according to the sequence alignment results of the first candidate videos.
FIG. 11 is a schematic block diagram of the video feature acquisition module 110 according to an embodiment of the present disclosure. Referring to FIG. 11, the video feature acquisition module 110 of the example of the present disclosure mainly includes:

The sampling unit 111, configured to sample frames from the video to be checked to obtain a plurality of frame images of the video to be checked.

The first image feature extraction unit 112, configured to extract multiple kinds of image features from each frame image; the image features of the video to be checked may be defined as first image features.

The first video feature determining unit 113, configured to determine, from each kind of the first image features of the frame images of the video to be checked, a first video feature of the video to be checked, thereby obtaining multiple kinds of first video features.

Specifically, the first image feature extraction unit 112 may include a plurality of subunits (not shown) for extracting the fence feature according to the steps in the foregoing method embodiments, and/or a plurality of subunits (not shown) for extracting the pooling feature according to the steps in the foregoing method embodiments.

Further, the determining apparatus for duplicate videos of the example of the present disclosure may also include a binarization module (not shown) for binarizing image features by the random projection method.
FIG. 12 is a schematic block diagram of the sequence alignment module 120 according to an embodiment of the present disclosure. Referring to FIG. 12, the sequence alignment module 120 of the example of the present disclosure mainly includes:

The second video feature acquiring unit 121, configured to acquire multiple kinds of video features of an existing video; the video features of an existing video may be defined as second video features, each kind of second video feature comprising a plurality of second image features.

The unit similarity determining unit 122, configured to determine the unit similarity between each second image feature of each kind of second video feature and each first image feature of the same kind, to obtain multiple kinds of unit similarities.

The similarity matrix determining unit 123, configured to determine the similarity matrix of the existing video according to the average of the multiple kinds of unit similarities, or according to the minimum of the multiple kinds of unit similarities.

The sequence alignment unit 124, configured to determine, according to the similarity matrix, the similarity between the existing video and the video to be checked. Specifically, the sequence alignment unit 124 is configured to determine that similarity according to the straight lines in the similarity matrix.

Specifically, the sequence alignment unit 124 may include a plurality of subunits (not shown) that determine the sequence alignment score and the duplicate video segments by the uniform-speed video method of the foregoing method embodiments, or a plurality of subunits (not shown) that determine the sequence alignment score and the duplicate video segments by dynamic programming.
Further, performing sequence alignment and the first ranking on every video in the video database could impair the efficiency of duplication checking. Therefore, a second ranking module (not shown) may be provided before the sequence alignment module 120, configured to perform a second ranking of the plurality of existing videos according to each individual first image feature of at least one kind of first video feature, so as to select the second candidate videos from the video database; the sequence alignment module 120 is then configured to perform sequence alignment on the second candidate videos. Specifically, the second ranking module is configured to use each individual first image feature of at least one kind of first video feature as an index request and to rank the plurality of existing videos in the term frequency-inverse document frequency (TF-IDF) manner.

It should be noted that the aforementioned binarization module may be placed before the second ranking module to facilitate the second ranking.
FIG. 13 is a hardware block diagram illustrating a determination hardware device for duplicate videos according to an embodiment of the present disclosure. As shown in FIG. 13, the determination hardware device 300 for duplicate videos according to an embodiment of the present disclosure includes a memory 301 and a processor 302. The components of the determination hardware device 300 are interconnected by a bus system and/or other forms of connection mechanism (not shown).
  • the memory 301 is for storing non-transitory computer readable instructions.
  • memory 301 can include one or more computer program products, which can include various forms of computer readable storage media, such as volatile memory and/or nonvolatile memory.
  • the volatile memory may include, for example, random access memory (RAM) and/or cache or the like.
  • the nonvolatile memory may include, for example, a read only memory (ROM), a hard disk, a flash memory, or the like.
The processor 302 can be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and can control the other components of the determination hardware device 300 to perform desired functions. In an embodiment of the present disclosure, the processor 302 is configured to execute the computer readable instructions stored in the memory 301 such that the determination hardware device 300 performs all or part of the steps of the methods for determining duplicate videos of the foregoing embodiments of the present disclosure.
  • FIG. 14 is a schematic diagram illustrating a computer readable storage medium in accordance with an embodiment of the present disclosure.
  • a computer readable storage medium 400 according to an embodiment of the present disclosure has stored thereon non-transitory computer readable instructions 401.
When the non-transitory computer readable instructions 401 are executed by a processor, all or part of the steps of the methods for determining duplicate videos of the foregoing embodiments of the present disclosure are performed.
  • FIG. 15 is a schematic diagram showing a hardware structure of a terminal device according to an embodiment of the present disclosure.
The terminal device may be implemented in various forms. Terminal devices in the present disclosure may include, but are not limited to, mobile terminal devices such as mobile phones, smartphones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), navigation devices, in-vehicle terminal devices, in-vehicle display terminals and in-vehicle electronic rearview mirrors, as well as fixed terminal devices such as digital TVs and desktop computers.
The terminal device 1100 may include a wireless communication unit 1110, an A/V (audio/video) input unit 1120, a user input unit 1130, a sensing unit 1140, an output unit 1150, a memory 1160, an interface unit 1170, a controller 1180, and a power supply unit 1190.
  • Figure 15 illustrates a terminal device having various components, but it should be understood that not all illustrated components are required to be implemented. More or fewer components can be implemented instead.
  • the wireless communication unit 1110 allows radio communication between the terminal device 1100 and a wireless communication system or network.
  • the A/V input unit 1120 is for receiving an audio or video signal.
  • the user input unit 1130 can generate key input data according to a command input by the user to control various operations of the terminal device.
The sensing unit 1140 detects the current state of the terminal device 1100, the location of the terminal device 1100, the presence or absence of a user's touch input to the terminal device 1100, the orientation of the terminal device 1100, the acceleration or deceleration movement and direction of the terminal device 1100, and the like, and generates commands or signals for controlling the operation of the terminal device 1100.
  • the interface unit 1170 serves as an interface through which at least one external device can connect with the terminal device 1100.
  • Output unit 1150 is configured to provide an output signal in a visual, audio, and/or tactile manner.
The memory 1160 may store software programs for the processing and control operations executed by the controller 1180, or may temporarily store data that has been output or is to be output.
  • Memory 1160 can include at least one type of storage medium.
  • the terminal device 1100 can cooperate with a network storage device that performs a storage function of the memory 1160 through a network connection.
  • Controller 1180 typically controls the overall operation of the terminal device. Additionally, the controller 1180 can include a multimedia module for reproducing or playing back multimedia data.
  • the controller 1180 can perform a pattern recognition process to recognize a handwriting input or a picture drawing input performed on the touch screen as a character or an image.
The power supply unit 1190 receives external power or internal power under the control of the controller 1180 and provides the appropriate power required to operate the various elements and components.
The various embodiments of the method for determining duplicate videos proposed by the present disclosure may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For hardware implementation, the various embodiments may be implemented by using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein; in some cases, the various embodiments may be implemented in the controller 1180. For software implementation, the various embodiments may be implemented with separate software modules that allow at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in the memory 1160 and executed by the controller 1180.
In summary, the method, apparatus, hardware device, computer-readable storage medium and terminal device for determining duplicate videos according to the embodiments of the present disclosure can greatly improve the efficiency, accuracy and robustness of determining duplicate videos by performing duplication checking with multiple kinds of video features.

Moreover, the word "exemplary" does not mean that the described example is preferred or better than other examples.


Abstract

The present disclosure relates to a method and device for determining duplicate videos, the method comprising: acquiring multiple kinds of video features of a video to be checked for duplication; performing sequence alignment on a plurality of existing videos according to the multiple kinds of video features of the video to be checked, to obtain sequence alignment results; performing a first ranking of the plurality of existing videos according to the sequence alignment results, and taking, according to the result of the first ranking, the top n existing videos as first candidate videos; and determining the duplication status of the video to be checked according to the sequence alignment results of the first candidate videos.

Description

Method and device for determining duplicate videos

Cross-reference to related applications

This application claims priority to Chinese Patent Application No. 201810273706.3, filed on March 29, 2018, the entire content of which is incorporated herein by reference.

Technical field

The present disclosure relates to the technical field of video processing, and in particular to a method and device for determining duplicate videos.

Background

In today's multimedia information society, users upload massive numbers of videos to video platforms every day. Most of these videos are normal, valuable videos, but some are problematic, mainly including: videos duplicating existing videos in the platform's video database, videos duplicating videos in a copyright database (for example, videos for which copyright fees are payable), and videos unsuitable or forbidden for display. The massive numbers of user-uploaded videos therefore need to be compared and deduplicated rapidly.

Existing methods for comparing and deduplicating videos suffer from slow speed, poor accuracy, and heavy consumption of computing and storage resources.
Summary

An object of the present disclosure is to provide a new method and device for determining duplicate videos.

The object of the present disclosure is achieved by the following technical solutions. The method for determining duplicate videos proposed according to the present disclosure comprises the following steps: acquiring multiple kinds of video features of a video to be checked for duplication; performing sequence alignment on a plurality of existing videos according to the multiple kinds of video features of the video to be checked, to obtain sequence alignment results; performing a first ranking of the plurality of existing videos according to the sequence alignment results, and taking, according to the result of the first ranking, the top n existing videos as first candidate videos, where n is a positive integer; and determining the duplication status of the video to be checked according to the sequence alignment results of the first candidate videos.

The object of the present disclosure may be further achieved by the following technical measures.

In the aforementioned method, acquiring the multiple kinds of video features of the video to be checked comprises: extracting frames from the video to be checked to obtain a plurality of frame images of the video to be checked; extracting multiple kinds of image features of the frame images as first image features; and determining, according to the first image features of the same kind of the frame images, video features of the video to be checked as first video features, to obtain multiple kinds of the first video features.

In the aforementioned method, extracting the multiple kinds of image features of the frame images comprises: for each frame image, acquiring one or more detection vectors; using each detection vector, taking an arbitrary pixel in the frame image as a start point and determining the end point to which the detection vector points from that start point; and determining an image feature of the frame image, as a fence feature, according to the overall situation of the differences between each start point and the corresponding end point.

In the aforementioned method, extracting the multiple kinds of image features of the frame images comprises: for each frame image, performing multiple types of pooling level by level to obtain an image feature of the frame image as a pooling feature, wherein the multiple types of pooling include max pooling, min pooling and average pooling.

In the aforementioned method, performing sequence alignment on a plurality of existing videos according to the multiple kinds of video features of the video to be checked to obtain sequence alignment results comprises: acquiring multiple kinds of video features of an existing video as second video features, each kind of second video feature containing a plurality of second image features; determining the unit similarity between each second image feature and each first image feature of the same kind, to obtain multiple kinds of unit similarities; determining the average or the minimum of the multiple kinds of unit similarities, and determining a similarity matrix of the existing video according to that average or minimum; and determining a sequence alignment score according to the similarity matrix, the sequence alignment score representing the degree of similarity between the existing video and the video to be checked.

In the aforementioned method, determining the sequence alignment score according to the similarity matrix comprises: determining the sequence alignment score according to the straight lines in the similarity matrix.

In the aforementioned method, performing sequence alignment on a plurality of existing videos according to the multiple kinds of video features of the video to be checked to obtain sequence alignment results further comprises: determining the duplicate video segments of the existing video and the video to be checked according to the similarity matrix.

In the aforementioned method, performing sequence alignment on a plurality of existing videos according to the multiple kinds of video features of the video to be checked to obtain sequence alignment results comprises: performing a second ranking of the plurality of existing videos according to each individual first image feature of at least one kind of first video feature; taking, according to the result of the second ranking, the top k existing videos as second candidate videos, where k is a positive integer; and performing sequence alignment on each second candidate video to obtain sequence alignment results.

In the aforementioned method, performing the second ranking of the plurality of existing videos according to each individual first image feature of at least one kind of first video feature comprises: using each individual first image feature of at least one kind of first video feature as an index request, and performing term frequency-inverse document frequency ranking of the plurality of existing videos.

In the aforementioned method, determining the multiple kinds of video features of the video to be checked as first video features according to each kind of the first image features of the plurality of frame images of the video to be checked comprises: binarizing the first image features; and determining the first video features according to the binarized first image features of the plurality of frame images.

The object of the present disclosure is also achieved by the following technical solution. The determining apparatus for duplicate videos proposed according to the present disclosure comprises: a video feature acquisition module for acquiring multiple kinds of video features of a video to be checked for duplication; a sequence alignment module for performing sequence alignment on a plurality of existing videos according to the multiple kinds of video features of the video to be checked, to obtain sequence alignment results; a first ranking module for performing a first ranking of the plurality of existing videos according to the sequence alignment results and taking, according to the result of the first ranking, the top n existing videos as first candidate videos, where n is a positive integer; and a duplication checking module for determining the duplication status of the video to be checked according to the sequence alignment results of the first candidate videos.

The object of the present disclosure may also be further achieved by the following technical measures.

The aforementioned determining apparatus for duplicate videos further comprises modules for performing the steps of any of the aforementioned methods for determining duplicate videos.

The object of the present disclosure is also achieved by the following technical solution. A determination hardware device for duplicate videos proposed according to the present disclosure comprises: a memory for storing non-transitory computer readable instructions; and a processor for executing the computer readable instructions such that, when executing them, the processor implements any of the aforementioned methods for determining duplicate videos.

The object of the present disclosure is also achieved by the following technical solution. A computer-readable storage medium proposed according to the present disclosure stores non-transitory computer readable instructions which, when executed by a computer, cause the computer to perform any of the aforementioned methods for determining duplicate videos.

The object of the present disclosure is also achieved by the following technical solution. A terminal device proposed according to the present disclosure comprises any of the aforementioned determining apparatuses for duplicate videos.

The above description is only an overview of the technical solutions of the present disclosure. In order to understand the technical means of the present disclosure more clearly so that they can be implemented according to this specification, and to make the above and other objects, features and advantages of the present disclosure more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings

FIG. 1 is a flow diagram of a method for determining duplicate videos according to an embodiment of the present disclosure.

FIG. 2 is a flow diagram of acquiring video features of the video to be checked according to an embodiment of the present disclosure.

FIG. 3 is a flow diagram of extracting fence features according to an embodiment of the present disclosure.

FIG. 4 is a flow diagram of extracting pooling features according to an embodiment of the present disclosure.

FIG. 5 is a flow diagram of binarizing image features by the random projection method according to an embodiment of the present disclosure.

FIG. 6 is a flow diagram of sequence alignment according to an embodiment of the present disclosure.

FIG. 7 is a flow diagram of sequence alignment by dynamic programming according to an embodiment of the present disclosure.

FIG. 8 is a flow diagram of sequence alignment by the uniform-speed video method according to an embodiment of the present disclosure.

FIG. 9 is a flow diagram of the second ranking according to an embodiment of the present disclosure.

FIG. 10 is a structural block diagram of a determining apparatus for duplicate videos according to an embodiment of the present disclosure.

FIG. 11 is a structural block diagram of a video feature acquisition module according to an embodiment of the present disclosure.

FIG. 12 is a structural block diagram of a sequence alignment module according to an embodiment of the present disclosure.

FIG. 13 is a hardware block diagram of a determination hardware device for duplicate videos according to an embodiment of the present disclosure.

FIG. 14 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present disclosure.

FIG. 15 is a structural block diagram of a terminal device according to an embodiment of the present disclosure.

Detailed description

To further explain the technical means adopted by the present disclosure to achieve the intended objects, and their effects, specific embodiments, structures, features and effects of the method and device for determining duplicate videos proposed according to the present disclosure are described in detail below with reference to the accompanying drawings and preferred embodiments.
FIG. 1 is a schematic flow diagram of an embodiment of the method for determining duplicate videos of the present disclosure. Referring to FIG. 1, the method of the example of the present disclosure mainly comprises the following steps:

Step S11: acquire multiple kinds of video features of the query video to be checked for duplication. The video referred to here may be a video signal or a video file. The video features of the video to be checked may be defined as first video features. Thereafter, the process proceeds to step S12.

Step S12: perform sequence alignment on each of a plurality of existing videos according to the multiple kinds of first video features of the video to be checked, to obtain a sequence alignment result for each existing video. In some examples, the sequence alignment result includes a sequence alignment score representing the degree of similarity between the existing video and the video to be checked, and/or the video segments in which the two videos duplicate each other. In some embodiments, the existing videos are videos in a video database. Thereafter, the process proceeds to step S13.

Step S13: perform a first ranking of the plurality of existing videos according to the sequence alignment results, and take the top n existing videos in the first ranking result as first candidate videos, where n is a positive integer. Thereafter, the process proceeds to step S14.

Step S14: determine the duplication status of the video to be checked according to the sequence alignment results of the first candidate videos, for example whether it is a duplicate video (which may be determined by manual comparison, or by presetting a threshold for the sequence alignment score and checking whether the scores of the first candidate videos exceed that threshold), which existing video or videos it duplicates, and even the specific duplicate video segments, so that duplicate videos can then be filtered out.

By checking duplication with multiple kinds of video features, the method for determining duplicate videos proposed by the present disclosure can greatly improve the accuracy and efficiency of determining duplicate videos.

The above steps are described and explained in detail below.
1. Step S11.

FIG. 2 is a schematic block diagram of acquiring video features of the video to be checked according to an embodiment of the present disclosure. Referring to FIG. 2, in an embodiment of the present disclosure, step S11 comprises the following steps:

Step S21: sample frames from the video to be checked to obtain a plurality of frame images of the video to be checked; these frame images in fact form an image sequence. Specifically, the number of frames extracted is configurable, for example two frame images per second of video, or one frame image per second. Note that the sampling can be uniform, i.e. the time interval between two adjacent frame images is constant.
Step S22: extract multiple kinds of image features from each frame image; the image features of the video to be checked may be defined as first image features.

Step S23: determine a first video feature of the video to be checked from each kind of the first image features of the frame images, thereby obtaining multiple kinds of first video features. Specifically, the first image features may be arranged in the order of the corresponding frame images in the video (that is, their order in the image sequence) to obtain the first video feature.

The feature extraction method used in step S22 and the type of the resulting first image features are not limited; the first image features may be extracted in various ways, and may for example be floating-point features or binarized features. It should be noted that the video database records the video features of the existing videos (which may be defined as second video features, each second video feature consisting of a plurality of second image features), and that the video database contains second video features of the same types, extracted by the same methods, as the first video features, so that first and second video features of the same type can be compared during feature comparison.

Notably, the video features of existing videos can be acquired by the method described above. For ease of distinction, the video features of an existing video are defined as second video features, and the image features within them as second image features.
In some embodiments of the present disclosure, the multiple kinds of image features extracted in step S22 include a fence feature (which may also be called a Fence feature or a Recall feature). The fence feature of a frame image is extracted as follows: for each frame image, acquire one or more detection vectors; using each detection vector, take an arbitrary pixel in the frame image as a start point and determine the end point to which the detection vector points from that start point; determine the difference between each pair of start point and end point; and determine the image feature of the frame image, defined as the fence feature, from the overall situation of the differences of all start/end pairs. Note that "taking an arbitrary pixel as a start point" means that in general all pixels of the frame image may be defined as start points; alternatively, pixels at one or more preset positions may be defined as start points, the specific positions being arbitrary; for example, all points of a frame image not lying on the edge may be taken as start points.

Specifically, FIG. 3 is a schematic block diagram of extracting fence features according to an embodiment of the present disclosure. Since image features can be obtained for any video by the method shown in FIG. 3, this embodiment does not distinguish whether the video is the one to be checked. Referring to FIG. 3, in an embodiment of the present disclosure, step S22 may comprise the following steps:

Step S31: acquire one or more detection vectors (shift vectors). Suppose N detection vectors are acquired, where N is a positive integer. Specifically, the detection vectors may be preset or randomly generated. Further, the length and direction of each detection vector are arbitrary, and the detection vectors are mutually independent, requiring no relation to one another. Notably, for the sampled frame images, the same set of detection vectors is generally used to determine the image features of all frame images, although different sets of detection vectors may also be used for different frame images. Thereafter, the process proceeds to step S32.

Step S32: according to one detection vector, take each pixel in the frame image as a start point, determine the end-point pixel to which the detection vector points from that start point, and determine a feature bit of the frame image for that detection vector according to the overall situation of the differences between the start-point pixels and the corresponding end-point pixels. Thereafter, the process proceeds to step S33.

Step S33: determine the feature bit corresponding to each detection vector, and determine a fence feature of the frame image from the resulting N feature bits.

In some examples, the difference between a start-point pixel and an end-point pixel includes the difference between their luminance values. Specifically, in one example, step S32 comprises: assigning a counter to a detection vector; for each start/end pair, incrementing the counter by 1 if the luminance of the start point is greater than that of the end point, and decrementing it by 1 if the luminance of the start point is smaller; and judging whether the counter exceeds a preset value (for example 0): if it does, generating a feature bit of value 1, and otherwise a feature bit of value 0.

It should be noted that if the end point of a detection vector falls outside the frame image, the counter may be left unchanged; alternatively, the frame image may be extended periodically, tiling identical copies of the frame image on all sides, so that the end point of the detection vector always has a corresponding pixel.
Determining the start-point and end-point pixels of a frame image from detection vectors of arbitrary length and direction, and generating the frame image's feature by comparing the differences between them, improves the accuracy and efficiency of video feature extraction and the quality of the resulting video features, so that duplication checking based on fence features achieves higher accuracy and efficiency.

In some embodiments of the present disclosure, the multiple kinds of image features extracted in step S22 include a pooling feature (which may also be called a Pooling feature or a Reranking feature). The pooling feature of a frame image is extracted as follows: for each frame image, perform multiple types of pooling level by level to obtain the image feature of the frame image, defined as the pooling feature. Pooling is a dimensionality-reduction method from the field of convolutional neural networks, and the multiple types of pooling include max pooling, min pooling and average pooling. Specifically, the multiple types of pooling may be performed level by level over multiple color channels of the frame image, so that the image feature is obtained from the multiple color channels of the frame image.

Specifically, pooling a frame image level by level with multiple types of pooling comprises: determining a matrix from the frame image, and using the multiple types of pooling to generate progressively smaller matrices until the matrix shrinks to one containing only a single point (a "point" of a matrix may also be called an "element" of the matrix), and determining the pooling feature of the frame image from that single-point matrix. FIG. 4 is a schematic block diagram of extracting pooling features according to an embodiment of the present disclosure. Since image features can be obtained for any video by the method shown in FIG. 4, this embodiment does not distinguish whether the video is the one to be checked. Referring to FIG. 4, in an embodiment of the present disclosure, step S22 may comprise the following steps:
Step S41: from a frame image, determine a first matrix having a first matrix dimension and a second matrix dimension (in other words, a length direction and a width direction). Suppose the frame image is x pixels long and y pixels wide, where x and y are positive integers. Each point of the first matrix (to distinguish them from the elements of a vector, the elements of a matrix are called "points" below) corresponds to one pixel of the frame image, so the first matrix has length x in the first matrix dimension and length y in the second matrix dimension (an x*y matrix); the length of a matrix in a given dimension denotes the number of points the matrix contains in that dimension. The value of each point of the first matrix is a 3-dimensional vector, defined as a first vector, representing the luminance of the three color channels of the corresponding pixel. Note that when the color mode of the video object is RGB, the red, green and blue channels may be taken, but this is not mandatory: the channels may be chosen according to the color mode actually used, and the number of chosen channels need not be three; for example, two of the red, green and blue channels may be chosen. Thereafter, the process proceeds to step S42.

Step S42: set a plurality of first blocks on the first matrix (each block is in effect a pooling window, so a first block may also be called a first pooling window), say x1*y1 first blocks, where x1 and y1 are positive integers; each first block contains a plurality of points of the first matrix (in other words, a plurality of first vectors). The number of first blocks along the first matrix dimension is smaller than the length of the first matrix in that dimension (i.e. smaller than the number of points the first matrix contains along it), and likewise the number of first blocks along the second matrix dimension is smaller than the length of the first matrix in the second matrix dimension; that is, x1 < x and y1 < y. For each first block, compute the per-dimension maximum, minimum and average of the first vectors it contains, obtaining for the block a 9-dimensional vector, defined as a second vector. Note that the first blocks may partially overlap one another (that is, contain common points) or not overlap at all. Thereafter, the process proceeds to step S43.

Specifically, when setting the first blocks, the first matrix dimension of the first matrix may be divided evenly into x1 segments, each of the same length, with adjacent segments sharing common points (partially overlapping); in the same way, the second matrix dimension is divided into y1 segments; combining the x1 segments with the y1 segments yields the x1*y1 first blocks of the first matrix.

It should be noted that when all the first blocks have the same size and the same spacing (adjacent first blocks may overlap), the process of setting first blocks on the first matrix and computing the second vector of each block is in fact equivalent to scanning (sliding) a single pooling window across the whole first matrix at a fixed stride and, at each position, computing the second vector of the region covered by the window.

Step S43: determine a second matrix from the x1*y1 first blocks and the second vector corresponding to each first block. Each point of the second matrix corresponds to one first block, so with x1*y1 first blocks the second matrix has length x1 in the first matrix dimension and length y1 in the second (an x1*y1 matrix), and the value of each point of the second matrix is the second vector of the corresponding first block. Thereafter, the process proceeds to step S44.

It should be noted that when determining the second matrix, the correspondence between the first blocks and the points of the second matrix must follow a fixed order; as a specific example, the points of the second matrix may be arranged according to the positions of the first blocks within the first matrix.

Step S44: repeat steps S42 and S43: from the second matrix containing x1*y1 points whose values are 9-dimensional vectors, obtain a third matrix containing x2*y2 points whose values are 27-dimensional vectors (where x2 is a positive integer smaller than x1 and y2 a positive integer smaller than y1); from that matrix, obtain a matrix containing x3*y3 points whose values are 81-dimensional vectors (where x3 < x2 and y3 < y2); and so on, until the first matrix (in other words, the frame image) shrinks to a 1*1 N-th matrix (in effect, the matrix is reduced to a single point), where N is a positive integer; the N-th matrix contains only one point whose value is a 3^N-dimensional vector, and this 3^N-dimensional vector is determined as the pooling feature of the frame image.

It should be noted that in step S44, at each level the blocks should be set in a manner appropriate to the current matrix size, to match the level-by-level shrinking of the first and second matrix dimensions.

Generating frame-image features by performing multiple types of pooling level by level improves the accuracy and efficiency of video feature extraction, as well as the quality and robustness of the resulting video features, so that duplication checking based on pooling features achieves higher accuracy and efficiency.
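The following sketch illustrates steps S41 to S44. It assumes NumPy and, for simplicity, a non-overlapping halving layout of the blocks at each level (the disclosure also allows overlapping blocks); the layout and function name are illustrative assumptions, not the exact scheme used.

```python
# A minimal sketch of the level-by-level pooling pyramid.
import numpy as np

def pooling_feature(image: np.ndarray) -> np.ndarray:
    """image: (H, W, C) array; returns a C * 3**N dimensional pooling feature."""
    mat = image.astype(np.float32)            # each point holds a C-dim vector
    while mat.shape[0] > 1 or mat.shape[1] > 1:
        nx = max(mat.shape[0] // 2, 1)        # blocks along the first dimension
        ny = max(mat.shape[1] // 2, 1)        # blocks along the second dimension
        xs = np.linspace(0, mat.shape[0], nx + 1).astype(int)
        ys = np.linspace(0, mat.shape[1], ny + 1).astype(int)
        rows = []
        for i in range(nx):
            row = []
            for j in range(ny):
                block = mat[xs[i]:max(xs[i + 1], xs[i] + 1),
                            ys[j]:max(ys[j + 1], ys[j] + 1)].reshape(-1, mat.shape[2])
                # Max, min and average pooling triple the vector dimension per level.
                row.append(np.concatenate([block.max(0), block.min(0), block.mean(0)]))
            rows.append(row)
        mat = np.array(rows)
    return mat[0, 0]
```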
Further, in embodiments of the present disclosure, if the image feature determined in step S22 is not a bit string of binary digits (for example, the aforementioned pooling feature is a floating-point feature), the present disclosure may also comprise the following step: binarizing the image feature determined in step S22 to obtain a binarized image feature, namely a bit string composed of 0s and 1s; the video feature of the video object is then determined from the binarized image features.

Binarizing the image features compresses the storage of the video features and accelerates the similarity computations during video comparison.

Specifically, the random projection method may be used to convert image features into binarized form. Since the first image features of the video to be checked and the second image features of existing videos can be binarized by the same method, this example does not distinguish between first and second image features. FIG. 5 is a schematic block diagram of binarizing image features by random projection according to an embodiment of the present disclosure. Referring to FIG. 5, the method for determining duplicate videos of the example of the present disclosure may further comprise the following steps of binarizing image features by random projection:

Step S51: to generate a binarized image feature of length h, generate 2h groups from one image feature, each group containing a plurality of elements of the image feature (that is, each group contains the values of several dimensions of the image feature), where h is a positive integer. Thereafter, the process proceeds to step S52.

It should be noted that which elements a group contains is arbitrary, and two different groups may contain some of the same elements. However, to facilitate video comparison, the composition of each group may be preset, or the groups may be generated in the same way for multiple videos.

In this example, every group contains the same number of elements; it should be noted, however, that in fact the numbers of elements in the groups may also differ.

Step S52: sum the elements contained in each group to obtain the sum value of each group. Thereafter, the process proceeds to step S53.

Step S53: pair the 2h groups two by two to obtain h group pairs. Thereafter, the process proceeds to step S54.

Specifically, the 2h groups may be numbered (or ordered) in advance, and each two adjacent groups paired.

Step S54: compare the sum values of the two groups in each group pair, and generate one binarized image feature bit from the result of the comparison. Thereafter, the process proceeds to step S55.

Specifically, in the example where the groups are numbered in advance, if within a pair the sum value of the lower-numbered group exceeds that of the higher-numbered group, a binarized image feature bit of value 1 is generated, and otherwise a bit of value 0. The way the bits are generated is not limited; for example, a bit of value 1 may instead be generated when the sum value of the lower-numbered group is smaller than that of the higher-numbered group.

Step S55: compose a binarized image feature of length h from the h binarized image feature bits of the h group pairs.
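A minimal sketch of steps S51 to S55 follows. The group size, the number of bits h and the fixed random seed (used so that all videos share the same preset groups) are illustrative assumptions.

```python
# A minimal sketch of grouped random-projection binarization.
import numpy as np

def make_groups(dim: int, h: int, group_size: int = 16, seed: int = 0):
    """Preset 2h groups of element indices, shared across all videos (S51)."""
    rng = np.random.default_rng(seed)
    return [rng.integers(0, dim, size=group_size) for _ in range(2 * h)]

def binarize(feature: np.ndarray, groups) -> np.ndarray:
    sums = np.array([feature[idx].sum() for idx in groups])   # S52: group sums
    # S53/S54: pair adjacent groups (0,1), (2,3), ...; bit = 1 if first sum larger.
    return (sums[0::2] > sums[1::2]).astype(np.uint8)         # S55: h bits

# Usage: the same groups for query and database features, e.g. h = 64 bits.
groups = make_groups(dim=3**6 * 3, h=64)   # e.g. a hypothetical 6-level pooling feature
```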
2. Steps S12 and S13.

FIG. 6 is a schematic flow diagram of sequence alignment according to an embodiment of the present disclosure. Referring to FIG. 6, step S12 of the example of the present disclosure may comprise the following steps:

Step S61: acquire multiple kinds of video features of an existing video. The video features of an existing video may be defined as second video features, each kind of second video feature containing a plurality of second image features. Thereafter, the process proceeds to step S62.

For example, the aforementioned fence features and pooling features of both the video to be checked and the existing video may be acquired, and/or the aforementioned floating-point features and binarized features.

Step S62: for the multiple kinds of second video features and the multiple kinds of first video features, determine the unit similarity between each second image feature of each kind of second video feature and each first image feature of the first video feature of the same kind, to obtain multiple kinds of unit similarities. Each unit similarity represents the degree of similarity between one first image feature and one second image feature; specifically, a larger unit similarity may indicate greater similarity. Thereafter, the process proceeds to step S63.

Suppose the length of a first video feature of the video to be checked is M1 and the length of a second video feature of the existing video is M2, where M1 and M2 are positive integers; that is, the first video feature contains M1 first image features and the second video feature contains M2 second image features. Then M1*M2 unit similarities are obtained between first and second video features of the same kind.

In embodiments of the present disclosure, a distance or metric capable of judging the degree of similarity between the first and second image features may be chosen as the unit similarity, according to the type of the image features.

Specifically, when the first and second image features are both floating-point features, the unit similarity may be determined from the cosine distance (also called cosine similarity) between the first image feature and the second image feature; generally, the cosine distance may be taken directly as the unit similarity. When the first and second image features are both binarized features, the unit similarity may be determined from the Hamming distance between them: first compute the Hamming distance between the first and second image features, then compute the difference between the feature length and the Hamming distance, and take the ratio of this difference to the feature length as the unit similarity, representing the proportion of identical bits in the two binarized features. The Hamming distance is a metric commonly used in information theory; the Hamming distance between two strings of equal length is the number of positions at which the corresponding characters differ. Note that image features extracted by the same method generally have the same length.

Notably, the unit similarity is not limited to the cosine distance or the Hamming distance; any distance or metric that can judge the degree of similarity between two image features may be used.

Step S63: in one example, determine the average of the multiple kinds of unit similarities and determine the similarity matrix of the existing video from that average; or, in another example, determine the minimum of the multiple kinds of unit similarities and determine the similarity matrix of the existing video from that minimum.

Specifically, each point of the similarity matrix corresponds to one unit similarity, so that the similarity matrix records the unit similarities between each second image feature of an existing video and each first image feature. Moreover, the points of the similarity matrix are arranged horizontally in the order of the first image features within the first video feature of the video to be checked, and vertically in the order of the second image features within the second video feature of the existing video. The point at row i, column j thus represents the unit similarity between frame i of the video to be checked and frame j of the existing video, and the similarity matrix is an M1 x M2 matrix. Thereafter, the process proceeds to step S64.

It should be noted that before taking the average or minimum of multiple kinds of unit similarities, the various unit similarities must be ensured to have consistent value ranges; for example, the value ranges of all types of unit similarity may be set to 0 to 1 in advance. In fact, both the cosine-distance example and the Hamming-distance example above already set the value range of the unit similarity to 0 to 1.

It should also be noted that in practice it is not necessary to first compute all unit similarities in step S62 and then determine the similarity matrix in step S63; the similarity matrix may be determined directly, computing each corresponding unit similarity while filling in its points.
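A rough sketch of steps S61 to S63 for two kinds of binarized features follows. The Hamming-based unit similarity matches the formulation above, and the choice of the element-wise minimum as the fusion across feature types is one of the two options described (averaging would work the same way); the dictionary layout is an assumption.

```python
# A minimal sketch of building the fused similarity matrix.
import numpy as np

def unit_similarity_matrix(q: np.ndarray, d: np.ndarray) -> np.ndarray:
    """q: (M1, L) query bits, d: (M2, L) existing-video bits -> (M1, M2) in [0, 1]."""
    hamming = (q[:, None, :] != d[None, :, :]).sum(-1)
    return (q.shape[1] - hamming) / q.shape[1]   # share of identical bits

def similarity_matrix(q_feats: dict, d_feats: dict, fuse=np.minimum) -> np.ndarray:
    """q_feats/d_feats map a feature kind (e.g. 'fence') to its bit matrices."""
    mats = [unit_similarity_matrix(q_feats[k], d_feats[k]) for k in q_feats]
    out = mats[0]
    for m in mats[1:]:
        out = fuse(out, m)   # element-wise minimum (or averaging) across kinds
    return out
```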
Step S64: determine the similarity between the existing video and the video to be checked according to the similarity matrix. Specifically, determining the similarity includes determining, from the similarity matrix, the degree of similarity between the existing video and the video to be checked, which may be expressed by a sequence alignment score; in embodiments of the present disclosure, the sequence alignment score may be a number between 0 and 1, a larger number indicating more similar videos. Further, determining the similarity may also include determining, from the similarity matrix, the start and end times of the duplicate video segments in the existing video and in the video to be checked.

In some embodiments of the present disclosure, step S64 comprises: determining the similarity between the video to be checked and the existing video according to the straight lines in the similarity matrix.

Note that since a video feature generally contains finitely many image features, the similarity matrix is a finite matrix, so a "straight line" is in fact a finite-length segment composed of several points of the similarity matrix. The line has a slope, namely the slope of the line connecting the points it contains. Moreover, the start and end points of the line may be any points of the similarity matrix; they need not lie on the edges.

The straight lines referred to in the present disclosure include the diagonal of the similarity matrix and the segments parallel to that diagonal, that is, the lines of slope 1 running from upper-left to lower-right in the similarity matrix, and also include lines whose slope is not 1. For example, they may be lines of slope approximately 1, to improve the robustness of duplication checking; lines of slope 2, 3, ... or 1/2, 1/3, ..., to handle the checking of speed-changed videos; or even lines of negative slope (running from lower-left to upper-right in the similarity matrix), to handle videos that have been reversed. The diagonal is the segment composed of the points at (1,1), (2,2), (3,3), ... (in effect, the line starting from the upper-left point with slope 1).

In fact, every straight line in the similarity matrix consists of several sequentially arranged unit similarities; since each line expresses the similarity of several sequentially arranged image-feature pairs, it can express the degree of similarity between a segment of the video to be checked and a segment of an existing video, where each image-feature pair comprises one first image feature and one second image feature. That is, each line expresses the degree of similarity between several sequentially arranged first image features and several sequentially arranged second image features, while the slope and the endpoints of the line express the lengths and positions of the two video segments. For example, the line composed of (1,1), (2,3), (3,5), (4,7) expresses the similarity between the first image feature of ordinal 1 and the second image feature of ordinal 1, between the first image feature of ordinal 2 and the second image feature of ordinal 3, and so on; the line can therefore reflect the similarity between the segment of the video to be checked corresponding to the first image features of ordinals 1, 2, 3, 4 and the segment of the existing video corresponding to the second image features of ordinals 1, 3, 5, 7.

Therefore, the similarity of the two videos can be determined from the straight lines in the similarity matrix. Define the average (or overall) situation of the unit similarities contained in a line as the line similarity of that line; the line similarity reflects the similarity between the corresponding first image features and second image features. Find in the similarity matrix the line with the highest line similarity, which may be called the matching line; determine the line similarity of the matching line as the degree of similarity between the video to be checked and the existing video, and/or determine the duplicate video segments of the two videos from the first and second image features corresponding to the matching line.

A specific way to determine the duplicate video segments from a line in the similarity matrix (for example, the matching line) is: determine the start time of the duplicate segment in the video to be checked from the ordinal of the first image feature corresponding to the start point of the line (in other words, the abscissa of that point in the similarity matrix), and determine the start time of the duplicate segment in the existing video from the ordinal of the second image feature corresponding to that start point (its ordinate in the similarity matrix); similarly, determine the end time of the duplicate segment in the video to be checked from the abscissa of the end point of the line, and the end time of the duplicate segment in the existing video from the ordinate of that end point.

It should be noted that the matching line may be determined by choosing, among several preset lines, the one with the highest line similarity (for example, the preset lines may be all lines whose slope equals a preset slope value such as 1); alternatively, several points with top-ranked unit similarities may first be selected from the similarity matrix and a line fitted through them, so as to generate a line with relatively the highest line similarity.
In a specific embodiment of the present disclosure, dynamic programming may be used to determine the similarity of the two videos from the similarity matrix. FIG. 7 is a schematic flow diagram of duplication checking by dynamic programming according to an embodiment of the present disclosure. Referring to FIG. 7, in one embodiment, step S64 of the present disclosure comprises the following specific steps:

Step S64-1a: define the lines in the similarity matrix whose slope equals a preset slope value as candidate lines, and determine the line similarity of each candidate line from the unit similarities it contains. Specifically, the line similarity of a line may be set to the average of the unit similarities the line contains, or to their sum. In a specific example, the preset slope value may be 1, in which case the candidate lines are the diagonal of the similarity matrix and the lines parallel to that diagonal. Thereafter, the process proceeds to step S64-1b.

It should be noted that in one embodiment of the present disclosure, step S64-1a further comprises: first excluding from the candidate lines those containing fewer unit similarities than a preset line-length value, and then proceeding to step S64-1b. In other words, in this embodiment a candidate line must also satisfy: the number of unit similarities it contains reaches the preset line-length value. Excluding lines with too few unit similarities eliminates the problem that a line containing too few unit similarities would impair the accuracy of the final sequence alignment result.

Step S64-1b: from the candidate lines, determine the one with the largest line similarity, defined as the first matching line. Thereafter, the process proceeds to step S64-1c.

Step S64-1c: determine the line similarity of the first matching line as the sequence alignment score, expressing the degree of similarity between the video to be checked and the existing video; determine the start and end times of the duplicate segments in the two videos from the start and end points of the first matching line.

It should be noted that in some embodiments of the present disclosure, there may be multiple preset slope values in step S64-1a, i.e. a candidate line is any line whose slope equals any one of the preset slope values; for example, the candidate lines may be lines of slope 1, -1, 2, 1/2, etc., and in step S64-1b the first matching line is determined among the candidate lines of all those slopes.

By using dynamic programming to determine the sequence alignment score and/or the duplicate video segments, the method for determining duplicate videos proposed by the present disclosure improves the accuracy and the speed of duplication checking.
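A minimal sketch of steps S64-1a to S64-1c for the slope-1 candidate lines follows; `min_len` plays the role of the preset line-length value and its default is an assumption.

```python
# A minimal sketch of the slope-1 candidate-line search.
import numpy as np

def best_diagonal(sim: np.ndarray, min_len: int = 5):
    """Scan all diagonals (slope-1 lines) and return (score, start, end)."""
    m1, m2 = sim.shape
    best = (0.0, None, None)
    for offset in range(-(m1 - min_len), m2 - min_len + 1):
        diag = np.diagonal(sim, offset=offset)
        if diag.size < min_len:          # S64-1a: exclude lines that are too short
            continue
        score = float(diag.mean())       # line similarity = average of its points
        if score > best[0]:              # S64-1b: keep the best candidate line
            i0 = max(-offset, 0)         # row of the diagonal's first point
            j0 = max(offset, 0)          # column of the diagonal's first point
            n = diag.size
            best = (score, (i0, j0), (i0 + n - 1, j0 + n - 1))
    return best                          # S64-1c: score plus segment endpoints
```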
In another specific embodiment of the present disclosure, the uniform-speed video method may also be used to determine the similarity of the two videos from the similarity matrix. FIG. 8 is a schematic flow diagram of duplication checking by the uniform-speed video method according to an embodiment of the present disclosure. Referring to FIG. 8, in one embodiment, step S64 of the present disclosure comprises the following specific steps:

Step S64-2a: select from the similarity matrix the several points with the largest unit similarities, defined as similarity extreme points. The number of similarity extreme points selected may be preset. Thereafter, the process proceeds to step S64-2b.

Step S64-2b: based on the similarity extreme points, fit a straight line in the similarity matrix as the second matching line. In some specific examples, a line whose slope equals or approximates a preset slope value is fitted from the similarity extreme points as the second matching line, for example a line of slope close to 1. Specifically, the Random Sample Consensus (RANSAC) method may be used to fit, in the similarity matrix, a line whose slope is close to the preset slope value. RANSAC is a commonly used method that estimates the parameters of a mathematical model of the data from a set of sample data containing outliers, so as to obtain valid sample data. Thereafter, the process proceeds to step S64-2c.

Step S64-2c: determine the sequence alignment score from the unit similarities contained in the second matching line, expressing the degree of similarity between the video to be checked and the existing video. Specifically, the average of the unit similarities on the second matching line may be determined as the sequence alignment score. In addition, the start and end times of the duplicate segments in the two videos may be determined from the start and end points of the second matching line.

By using the uniform-speed video method to determine the sequence alignment score and/or the duplicate video segments, the method for determining duplicate videos proposed by the present disclosure improves the accuracy and the speed of duplication checking.
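A rough sketch of steps S64-2a and S64-2b with a tiny hand-rolled RANSAC follows. The number of extreme points, the iteration count, the inlier tolerance and the slope window are all illustrative assumptions.

```python
# A minimal sketch of fitting the second matching line through extreme points.
import numpy as np

def fit_matching_line(sim: np.ndarray, n_points: int = 200,
                      n_iter: int = 500, tol: float = 2.0, seed: int = 0):
    """Return (slope, intercept) of a line through the similarity extreme points."""
    flat = np.argsort(sim, axis=None)[-n_points:]   # S64-2a: top unit similarities
    ii, jj = np.unravel_index(flat, sim.shape)      # extreme points (row, column)
    rng = np.random.default_rng(seed)
    best_inliers, best_model = 0, (1.0, 0.0)
    for _ in range(n_iter):                         # S64-2b: RANSAC line fitting
        a, b = rng.choice(len(ii), size=2, replace=False)
        if jj[a] == jj[b]:
            continue                                # skip degenerate vertical pairs
        slope = (ii[a] - ii[b]) / (jj[a] - jj[b])
        if not 0.5 <= slope <= 2.0:                 # keep the slope close to 1
            continue
        intercept = ii[a] - slope * jj[a]
        inliers = int((np.abs(ii - (slope * jj + intercept)) < tol).sum())
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (slope, intercept)
    return best_model
```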
In some embodiments of the present disclosure (for example, the embodiments of FIG. 7 and FIG. 8 above), step S64 further comprises: inspecting the head and the tail of the obtained first matching line or second matching line, and judging whether the points (unit similarities) at the head and tail of the matching line reach a preset unit-similarity value; removing the head and tail portions of the matching line that do not reach the preset unit-similarity value (that is, whose unit similarity is not high), and keeping the middle segment of the line, defined as the third matching line; determining the degree of similarity between the video to be checked and the existing video from the line similarity of the third matching line, and/or determining the start and end times of the duplicate video segments from the start and end points of the third matching line. By removing the low-similarity portions at the head and tail of the matching line and keeping the higher-similarity middle segment before determining the similarity of the two videos, the accuracy of duplication checking is improved, and the start and end times of the duplicate segments are obtained more accurately.

A specific way to remove the portions at the head/tail of the matching line that do not reach the preset unit-similarity value is: check point by point from the start/end of the matching line towards the middle, judging whether each point reaches the preset value; upon finding the first point that reaches it, remove the points between that point and the start/end.
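A minimal sketch of that trimming rule follows; `values` holds the unit similarities along the matching line, and `tau` stands for the preset unit-similarity value (its default here is assumed).

```python
# A minimal sketch of trimming a matching line's low-similarity ends.
import numpy as np

def trim_line(values: np.ndarray, tau: float = 0.6):
    """Return (start, end) indices of the kept middle segment, or None."""
    idx = np.nonzero(values >= tau)[0]
    if idx.size == 0:
        return None                      # no point reaches the preset value
    return int(idx[0]), int(idx[-1])     # first/last points reaching tau
```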
By taking the average or the minimum of the similarities of multiple kinds of video features for duplication checking, the method for determining duplicate videos of the present disclosure reduces or eliminates the mismatches that arise when comparison relies on the similarity obtained from a single kind of video feature (for example, the aforementioned similarity matrix or line similarity), thereby improving the accuracy of duplication checking.

Further, performing sequence alignment and the first ranking on every video in the video database could impair the efficiency of duplication checking. Therefore, before sequence alignment, a second ranking may first be performed on the plurality of existing videos according to each individual first image feature of at least one kind of first video feature, so as to select second candidate videos from the video database; sequence alignment is then performed on the second candidate videos. Specifically, each individual first image feature of at least one kind of first video feature may be used as an index request to rank the plurality of existing videos in the term frequency-inverse document frequency manner (TF-IDF ranking).

In some embodiments of the present disclosure, the second video features may be indexed so as to obtain the feature indexes of the existing videos in advance; these feature indexes are then matched against the first image features to perform the TF-IDF ranking of the existing videos.

Specifically, obtaining the feature indexes of the existing videos in advance further includes obtaining in advance a forward feature index (forward index) and an inverted feature index (inverted index) of the video features of the existing videos, to facilitate duplication checking; the forward and inverted feature indexes may be stored in the video database in advance. The forward feature index records the video features of each existing video, that is, which image features each existing video's video feature contains and the order of those image features; the inverted feature index records in which existing videos' video features each image feature appears. Specifically, the forward and inverted feature indexes may be stored as key-value pairs: in the forward feature index, a key denotes a video's identifier (or video ID) and the corresponding value records which image features the video contains and their order (these may be called the forward key and forward value); in the inverted feature index, a key denotes an image feature and the corresponding value records the identifiers of the videos containing that image feature (these may be called the inverted key and inverted value).

TF-IDF ranking is a class of techniques that judge the importance of information, and rank it, by weighting the information with its term frequency and inverse document frequency. The term frequency is the frequency with which a term (a piece of information) occurs in a given document (file); the higher the term frequency, the more important the term is to the document. The document frequency is the number of documents in the corpus in which the term appears, and the inverse document frequency is its reciprocal (in practice, the logarithm of the inverse document frequency may also be taken, or the inverse document frequency may be defined as the logarithm of the reciprocal of the document frequency); the higher the inverse document frequency, the more discriminative the term. TF-IDF ranking therefore ranks by the magnitude of the product of the term frequency and the inverse document frequency. In effect, a video's video feature can be treated as a document and each image feature as a word, so that existing videos can be ranked in the TF-IDF manner.

Moreover, if the second ranking were performed on every existing video in the video database, the efficiency of the second ranking could suffer; therefore, before the second ranking, an absolute matching (exact match) may first be performed on the existing videos in the video database. The absolute matching selects, for the second ranking, the existing videos containing at least a preset number or preset proportion of the first image features.
FIG. 9 is a schematic flow diagram of the second ranking including an absolute matching step according to an embodiment of the present disclosure. Referring to FIG. 9, in an embodiment of the present disclosure, the following steps are performed before step S12:

Step S71: according to the inverted feature index, count in which existing videos' second video features each first image feature appears, so as to match from the video database the existing videos containing at least a preset number of the first image features, as a third candidate video set. Thereafter, the process proceeds to step S72.

Step S72: based on the forward feature index, determine the term frequency of a first image feature in the second video feature of a third candidate video. The term frequency is the proportion of a first image feature among all the image features contained in a second video feature. Thereafter, the process proceeds to step S73.

Step S73: based on the inverted feature index, determine the document frequency of a first image feature. The document frequency is: among a plurality of existing videos (for example, all the existing videos in the video database), the proportion of the existing videos whose second video features contain the first image feature to the total number of existing videos. Thereafter, the process proceeds to step S74.

Step S74: determine the TF-IDF score of a third candidate video from the term frequency of each first image feature in the second video feature of the third candidate video and the document frequency of each first image feature. Thereafter, the process proceeds to step S75.

Step S75: rank the third candidate video set according to the obtained TF-IDF score of each third candidate video to obtain the result of the second ranking, and take the top k third candidate videos from the second ranking result as the second candidate video set. At the same time, the second video feature (forward feature index) of each second candidate video may also be returned, so that the second candidate video set can be further processed based on the second video features in the subsequent step S12.

In this embodiment, an index server may be used: the set of first image features of the video to be checked serves as the index request, and the absolute matching and the TF-IDF ranking are performed according to the aforementioned forward feature index and inverted feature index, so as to recall the second candidate video set and return the forward feature index of each second candidate video at the same time. Specifically, the above steps can be performed with the open-source Elasticsearch search engine to achieve fast retrieval.

Notably, the absolute matching and the second ranking focus on which existing videos each individual first image feature appears in; they do not consider the effect on duplication checking of the order of the first image features within the first video feature, in other words they do not consider the video feature as a whole or the matching of sequentially arranged image features.

Correspondingly, steps S12 and S13 in the foregoing example become: performing the aforementioned sequence alignment on each of the second candidate videos to obtain the sequence alignment results, and performing the aforementioned first ranking of the second candidate videos, so as to select the first candidate videos from the second candidate videos according to the first ranking.

By performing the second ranking, the method for determining duplicate videos proposed by the present disclosure can greatly improve the accuracy and efficiency of determining duplicate videos.

It should be noted that, before step S71, the binarization of image features described in the foregoing embodiments may be performed to facilitate the second ranking.
FIG. 10 is a schematic structural block diagram of an embodiment of a determining apparatus for duplicate videos according to the present disclosure. Referring to FIG. 10, the determining apparatus 100 for duplicate videos of the example of the present disclosure mainly comprises:

a video feature acquisition module 110 for acquiring multiple kinds of video features of the video to be checked;

a sequence alignment module 120 for performing sequence alignment on a plurality of existing videos according to the multiple kinds of video features of the video to be checked, to obtain sequence alignment results;

a first ranking module 130 for performing a first ranking of the plurality of existing videos according to the sequence alignment results, and taking the top n existing videos in the first ranking result as first candidate videos, where n is a positive integer;

a duplication checking module 140 for determining the duplication status of the video to be checked according to the sequence alignment results of the first candidate videos.

FIG. 11 is a schematic block diagram of the video feature acquisition module 110 according to an embodiment of the present disclosure. Referring to FIG. 11, the video feature acquisition module 110 of the example of the present disclosure mainly comprises:

a sampling unit 111 for sampling frames from the video to be checked to obtain a plurality of frame images of the video to be checked;

a first image feature extraction unit 112 for extracting multiple kinds of image features from each frame image, the image features of the video to be checked being defined as first image features;

a first video feature determining unit 113 for determining, from each kind of the first image features of the frame images of the video to be checked, a first video feature of the video to be checked, thereby obtaining multiple kinds of first video features.

Specifically, the first image feature extraction unit 112 may include a plurality of subunits (not shown) for extracting the fence feature according to the steps in the foregoing method embodiments, and/or a plurality of subunits (not shown) for extracting the pooling feature according to the steps in the foregoing method embodiments.

Further, the determining apparatus for duplicate videos of the example of the present disclosure may also include a binarization module (not shown) for binarizing image features by the random projection method.
FIG. 12 is a schematic block diagram of the sequence alignment module 120 according to an embodiment of the present disclosure. Referring to FIG. 12, the sequence alignment module 120 of the example of the present disclosure mainly comprises:

a second video feature acquiring unit 121 for acquiring multiple kinds of video features of an existing video, the video features of an existing video being defined as second video features, each kind of second video feature containing a plurality of second image features;

a unit similarity determining unit 122 for determining the unit similarity between each second image feature of each kind of second video feature and each first image feature of the same kind, to obtain multiple kinds of unit similarities;

a similarity matrix determining unit 123 for determining the similarity matrix of the existing video according to the average of the multiple kinds of unit similarities, or according to the minimum of the multiple kinds of unit similarities;

a sequence alignment unit 124 for determining, according to the similarity matrix, the similarity between the existing video and the video to be checked; specifically, the sequence alignment unit 124 determines the similarity between the video to be checked and the existing video according to the straight lines in the similarity matrix.

Specifically, the sequence alignment unit 124 may include a plurality of subunits (not shown) that determine the sequence alignment score and the duplicate video segments by the uniform-speed video method of the foregoing method embodiments, or a plurality of subunits (not shown) that determine the sequence alignment score and the duplicate video segments by dynamic programming.

Further, performing sequence alignment and the first ranking on every video in the video database could impair the efficiency of duplication checking; therefore a second ranking module (not shown) may be provided before the sequence alignment module 120, for performing a second ranking of the plurality of existing videos according to each individual first image feature of at least one kind of first video feature, so as to select the second candidate videos from the video database; the sequence alignment module 120 then performs sequence alignment on the second candidate videos. Specifically, the second ranking module uses each individual first image feature of at least one kind of first video feature as an index request and ranks the plurality of existing videos in the term frequency-inverse document frequency (TF-IDF) manner.

It should be noted that the aforementioned binarization module may be placed before the second ranking module to facilitate the second ranking.
FIG. 13 is a hardware block diagram illustrating a determination hardware device for duplicate videos according to an embodiment of the present disclosure. As shown in FIG. 13, the determination hardware device 300 for duplicate videos according to an embodiment of the present disclosure comprises a memory 301 and a processor 302; the components of the determination hardware device 300 are interconnected by a bus system and/or other forms of connection mechanism (not shown).

The memory 301 is used to store non-transitory computer readable instructions. Specifically, the memory 301 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache; the non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like.

The processor 302 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control the other components of the determination hardware device 300 to perform desired functions. In an embodiment of the present disclosure, the processor 302 is configured to execute the computer readable instructions stored in the memory 301 so that the determination hardware device 300 performs all or part of the steps of the methods for determining duplicate videos of the foregoing embodiments of the present disclosure.

FIG. 14 is a schematic diagram illustrating a computer-readable storage medium according to an embodiment of the present disclosure. As shown in FIG. 14, a computer-readable storage medium 400 according to an embodiment of the present disclosure stores non-transitory computer readable instructions 401; when the non-transitory computer readable instructions 401 are executed by a processor, all or part of the steps of the methods for determining duplicate videos of the foregoing embodiments of the present disclosure are performed.

FIG. 15 is a schematic diagram of the hardware structure of a terminal device according to an embodiment of the present disclosure. The terminal device may be implemented in various forms; terminal devices in the present disclosure may include, but are not limited to, mobile terminal devices such as mobile phones, smartphones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), navigation devices, in-vehicle terminal devices, in-vehicle display terminals and in-vehicle electronic rearview mirrors, as well as fixed terminal devices such as digital TVs and desktop computers.

As shown in FIG. 15, the terminal device 1100 may include a wireless communication unit 1110, an A/V (audio/video) input unit 1120, a user input unit 1130, a sensing unit 1140, an output unit 1150, a memory 1160, an interface unit 1170, a controller 1180, a power supply unit 1190, and so on. FIG. 15 shows a terminal device with various components, but it should be understood that not all illustrated components are required to be implemented; more or fewer components may be implemented instead.

The wireless communication unit 1110 allows radio communication between the terminal device 1100 and a wireless communication system or network. The A/V input unit 1120 is for receiving audio or video signals. The user input unit 1130 can generate key input data from commands input by a user to control various operations of the terminal device. The sensing unit 1140 detects the current state of the terminal device 1100, the location of the terminal device 1100, the presence or absence of a user's touch input to the terminal device 1100, the orientation of the terminal device 1100, the acceleration or deceleration movement and direction of the terminal device 1100, and the like, and generates commands or signals for controlling the operation of the terminal device 1100. The interface unit 1170 serves as an interface through which at least one external device can connect with the terminal device 1100. The output unit 1150 is configured to provide output signals in a visual, audio and/or tactile manner. The memory 1160 may store software programs for the processing and control operations executed by the controller 1180, or may temporarily store data that has been output or is to be output; the memory 1160 may include at least one type of storage medium, and the terminal device 1100 may also cooperate with a network storage device that performs the storage function of the memory 1160 over a network connection. The controller 1180 typically controls the overall operation of the terminal device; in addition, the controller 1180 may include a multimedia module for reproducing or playing back multimedia data, and may perform pattern recognition processing to recognize handwriting input or picture-drawing input performed on a touch screen as characters or images. The power supply unit 1190 receives external power or internal power under the control of the controller 1180 and provides the appropriate power required to operate the various elements and components.
The various embodiments of the method for determining duplicate videos proposed by the present disclosure may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For hardware implementation, the various embodiments may be implemented by using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein; in some cases, the various embodiments may be implemented in the controller 1180. For software implementation, the various embodiments may be implemented with separate software modules that allow at least one function or operation to be performed; the software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in the memory 1160 and executed by the controller 1180.

In summary, the method, apparatus, hardware device, computer-readable storage medium and terminal device for determining duplicate videos according to the embodiments of the present disclosure can greatly improve the efficiency, accuracy and robustness of determining duplicate videos by performing duplication checking with multiple kinds of video features.
The basic principles of the present disclosure have been described above with reference to specific embodiments. However, it should be pointed out that the advantages, merits, effects and the like mentioned in the present disclosure are merely examples and not limitations, and cannot be considered necessary for every embodiment of the present disclosure. In addition, the specific details disclosed above serve only the purposes of illustration and ease of understanding, not of limitation; the present disclosure is not required to be implemented with the above specific details.

The block diagrams of components, apparatuses, devices and systems involved in the present disclosure are merely illustrative examples and are not intended to require or imply that connection, arrangement or configuration must follow the manner shown in the block diagrams; as those skilled in the art will recognize, these components, apparatuses, devices and systems may be connected, arranged or configured in any manner. Words such as "include", "comprise" and "have" are open-ended terms meaning "including but not limited to" and may be used interchangeably therewith. The words "or" and "and" as used herein mean "and/or" and may be used interchangeably therewith, unless the context clearly indicates otherwise. The word "such as" as used herein means "such as but not limited to" and may be used interchangeably therewith.

In addition, as used herein, "or" used in an enumeration beginning with "at least one of" indicates a disjunctive enumeration, so that an enumeration of "at least one of A, B or C" means A or B or C, or AB or AC or BC, or ABC (i.e. A and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.

It should also be pointed out that in the systems and methods of the present disclosure, components or steps may be decomposed and/or recombined; such decompositions and/or recombinations shall be regarded as equivalent solutions of the present disclosure.

Various changes, substitutions and alterations to the techniques described herein may be made without departing from the teachings defined by the appended claims. Furthermore, the scope of the claims of the present disclosure is not limited to the specific aspects of the processes, machines, manufacture, compositions of matter, means, methods and acts described above. Processes, machines, manufacture, compositions of matter, means, methods or acts presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized. Accordingly, the appended claims include within their scope such processes, machines, manufacture, compositions of matter, means, methods or acts.

The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the aspects shown herein, but accords with the widest scope consistent with the principles and novel features disclosed herein.

The above description has been given for the purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions and sub-combinations thereof.

Claims (15)

  1. A method for determining duplicate videos, the method comprising:
    acquiring multiple kinds of video features of a video to be checked for duplication;
    performing sequence alignment on a plurality of existing videos according to the multiple kinds of video features of the video to be checked, to obtain sequence alignment results;
    performing a first ranking of the plurality of existing videos according to the sequence alignment results, and taking, according to the result of the first ranking, the top n existing videos as first candidate videos, where n is a positive integer;
    determining the duplication status of the video to be checked according to the sequence alignment results of the first candidate videos.
  2. The method for determining duplicate videos according to claim 1, wherein acquiring the multiple kinds of video features of the video to be checked comprises:
    extracting frames from the video to be checked to obtain a plurality of frame images of the video to be checked;
    extracting multiple kinds of image features of the frame images as first image features;
    determining, according to the first image features of the same kind of the plurality of frame images of the video to be checked, video features of the video to be checked as first video features, to obtain multiple kinds of the first video features.
  3. The method for determining duplicate videos according to claim 2, wherein extracting the multiple kinds of image features of the frame images comprises:
    for each of the frame images, acquiring one or more detection vectors; using each detection vector, taking an arbitrary pixel in the frame image as a start point and determining the end point to which the detection vector points from the start point; and determining an image feature of the frame image, as a fence feature, according to the overall situation of the differences between each start point and the corresponding end point.
  4. The method for determining duplicate videos according to claim 2, wherein extracting the multiple kinds of image features of the frame images comprises:
    for each of the frame images, performing multiple types of pooling level by level to obtain an image feature of the frame image as a pooling feature, wherein the multiple types of pooling include max pooling, min pooling and average pooling.
  5. The method for determining duplicate videos according to claim 2, wherein performing sequence alignment on the plurality of existing videos according to the multiple kinds of video features of the video to be checked to obtain the sequence alignment results comprises:
    acquiring multiple kinds of video features of one of the existing videos as second video features, each kind of the second video features containing a plurality of second image features;
    determining the unit similarity between each of the second image features and each of the first image features of the same kind, to obtain multiple kinds of the unit similarities;
    determining the average or the minimum of the multiple kinds of unit similarities, and determining a similarity matrix of the existing video according to the average or the minimum of the multiple kinds of unit similarities;
    determining a sequence alignment score according to the similarity matrix, the sequence alignment score representing the degree of similarity between the existing video and the video to be checked.
  6. The method for determining duplicate videos according to claim 5, wherein determining the sequence alignment score according to the similarity matrix comprises: determining the sequence alignment score according to the straight lines in the similarity matrix.
  7. The method for determining duplicate videos according to claim 5, wherein performing sequence alignment on the plurality of existing videos according to the multiple kinds of video features of the video to be checked to obtain the sequence alignment results further comprises:
    determining the duplicate video segments of the existing video and the video to be checked according to the similarity matrix.
  8. The method for determining duplicate videos according to claim 2, wherein performing sequence alignment on the plurality of existing videos according to the multiple kinds of video features of the video to be checked to obtain the sequence alignment results comprises:
    performing a second ranking of the plurality of existing videos according to each individual first image feature of at least one kind of the first video features, and taking, according to the result of the second ranking, the top k existing videos as second candidate videos, where k is a positive integer;
    performing sequence alignment on each of the second candidate videos to obtain the sequence alignment results.
  9. The method for determining duplicate videos according to claim 8, wherein performing the second ranking of the plurality of existing videos according to each individual first image feature of at least one kind of the first video features comprises:
    using each individual first image feature of at least one kind of the first video features as an index request, and performing term frequency-inverse document frequency ranking of the plurality of existing videos.
  10. The method for determining duplicate videos according to claim 8, wherein determining the multiple kinds of video features of the video to be checked as the first video features according to each kind of the first image features of the plurality of frame images of the video to be checked comprises:
    binarizing the first image features;
    determining the first video features according to the binarized first image features of the plurality of frame images.
  11. A determining apparatus for duplicate videos, the apparatus comprising:
    a video feature acquisition module for acquiring multiple kinds of video features of a video to be checked for duplication;
    a sequence alignment module for performing sequence alignment on a plurality of existing videos according to the multiple kinds of video features of the video to be checked, to obtain sequence alignment results;
    a first ranking module for performing a first ranking of the plurality of existing videos according to the sequence alignment results, and taking, according to the result of the first ranking, the top n existing videos as first candidate videos, where n is a positive integer;
    a duplication checking module for determining the duplication status of the video to be checked according to the sequence alignment results of the first candidate videos.
  12. The determining apparatus for duplicate videos according to claim 11, further comprising modules for performing the steps of any one of claims 2 to 10.
  13. A determination hardware device for duplicate videos, comprising:
    a memory for storing non-transitory computer readable instructions; and
    a processor for executing the computer readable instructions such that, when executing them, the processor implements the method for determining duplicate videos according to any one of claims 1 to 10.
  14. A computer-readable storage medium for storing non-transitory computer readable instructions which, when executed by a computer, cause the computer to perform the method for determining duplicate videos according to any one of claims 1 to 10.
  15. A terminal device comprising the determining apparatus for duplicate videos according to claim 11 or 12.
PCT/CN2018/125500 2018-03-29 2018-12-29 一种重复视频的判断方法及装置 WO2019184522A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2019572032A JP7000468B2 (ja) 2018-03-29 2018-12-29 重複ビデオの判定方法及び装置
US16/958,513 US11265598B2 (en) 2018-03-29 2018-12-29 Method and device for determining duplicate video
SG11201914063RA SG11201914063RA (en) 2018-03-29 2018-12-29 Method and device for determining duplicate video

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810273706.3 2018-03-29
CN201810273706.3A CN110324660B (zh) 2018-03-29 2018-03-29 一种重复视频的判断方法及装置

Publications (1)

Publication Number Publication Date
WO2019184522A1 true WO2019184522A1 (zh) 2019-10-03

Family

ID=68059435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/125500 WO2019184522A1 (zh) 2018-03-29 2018-12-29 一种重复视频的判断方法及装置

Country Status (5)

Country Link
US (1) US11265598B2 (zh)
JP (1) JP7000468B2 (zh)
CN (1) CN110324660B (zh)
SG (1) SG11201914063RA (zh)
WO (1) WO2019184522A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696105A (zh) * 2020-06-24 2020-09-22 北京金山云网络技术有限公司 视频处理方法、装置和电子设备
CN111738173A (zh) * 2020-06-24 2020-10-02 北京奇艺世纪科技有限公司 视频片段检测方法、装置、电子设备及存储介质
CN111914926A (zh) * 2020-07-29 2020-11-10 深圳神目信息技术有限公司 基于滑窗的视频抄袭检测方法、装置、设备和介质
CN113283351A (zh) * 2021-05-31 2021-08-20 深圳神目信息技术有限公司 一种使用cnn优化相似度矩阵的视频抄袭检测方法

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569373B (zh) * 2018-03-29 2022-05-13 北京字节跳动网络技术有限公司 一种媒体特征的比对方法及装置
CN110321759B (zh) * 2018-03-29 2020-07-07 北京字节跳动网络技术有限公司 一种视频特征提取方法及装置
CN112507875A (zh) * 2020-12-10 2021-03-16 上海连尚网络科技有限公司 一种用于检测视频重复度的方法与设备
CN112653885B (zh) * 2020-12-10 2023-10-03 上海连尚网络科技有限公司 视频重复度获取方法、电子设备及存储介质
CN113378902B (zh) * 2021-05-31 2024-02-23 深圳神目信息技术有限公司 一种基于优化视频特征的视频抄袭检测方法
CN113965806B (zh) * 2021-10-28 2022-05-06 腾讯科技(深圳)有限公司 视频推荐方法、装置和计算机可读存储介质
CN114117112B (zh) * 2022-01-25 2022-05-24 深圳爱莫科技有限公司 通用的文本图片查重方法、存储介质及处理设备
CN115017350B (zh) * 2022-06-14 2024-06-25 湖南大学 基于深度学习的海报图像查重检索方法、装置和电子设备
CN116628265A (zh) * 2023-07-25 2023-08-22 北京天平地成信息技术服务有限公司 Vr内容管理方法、管理平台、管理设备和计算机存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6597738B1 (en) * 1999-02-01 2003-07-22 Hyundai Curitel, Inc. Motion descriptor generating apparatus by using accumulated motion histogram and a method therefor
CN104053023A (zh) * 2014-06-13 2014-09-17 海信集团有限公司 一种确定视频相似度的方法及装置
CN105893405A (zh) * 2015-11-12 2016-08-24 乐视云计算有限公司 重复视频检测方法和系统
CN106034240A (zh) * 2015-03-13 2016-10-19 小米科技有限责任公司 视频检测方法及装置
CN106375781A (zh) * 2015-07-23 2017-02-01 无锡天脉聚源传媒科技有限公司 一种重复视频的判断方法及装置
CN106375850A (zh) * 2015-07-23 2017-02-01 无锡天脉聚源传媒科技有限公司 一种匹配视频的判断方法及装置
CN107665261A (zh) * 2017-10-25 2018-02-06 北京奇虎科技有限公司 视频查重的方法及装置

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120114167A1 (en) * 2005-11-07 2012-05-10 Nanyang Technological University Repeat clip identification in video data
US8422795B2 (en) * 2009-02-12 2013-04-16 Dolby Laboratories Licensing Corporation Quality evaluation of sequences of images
JP2012226477A (ja) 2011-04-18 2012-11-15 Nikon Corp 画像処理プログラム、画像処理方法、画像処理装置、撮像装置
US9092520B2 (en) * 2011-06-20 2015-07-28 Microsoft Technology Licensing, Llc Near-duplicate video retrieval
CN102779184B (zh) * 2012-06-29 2014-05-14 中国科学院自动化研究所 一种近似重复视频片段自动定位方法
CN103617233B (zh) * 2013-11-26 2017-05-17 烟台中科网络技术研究所 一种基于语义内容多层表示的重复视频检测方法与装置
CN109478319A (zh) 2016-07-11 2019-03-15 三菱电机株式会社 动态图像处理装置、动态图像处理方法及动态图像处理程序
US10296540B1 (en) * 2016-09-08 2019-05-21 A9.Com, Inc. Determine image relevance using historical action data

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6597738B1 (en) * 1999-02-01 2003-07-22 Hyundai Curitel, Inc. Motion descriptor generating apparatus by using accumulated motion histogram and a method therefor
CN104053023A (zh) * 2014-06-13 2014-09-17 海信集团有限公司 一种确定视频相似度的方法及装置
CN106034240A (zh) * 2015-03-13 2016-10-19 小米科技有限责任公司 视频检测方法及装置
CN106375781A (zh) * 2015-07-23 2017-02-01 无锡天脉聚源传媒科技有限公司 一种重复视频的判断方法及装置
CN106375850A (zh) * 2015-07-23 2017-02-01 无锡天脉聚源传媒科技有限公司 一种匹配视频的判断方法及装置
CN105893405A (zh) * 2015-11-12 2016-08-24 乐视云计算有限公司 重复视频检测方法和系统
CN107665261A (zh) * 2017-10-25 2018-02-06 北京奇虎科技有限公司 视频查重的方法及装置

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696105A (zh) * 2020-06-24 2020-09-22 北京金山云网络技术有限公司 视频处理方法、装置和电子设备
CN111738173A (zh) * 2020-06-24 2020-10-02 北京奇艺世纪科技有限公司 视频片段检测方法、装置、电子设备及存储介质
CN111696105B (zh) * 2020-06-24 2023-05-23 北京金山云网络技术有限公司 视频处理方法、装置和电子设备
CN111914926A (zh) * 2020-07-29 2020-11-10 深圳神目信息技术有限公司 基于滑窗的视频抄袭检测方法、装置、设备和介质
CN111914926B (zh) * 2020-07-29 2023-11-21 深圳神目信息技术有限公司 基于滑窗的视频抄袭检测方法、装置、设备和介质
CN113283351A (zh) * 2021-05-31 2021-08-20 深圳神目信息技术有限公司 一种使用cnn优化相似度矩阵的视频抄袭检测方法
CN113283351B (zh) * 2021-05-31 2024-02-06 深圳神目信息技术有限公司 一种使用cnn优化相似度矩阵的视频抄袭检测方法

Also Published As

Publication number Publication date
US20210058667A1 (en) 2021-02-25
US11265598B2 (en) 2022-03-01
JP7000468B2 (ja) 2022-01-19
SG11201914063RA (en) 2020-01-30
CN110324660A (zh) 2019-10-11
JP2020525935A (ja) 2020-08-27
CN110324660B (zh) 2021-01-19

Similar Documents

Publication Publication Date Title
WO2019184522A1 (zh) 一种重复视频的判断方法及装置
WO2019184518A1 (zh) 一种音频检索识别方法及装置
US10528613B2 (en) Method and apparatus for performing a parallel search operation
US20140240603A1 (en) Object detection metadata
CN108427925B (zh) 一种基于连续拷贝帧序列的拷贝视频检测方法
CN110162665B (zh) 视频搜索方法、计算机设备及存储介质
WO2014062508A1 (en) Near duplicate images
CN110381392B (zh) 一种视频摘要提取方法及其系统、装置、存储介质
CN110110113A (zh) 图像搜索方法、系统及电子装置
CN111368867B (zh) 档案归类方法及系统、计算机可读存储介质
CN111598012B (zh) 一种图片聚类管理方法、系统、设备及介质
WO2020125100A1 (zh) 一种图像检索方法、装置以及设备
WO2017156963A1 (zh) 一种指纹解锁的方法及终端
WO2019184520A1 (zh) 一种视频特征提取方法及装置
Zhang et al. Large‐scale video retrieval via deep local convolutional features
CN110826365A (zh) 一种视频指纹生成方法和装置
US11874869B2 (en) Media retrieval method and apparatus
US11593582B2 (en) Method and device for comparing media features
CN108536769B (zh) 图像分析方法、搜索方法及装置、计算机装置及存储介质
CN111275683A (zh) 图像质量评分处理方法、系统、设备及介质
CN112100412B (zh) 图片检索方法、装置、计算机设备和存储介质
CN110717362B (zh) 数位影像的特征树结构的建立方法与影像物件辨识方法
Ren et al. Visual words based spatiotemporal sequence matching in video copy detection
WO2019184521A1 (zh) 一种视频特征提取方法及装置
Jinliang et al. Copy image detection based on local keypoints

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18912746

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019572032

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18912746

Country of ref document: EP

Kind code of ref document: A1