EP3477506B1 - Video detection method, server and storage medium - Google Patents
- Publication number
- EP3477506B1 (application EP17814638.7A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- video
- server
- data
- picture
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7837—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/735—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7834—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/54—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Television Signal Processing For Recording (AREA)
Description
- This application claims priority to Chinese Patent Application No. 201610457780.1, filed with the Chinese Patent Office on June 22, 2016.
- The present disclosure relates to video data processing technologies, and in particular, to a video detection method, a server and a storage medium.
- Video copy detection has become a hot research topic for achieving copyright protection of digital videos. In existing technologies, the following two solutions are usually used to detect whether a video involves copyright infringement. The first solution is to detect the audio of the video. However, the same audio may exist in different videos having copyright, for example, music videos (MV), and the copyright of the audio protects only the audio data. Consequently, copyright protection for video data cannot be achieved by audio copyright protection alone. The second solution is to detect the pictures of a video. Compared with audio data, video data carries a far larger quantity of information, so a large quantity of fingerprint features needs to be calculated during detection and compared with fingerprint features in a copyright library, which requires a high calculation capability and consumes a large quantity of calculation resources.
- Y. Tian et al., "Video Copy-Detection and Localization with a Scalable Cascading Framework", IEEE Multimedia, Vol. 20, No. 3, July 2013, pages 72-86, discloses a novel video copy-detection and localization approach with scalable cascading of complementary detectors and multiscale sequence matching. The approach is described as being able to improve copy-detection accuracy and localization precision for most audio-visual transformations.
- Y. Tian et al., "TASC: A Transformation-Aware Soft Cascading Approach for Multimodal Video Copy Detection", ACM Transactions on Information Systems, Vol. 33, No. 2, February 2015, pages 1-34, is concerned with the problem of precisely and efficiently detecting near-duplicate copies with complicated audiovisual transformations in a large-scale video database, which is a challenging task. To cope with this challenge, the article proposes a transformation-aware soft cascading (TASC) approach for multimodal video copy detection. Basically, the approach divides query videos into categories and, for each category, designs a transformation-aware chain that organizes several detectors in a cascade structure.
- To resolve the foregoing technical problem, embodiments of the present disclosure provide a video detection method and a server, which can meet the requirement of copyright protection for video data without requiring a high calculation capability and huge calculation resource consumption.
- An embodiment of the present disclosure provides a video copyright detection method, including:
- obtaining, by a server, first video data to be detected, and decoding, by the server, the first video data, to obtain audio data of the first video data;
- analyzing, by the server, the audio data, to obtain audio fingerprint data corresponding to the audio data; and querying, by the server based on the audio fingerprint data, an audio fingerprint library;
- obtaining, by the server, a video label and a time parameter corresponding to the audio fingerprint data if the audio fingerprint library includes the audio fingerprint data;
- querying, by the server, a video copyright library using the video label, to obtain a first picture that corresponds to the video label and that satisfies the time parameter, and extracting from the first video data a second picture that satisfies the time parameter;
- extracting, by the server, a first feature parameter of the first picture and a second feature parameter of the second picture; and
- comparing, by the server, the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library,
- extracting, by the server, pictures from second video data having copyright, and, for each of the extracted pictures, storing the extracted picture, a time parameter corresponding to the picture, and a video label corresponding to the picture in the video copyright library,
- where the extracting, by the server, pictures from second video data having copyright includes:
- performing, by the server, scene recognition on the second video data, and recognizing and filtering out a first picture collection representing scene switching in the second video data, to obtain a second picture collection;
- analyzing, by the server, pictures in the second picture collection, to obtain edge feature information of the picture in the second picture collection; and
- extracting, by the server, pictures of which a quantity of the edge feature information reaches a preset threshold.
- In the foregoing solution, the extracting, by the server, pictures of which a quantity of the edge feature information reaches a preset threshold includes:
- generating, by the server, a third picture collection of pictures each of which the quantity of the edge feature information reaches the preset threshold; and
- extracting, by the server, pictures from the third picture collection at a preset time interval.
- In the foregoing solution, the analyzing, by the server, the audio data, to obtain audio fingerprint data corresponding to the audio data includes:
extracting, by the server, a feature parameter of the audio data, and obtaining, based on the feature parameter, the audio fingerprint data corresponding to the audio data.
- In the foregoing solution, the extracting, by the server, a first feature parameter of the first picture and a second feature parameter of the second picture includes: extracting the first feature parameter of the first picture and the second feature parameter of the second picture by at least one of the following methods: a scale-invariant feature transform (SIFT) method and a histogram of oriented gradient (HOG) method.
- In the foregoing solution, the comparing, by the server, the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library includes: calculating, by the server, similarity between the first feature parameter and the second feature parameter; and determining, by the server, that the first video data is consistent with the video in the video copyright library if the similarity reaches a preset threshold.
- An embodiment of the present disclosure further provides a server, including: at least one processor and a memory storing a processor-executable instruction, the instruction, when executed by the at least one processor, causing the server to perform the foregoing video copyright detection method.
- An embodiment of the present disclosure provides a non-volatile storage medium, storing one or more computer programs, the computer programs including an instruction that is executable by a processor of a computer comprising one or more memories, the instruction, when executed by the computer, causing the computer to perform the foregoing video copyright detection method.
- In the video copyright detection method and the server provided by the embodiments of the present disclosure, audio content detection and video content detection are combined, that is, audio content detection is used primarily and video content detection is used secondarily. Compared with the case where only video content detection is used, the required calculation capability is greatly decreased and calculation resource consumption is reduced, and the shortcoming of copyright protection that relies merely on an audio fingerprint is overcome, so as to meet the video copyright protection requirement.
- FIG. 1 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure;
- FIG. 2a to FIG. 2d are schematic diagrams of generating audio fingerprint data according to an embodiment of the present disclosure;
- FIG. 3 is a schematic diagram of a process for extracting and matching audio fingerprint data according to an embodiment of the present disclosure;
- FIG. 4 is a schematic diagram of a process for establishing a video copyright library according to an embodiment of the present disclosure;
- FIG. 5 is a schematic flowchart of another video detection method according to an embodiment of the present disclosure;
- FIG. 6a and FIG. 6b are schematic diagrams of a system architecture applying a video detection method according to an embodiment of the present disclosure;
- FIG. 7 is a schematic diagram of a structure of a server according to an embodiment of the present disclosure; and
- FIG. 8 is a schematic diagram of a hardware structure of a server according to an embodiment of the present disclosure.
- With the development of Internet technologies, there are a growing number of video sharing platforms. A user may upload various video data, including an MV, a short video, a TV play, a movie, a variety show video, an animation video, and the like. A video sharing platform server performs copyright detection on the video data uploaded by the user, usually against a video copyright database. If the content of the uploaded video data is consistent with that of video data in the video copyright database, the uploaded video data involves a copyright conflict. Subsequently, an operation is performed on the video data involving the copyright conflict or on its uploader; for example, the video data is deleted, or legal action may be taken against the uploader in serious circumstances. If the content of the uploaded video data is not consistent with that of the video data in the video copyright database, it may be determined that the video data is original video data made by the user, and copyright protection for the original data may be provided to the user.
- Based on this, the following embodiments of the present disclosure are provided. According to the technical solution of the embodiments of the present disclosure, a process of matching the uploaded video data with the video data in the video copyright database is described in detail.
- The following describes the technical solution in detail with reference to the accompanying drawings and specific embodiments.
- In an embodiment of the present disclosure, a video detection method is provided. FIG. 1 is a schematic flowchart of a video detection method according to an embodiment of the present disclosure. As shown in FIG. 1, the video detection method includes steps 101 to 107.
- In step 101, first video data to be detected is obtained, and the first video data is decoded to obtain audio data of the first video data.
- In step 102, the audio data is analyzed, to obtain audio fingerprint data corresponding to the audio data.
- In step 103, based on the audio fingerprint data, an audio fingerprint library is queried.
- In step 104, a video label and a time parameter corresponding to the audio fingerprint data are obtained if the audio fingerprint library includes the audio fingerprint data.
- In step 105, a video copyright library is queried using the video label, to obtain a first picture that corresponds to the video label and that satisfies the time parameter, and a second picture that satisfies the time parameter is extracted from the first video data.
- In step 106, a first feature parameter of the first picture and a second feature parameter of the second picture are extracted.
- In step 107, the first feature parameter is compared with the second feature parameter, and it is determined, based on a comparison result, whether the first video data is consistent with a video in the video copyright library.
- The video detection method of this embodiment is applied to a video sharing platform server or server cluster.
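Steps 101 to 107 can be sketched as a single detection function. This is a minimal illustration only: the helper callables (decode_audio, fingerprint, extract_picture, features, similar) and the dictionary shapes of the two libraries are hypothetical stand-ins for the components described in this disclosure, not a fixed API.

```python
def detect_video(first_video, audio_fp_library, video_copyright_library,
                 decode_audio, fingerprint, extract_picture,
                 features, similar):
    """Return True if first_video is consistent with a copyrighted video."""
    audio = decode_audio(first_video)                 # step 101: decode audio
    fp = fingerprint(audio)                           # step 102: audio fingerprint
    match = audio_fp_library.get(fp)                  # step 103: query fingerprint library
    if match is None:
        # No audio match: no copyright conflict, steps 104-107 are skipped.
        return False
    video_label, time_param = match                   # step 104: label + time parameter
    first_picture = video_copyright_library[video_label][time_param]  # step 105
    second_picture = extract_picture(first_video, time_param)
    f1, f2 = features(first_picture), features(second_picture)        # step 106
    return similar(f1, f2)                            # step 107: compare features
```

Note that the audio comparison acts as a cheap filter: the relatively expensive picture feature extraction and comparison run only for the one candidate video and time position the audio match has already identified.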
- In step 101 of this embodiment, the first video data is video data to be detected, which may be video data uploaded by a user, including an MV, a TV play, a movie, an animation video, or the like. The first video data is decoded to obtain the audio data of the first video data.
- In step 102 of this embodiment, the audio data is analyzed to obtain the audio fingerprint data corresponding to the audio data. The audio fingerprint data may be quantized data representing a feature parameter of the audio data, and may be represented by using a binary value. The analyzing the audio data, to obtain audio fingerprint data corresponding to the audio data includes: extracting the feature parameter of the audio data, and obtaining, based on the feature parameter, the audio fingerprint data corresponding to the audio data.
- In an implementation, since audio is a type of sound wave, the feature parameter of the audio data may be represented based on sampling.
FIG. 2a to FIG. 2d are schematic diagrams of a process of generating audio fingerprint data according to an embodiment of the present disclosure. FIG. 3 is a schematic diagram of a process for extracting and matching audio fingerprint data according to an embodiment of the present disclosure. As shown by step 201 and step 202 in FIG. 3, the sampling rate of the audio data is first converted to K samples/s, where K is, for example, 8000. The collected audio data, represented in the one-dimensional time domain, is converted into a two-dimensional time-frequency diagram by using a windowed Fourier transform, as shown in FIG. 2a. Significant feature points are then extracted from this diagram; for example, the feature points in FIG. 2b are obtained by searching for peaks in the frequency spectrum. For each selected feature point, such as the feature point A shown in FIG. 2c, a proper window, such as the window area 1 in FIG. 2c, is selected based on time and frequency to perform feature hash value conversion. For example, the coordinates of the feature point A are (f1, t1). A maximum feature point in the frequency domain is selected in the window area 1, for example, the feature point C with coordinates (f2, t2). The hash value formed by the feature point C and the feature point A may be Hash:time=[f1:f2:Δt]:t1, where Δt=t2-t1. In this embodiment, the obtained hash value is represented as a binary value, and this binary data is the audio fingerprint data corresponding to the audio data.
- Certainly, the method for obtaining audio fingerprint data in this embodiment of the present disclosure is not limited to the foregoing method, and any other method for obtaining audio fingerprint data that represents an audio data feature falls within the protection scope of the embodiments of the present disclosure.
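The pairing of a feature point A with a maximum point C in its window can be sketched as follows, assuming the spectrogram peaks have already been picked as (frequency, time) points. The window bounds max_dt and max_df are illustrative values, not taken from this disclosure.

```python
def pair_hashes(peaks, max_dt=10, max_df=50):
    """For each anchor peak A=(f1, t1), pair it with peaks C=(f2, t2) inside a
    window after A, and emit ((f1, f2, dt), t1), i.e. the [f1:f2:Δt]:t1 hash."""
    hashes = []
    for f1, t1 in peaks:
        for f2, t2 in peaks:
            dt = t2 - t1
            # keep only points later in time and close enough in frequency
            if 0 < dt <= max_dt and abs(f2 - f1) <= max_df:
                hashes.append(((f1, f2, dt), t1))
    return hashes
```

In practice the (f1, f2, dt) triple would be packed into a fixed-width binary value, matching the binary representation described above.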
- In step 103 of this embodiment, the audio fingerprint library records the audio fingerprint data of video data having copyright, a corresponding video label, and a time parameter corresponding to the audio fingerprint data. The video label may be represented by using a video identification, for example, a sequence number or a code, for convenience of searching the video copyright library using the video label. The audio fingerprint data in the audio fingerprint library may be obtained by performing audio data extraction on the video data having copyright using the method for extracting audio fingerprint data according to this embodiment of the present disclosure, which is not repeated herein.
- In this embodiment, if the audio fingerprint library does not include the audio fingerprint data, it indicates that the first video data is not consistent with video data in the video copyright library. That is, the first video data does not conflict with the video data having copyright; in other words, the first video data does not involve a copyright conflict. Correspondingly, step 104 to step 107 are not performed, and a determining result indicating that the first video data is not consistent with the video data in the video copyright library may be directly obtained.
- As shown by step 203 in FIG. 3, a video label and a time parameter corresponding to the audio fingerprint data are obtained as matching information if the audio fingerprint library includes the audio fingerprint data. The video label may be represented by using a video identification, for example, a sequence number or a code. In an actual application, the audio fingerprint library stores a large quantity of information. To reduce the matching time and increase the matching speed without losing accuracy, a fuzzy matching method is used in step 203. That is, in the matching process, the number of matching hashes having the same audio fingerprint data and the same time difference (for example, the difference between a hash time point of the input video and a hash time point in the fingerprint library) is counted, and the first N pieces of matched audio fingerprint data are selected to form potential matched audio segments. In a next stage, each segment of matched audio fingerprint data is analyzed for the density of its hash amount at a particular time. If the density is greater than a threshold, the segment is retained; otherwise, the segment is removed. Therefore, only hash value segments whose density is greater than the threshold are retained. In this process, potential audio segments having a low matching density are removed, thereby improving the matching accuracy. De-duplication selection is then performed on the selected potential matching audio segments, so that for repeated audio matched in time, the audio having the largest hash density is selected as the final matching audio. For the matching process, one-to-one or one-to-many matching may be allowed. In this process, filtering may be performed when the hash density is analyzed, and only the segment of audio having the longest matching time is retained.
The foregoing process may be referred to as a classification and filtering process, that is, step 204, in which the final matching information is obtained.
- In step 105 of this embodiment, after the video label corresponding to the audio fingerprint data is obtained, the video copyright library is queried using the video label. The video copyright library records key frame pictures of video data having copyright, the video labels, time parameters corresponding to the key frame pictures, and the like. The video copyright library is queried to obtain the first picture that corresponds to the video label and satisfies the time parameter, and the second picture that satisfies the time parameter is extracted from the first video data. In the process of matching picture content of video data, the video copyright library is queried according to the video label obtained by matching the audio fingerprint data, to obtain a picture collection corresponding to the video label. A picture (that is, the first picture) satisfying the time parameter obtained by matching the audio fingerprint data is read from the picture collection. Correspondingly, for the first video data, a picture (that is, the second picture) that satisfies the time parameter obtained by matching the audio fingerprint data is extracted from the first video data.
- In this embodiment, the first feature parameter of the first picture and the second feature parameter of the second picture are extracted. The first feature parameter is compared with the second feature parameter to determine whether the first video data is consistent with video data in the copyright database. The extracting a first feature parameter of the first picture and a second feature parameter of the second picture includes: extracting the first feature parameter of the first picture and the second feature parameter of the second picture by at least one of the following methods: a Scale-Invariant Feature Transform (SIFT) method and a Histogram of Oriented Gradient (HOG) method.
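The core of the fuzzy matching described above, counting matching hashes that share the same time difference, can be sketched as follows. This simplified version omits the density analysis and de-duplication stages and only builds the per-offset counts; the data shapes are illustrative assumptions.

```python
from collections import Counter

def match_fingerprints(query_hashes, library_hashes):
    """query_hashes / library_hashes: iterables of (hash_value, time) pairs.
    Returns (offset, count) pairs sorted by count; a dominant offset shared
    by many hashes indicates that the query aligns with the library audio."""
    # index the library hashes so each query hash is looked up in O(1)
    index = {}
    for h, t in library_hashes:
        index.setdefault(h, []).append(t)
    offsets = Counter()
    for h, t in query_hashes:
        for lib_t in index.get(h, []):
            offsets[lib_t - t] += 1   # same offset => consistent alignment
    return offsets.most_common()
```

A real implementation would then keep only the top-N offsets, check the hash density of each candidate segment against a threshold, and de-duplicate overlapping segments as described above.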
Further, the comparing the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library includes: calculating similarity between the first feature parameter and the second feature parameter by using the SIFT method or the HOG method; and determining that the first video data is consistent with the video in the video copyright library if the similarity reaches a preset threshold.
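The final similarity decision can be sketched as a comparison of two feature vectors against a preset threshold. The cosine measure and the 0.9 threshold below are illustrative choices; the disclosure itself only requires that some similarity between the SIFT or HOG feature parameters be computed and thresholded.

```python
import math

def is_consistent(feat1, feat2, threshold=0.9):
    """Compare two picture feature vectors (e.g. HOG-style descriptors) by
    cosine similarity; return True if the similarity reaches the threshold."""
    dot = sum(a * b for a, b in zip(feat1, feat2))
    norm = (math.sqrt(sum(a * a for a in feat1))
            * math.sqrt(sum(b * b for b in feat2)))
    if norm == 0:
        return False  # an all-zero descriptor cannot be matched
    return dot / norm >= threshold
```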
- According to the invention, before the video copyright library is queried using the video label, in other words, in a process of establishing the video copyright library, the method includes: extracting a picture from second video data having copyright, and storing the extracted picture, a time parameter corresponding to the picture, and a video label corresponding to the picture in the video copyright library.
- The extracting a picture from second video data having copyright includes: performing scene recognition on the second video data, recognizing and filtering out a first picture collection representing scene switching in the second video data, to obtain a second picture collection; analyzing a picture in the second picture collection, to obtain edge feature information of the picture; and extracting a picture of which a quantity of the edge feature information reaches a preset threshold.
- In the process of establishing the video copyright library, key frame picture extraction is performed on all video data having copyright.
FIG. 4 is a schematic diagram of a process for establishing a video copyright library according to an embodiment of the present disclosure. As shown inFIG. 4 , the video data having copyright is input and a key frame picture of the input video data is extracted. A process of extracting the key frame picture of the video data includes: first, performing scene switching detection on the video data. The scene switching detection may be performed by detecting a foreground and/or a background in the picture. If it is detected that the foregrounds and/or backgrounds of two pictures are inconsistent, it may be determined that a time point between time points corresponding to the two pictures is a scene switching time point. It may also be understood as that a first picture having an early time is the last picture of the previous scene, and the other picture having a later time is the first picture of a next scene. In this embodiment, scene recognition is performed on the input video data, and a first picture collection representing scene switching is recognized. In the process of extracting the key frame picture, first, the first picture collection is filtered out from a picture collection included in the video data, to prevent that an extracted key frame picture is at a scene switching position which may reduce accuracy of subsequent picture content matching. Further, analysis is performed on remaining pictures in the second picture collection. The complexity of the pictures is analyzed, to extract a picture with high complexity as a key frame picture. In an implementation, picture edge feature information may be analyzed in this embodiment of the present disclosure to find out, in the second picture collection, a picture that has a large quantity of the edge feature information as the key frame picture. This is because a larger quantity of the edge feature information of a picture indicates higher complexity of the picture. 
In a process of matching two pictures, more complex content leads to higher matching accuracy. - In an implementation, the extracting of a picture of which a quantity of the edge feature information reaches a preset threshold includes: generating a third picture collection of pictures, each of which has a quantity of edge feature information reaching the preset threshold; and extracting pictures from the third picture collection at a preset time interval.
- In this embodiment, there may be many pictures of which the quantity of the edge feature information is large. To reduce picture storage space and the calculation amount of feature matching, a picture may be extracted as a key frame picture at the preset time interval. For example, one picture is extracted every K seconds. This may greatly reduce the quantity of extracted key frame pictures, greatly reduce the quantity of stored pictures and the calculation amount of feature matching, and reduce the calculation resources consumed by a server.
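The key frame selection described above (scene-switch filtering, edge-complexity thresholding, and interval sampling) can be sketched as follows. This is a minimal numpy-only illustration: the gradient-magnitude count is a hypothetical stand-in for a real edge detector, and the parameter names are assumptions, not part of the disclosure.

```python
import numpy as np

def edge_count(frame, grad_thresh=0.5):
    # Stand-in for an edge detector: count pixels whose gradient magnitude
    # exceeds a threshold, as a proxy for the "quantity of edge feature information".
    gy, gx = np.gradient(frame.astype(float))
    return int(np.count_nonzero(np.hypot(gx, gy) > grad_thresh))

def select_key_frames(frames, times, scene_switch_times, edge_thresh, interval):
    # Step 1: filter out the first picture collection (frames at scene switches).
    second = [(t, f) for t, f in zip(times, frames) if t not in scene_switch_times]
    # Step 2: keep only frames whose edge-feature quantity reaches the threshold
    # (the third picture collection).
    third = [(t, f) for t, f in second if edge_count(f) >= edge_thresh]
    # Step 3: extract at most one key frame per `interval` seconds.
    key_frames, last_t = [], None
    for t, f in third:
        if last_t is None or t - last_t >= interval:
            key_frames.append((t, f))
            last_t = t
    return key_frames
```

A production system would replace `edge_count` with a proper edge detector and drive `scene_switch_times` from the scene recognition step.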
- Based on the video detection solution described above, in short, the technical solution of this embodiment of the present disclosure is implemented by primarily using audio content detection and secondarily using video content detection. In other words, detection of audio content and video content are combined to determine whether a video to be detected involves copyright conflict.
FIG. 5 is a schematic flowchart of another video detection method according to an embodiment of the present disclosure. As shown in FIG. 5, audio data is first extracted from the video data to be detected, and it is determined whether the audio data is consistent with the audio data of video data having copyright. Audio fingerprint data of the audio data is matched with audio fingerprint data in the audio fingerprint library. If the audio fingerprint data of the audio data does not match the audio fingerprint data in the audio fingerprint library, the matching result is that the video to be detected does not conflict with the video having copyright. If the audio fingerprint data of the audio data matches the audio fingerprint data in the audio fingerprint library, a video label and a time parameter corresponding to the audio fingerprint data are obtained from the audio fingerprint library. Content of the video to be detected is further matched with content of the video in a video copyright library. The video copyright library is queried using the video label to obtain a video source. A first picture satisfying the time parameter is obtained from the video source. A picture is extracted from the video to be detected according to the time parameter to be used as a second picture. Fingerprint feature extraction is performed on the first picture and the second picture. Feature extraction may be performed by a SIFT method. Then, feature matching is performed, and the similarity between the two pictures is calculated. If the calculated similarity reaches a preset threshold, it indicates that the similarity between the two pictures is high, so that the matching result is that the video to be detected conflicts with the video having copyright.
If the calculated similarity does not reach the preset threshold, it indicates that the similarity between the two pictures is low, so that the matching result is that the video to be detected does not conflict with the video having copyright. - If the matching result is that the video to be detected does not conflict with the video having copyright, it may be determined that the video data is original video data made by a user, and copyright protection on original data and sharing of advertising revenue may be provided for the user, to encourage excellent video data makers, thereby providing better video content for the video sharing platform. If the matching result is that the video to be detected conflicts with the video having copyright, the video data to be detected involves copyright conflict. Subsequently, an operation is performed on the video data involving copyright conflict or on its uploader. For example, the video data involving the copyright conflict is deleted, or legal action is taken against the uploader in a serious circumstance. In this way, effective copyright protection is provided for video data having copyright.
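The two-stage flow above can be sketched as a single decision function. The callables (`lookup_audio`, `get_library_picture`, `extract_picture`, `similarity`) are hypothetical stand-ins for the library and feature components described in this disclosure, injected so the sketch stays self-contained:

```python
def detect_copyright(video, lookup_audio, get_library_picture,
                     extract_picture, similarity, threshold=0.8):
    # Stage 1: audio fingerprint lookup (cheap). A miss means no conflict,
    # and the expensive picture-matching stage is skipped entirely.
    hit = lookup_audio(video)               # -> (video_label, time_parameter) or None
    if hit is None:
        return False
    label, t = hit
    # Stage 2: confirm by comparing pictures at the matched time parameter.
    first = get_library_picture(label, t)   # key frame from the copyright library
    second = extract_picture(video, t)      # frame from the video to be detected
    return similarity(first, second) >= threshold
```

The design point is that the audio stage acts as a coarse filter, so the costly picture comparison runs only on candidates that already match on audio.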
- In the existing technology, detection solutions based on video content usually include the following two types: 1. a space color-based video fingerprint; 2. a feature extraction-based video fingerprint.
- The space color-based video fingerprint is basically a histogram, over a time period, of a picture in a particular area. Color features vary with different video formats. Consequently, the color-based video fingerprint has a low anti-noise capability against changes such as an added logo or black border.
- A two-dimensional discrete cosine transform (2D-DCT) video fingerprint is representative of the feature-based video fingerprint. The 2D-DCT video fingerprint is widely applied to hash-based image retrieval, which includes the following main steps. First, frame rate conversion is performed on the video in the time domain, so that the frame rate is reduced to a low frame rate F (for example, F=4). Subsequently, the pictures are scaled down and converted into black and white pictures. The converted black and white pictures form small segments in the time domain of the video (for example, a slice formed by J pictures over a time length). Subsequently, a temporally informative representative image (TIRI) is used to combine the information of the continuous pictures in each slice in the time domain, to obtain one picture. Next, the 2D-DCT (for example, an 8×8 DCT) is performed on the obtained picture representing the information in the time domain. A median value is found among the transformed DCT values, and the fingerprint is converted into a two-dimensional vector using this median value, thereby representing a feature of the video in an area.
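The TIRI + 2D-DCT steps above can be sketched in a few lines. This is a numpy-only illustration; simple frame averaging is assumed as the TIRI combination, and a real system might use an optimized DCT routine instead of the explicit matrix:

```python
import numpy as np

def dct2(block):
    # Orthonormal 2D DCT-II built from the 1D DCT matrix (numpy-only sketch).
    n = block.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m @ block @ m.T

def tiri_dct_fingerprint(slice_frames):
    # TIRI: combine the consecutive low-frame-rate pictures of one slice into a
    # single representative image (here by averaging), then apply an 8x8 2D-DCT
    # and binarize the coefficients against their median value.
    tiri = np.mean(np.stack(slice_frames), axis=0)
    coeffs = dct2(tiri)
    return (coeffs > np.median(coeffs)).astype(np.uint8).ravel()
```

The median-thresholded bit vector is what makes the fingerprint compact and comparable by Hamming distance.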
- The video detection solution based on space color has poor accuracy and a poor anti-noise capability. The video detection solution based on features requires a high calculation capability and consumes a large quantity of calculation resources. Compared with the existing video detection solutions, the video detection solution of this embodiment of the present disclosure primarily uses audio fingerprint detection and secondarily uses video fingerprint detection. As compared with the case where only video content detection is used, this greatly decreases the required calculation capability and reduces calculation resource consumption, while overcoming the shortcoming of copyright protection merely using an audio fingerprint, so as to meet the video copyright protection requirement.
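The primarily used audio fingerprint can be sketched as spectrogram peak pairing, producing the Hash: time=[f1:f2:Δt]:t1 values of FIG. 2c described below. The windowing, peak-picking, and fan-out parameters here are illustrative assumptions rather than the disclosure's exact settings:

```python
import numpy as np

def spectrogram_peaks(samples, win=256, hop=128, top=3):
    # Windowed Fourier transform, then keep the `top` strongest frequency bins
    # per frame as "significant feature points" (simplified peak picking).
    peaks = []
    for frame, start in enumerate(range(0, len(samples) - win + 1, hop)):
        mag = np.abs(np.fft.rfft(samples[start:start + win] * np.hanning(win)))
        for f in np.argsort(mag)[-top:]:
            peaks.append((int(f), frame))   # (frequency bin f, time index t)
    return peaks

def pair_hashes(peaks, fan_out=5, max_dt=10):
    # Pair each anchor point A=(f1, t1) with later points C=(f2, t2) inside a
    # target window, emitting (f1, f2, dt) keys stored together with t1.
    hashes = []
    for i, (f1, t1) in enumerate(peaks):
        for f2, t2 in peaks[i + 1:i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                hashes.append(((f1, f2, dt), t1))
    return hashes
```

Pairing peaks makes the fingerprint robust: a single peak is ambiguous, but a (frequency, frequency, time-delta) triple is highly discriminative.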
FIG. 6a and FIG. 6b are schematic diagrams of a system architecture applying a video detection method according to an embodiment of the present disclosure. As shown in FIG. 6a, the system includes servers 31, ..., 3n and terminal devices 21-23. The terminal devices 21-23 may interact with the servers through a network. The terminal devices 21-23 may include a mobile phone, a desktop computer, a notebook computer, and the like. The servers 31, ..., 3n may be a server cluster of a video sharing platform, and the server cluster may be implemented as shown in FIG. 6b. As shown in FIG. 6b, the server cluster includes a task server 311, an audio detection server 312, a video detection server 313, and a content storage server 314. After the terminal device uploads video data, the task server 311 obtains the video data and initiates a detection task. The task is submitted to the audio detection server 312 to perform audio detection and compare the video data with existing content having audio copyright. The audio detection server 312 finds videos having copyright and having the same audio data and obtains matching information. Further, the matching information is submitted to the video detection server 313. Subsequently, video content detection is performed on the input video and these videos having copyright and having the same audio data, thereby determining whether the input video has the same audio and video as the existing videos. The determining result is stored in the content storage server 314. Finally, related staff may perform a series of copyright-related work based on the content stored in the content storage server 314, such as initiating legal action. - An embodiment of the present disclosure further provides a server.
FIG. 7 is a schematic diagram of a structure of a server according to an embodiment of the present disclosure. As shown in FIG. 7, the server includes: an audio processing module 41, an audio fingerprint storage module 42, a video processing module 43, and a video copyright storage module 44. - The
audio processing module 41 is configured to: obtain first video data to be detected; decode the first video data, to obtain audio data of the first video data; analyze the audio data, to obtain audio fingerprint data corresponding to the audio data; and is further configured to: query, based on the audio fingerprint data, the audio fingerprint storage module 42; and obtain a video label and a time parameter corresponding to the audio fingerprint data if an audio fingerprint library includes the audio fingerprint data. - The audio
fingerprint storage module 42 is configured to store the audio fingerprint data, and the corresponding video label and time parameter. - The
video processing module 43 is configured to: query the video copyright storage module 44 using the video label, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extract from the first video data a second picture that satisfies the time parameter; extract a first feature parameter of the first picture and a second feature parameter of the second picture; and compare the first feature parameter with the second feature parameter, and determine, based on a comparison result, whether the first video data is consistent with a video in the video copyright library. - The video
copyright storage module 44 is configured to store the video label, and the corresponding picture and time parameter. - The server in this embodiment may be an independent server or a server cluster. Based on the system architecture shown in
FIG. 6a and FIG. 6b, modules in this embodiment may be implemented by using any server or server cluster in the system architecture. The server cluster may be the servers shown in FIG. 6b. - In this embodiment, the first video data is video data to be detected, which may be video data uploaded by a user, including an MV, a TV play, a movie, an animation video, or the like. The
audio processing module 41 decodes the obtained first video data to separate the audio data of the first video data. - In this embodiment, the
audio processing module 41 analyzes the audio data to obtain the audio fingerprint data corresponding to the audio data. The audio fingerprint data may be quantized data representing a feature parameter of the audio data, and may be represented by using a binary value. The audio processing module 41 is configured to extract a feature parameter of the audio data, and obtain, based on the feature parameter, the audio fingerprint data corresponding to the audio data. - In an implementation, since audio is a type of sound wave, the feature parameter of the audio data may be represented by a sampling rate. As shown in
FIG. 3, the sampling rate of the audio data is converted to K samples/s, where K is, for example, 8000. The collected audio data represented in the one-dimensional time domain is converted into a two-dimensional time-frequency diagram by using a windowed Fourier transform, as shown in FIG. 2a. A significant feature point extracted from the two-dimensional diagram shown in FIG. 2a is used as a significant feature. For example, the feature points in FIG. 2b are obtained by searching for peaks in the frequency spectrum. For each selected feature point, such as the feature point A shown in FIG. 2c, a proper window, such as the window area 1 in FIG. 2c, is selected based on time and frequency to perform feature hash value conversion. For example, for the feature point A, the coordinates of the feature point A are (f1, t1). A maximum feature point in the frequency domain corresponding to a time is selected in the window area 1, for example, a feature point C. The coordinates of the feature point C are (f2, t2). A hash value formed by the feature point C and the feature point A may be Hash: time=[f1:f2:Δt]:t1, where Δt=t2-t1. In this embodiment, the obtained hash value is represented as a binary value, and the binary data is the audio fingerprint data corresponding to the audio data. - Certainly, the method for obtaining the audio fingerprint data by the
audio processing module 41 in this embodiment of the present disclosure is not limited to the foregoing method, and any other obtainable method for deriving audio fingerprint data that represents an audio data feature falls within the protection scope of the embodiments of the present disclosure. - In this embodiment, the audio
fingerprint storage module 42 records audio fingerprint data of video data having a copyright, a corresponding video label, and a time parameter corresponding to the audio fingerprint data. The video label may be represented by using a video identification, for example, a sequence number or a code, for convenience of searching the video copyright storage module 44 using the video label. The audio fingerprint data in the audio fingerprint storage module 42 may be obtained by performing audio data extraction on the video data having copyright by the audio processing module 41 using the method for extracting audio fingerprint data according to this embodiment of the present disclosure. Then the audio fingerprint data is stored in the audio fingerprint storage module 42. The description of the obtaining method is not repeated herein. - In this embodiment, if the audio
fingerprint storage module 42 does not include the audio fingerprint data, it indicates that the first video data is not consistent with video data in the video copyright storage module 44. That is, it indicates that the first video data does not conflict with the video data having the copyright, and in other words, the first video data does not involve copyright conflict. Correspondingly, a process of matching the video content is not performed subsequently, and a determining result indicating that the first video data is not consistent with the video data in the video copyright storage module 44 may be directly obtained. - As shown in
FIG. 3, if the audio fingerprint storage module 42 includes the audio fingerprint data, the audio processing module 41 obtains a video label and a time parameter corresponding to the audio fingerprint data. The video label may be represented by using a video identification, for example, a sequence number or a code. In an actual application, the audio fingerprint library stores a large quantity of information. To reduce the matching time and increase the matching speed without losing accuracy, a fuzzy matching method may be used. That is, in the matching process, the amount of matched hashes having the same audio fingerprint data and the same time difference (for example, the difference between a hash time point of the input video and a hash time point in the video copyright library (the video fingerprint library)) is counted, and the first N pieces of the matched audio fingerprint data are selected to form potential matched audio segments. In the next stage, each segment of the matched audio fingerprint data is analyzed for the density of its hash amount at a particular time. If the density is greater than a threshold, the segment is retained; otherwise, the segment is removed. Therefore, only hash value segments whose density is greater than the threshold are retained. In this process, potential audio segments having a low matching density may be removed, thereby improving the matching accuracy. De-duplication selection is performed on the selected potential matching audio segments, so that for repeated audio matched in time, the audio having the largest hash density is selected as the final matching audio. For the matching process, one-to-one or one-to-many matching may be allowed. In this process, filtering may be performed when the hash density is analyzed, and only the segment of audio having the longest matching time is retained.
The foregoing process may be referred to as a classification and filtering process, with which the final matching information is obtained. - In this embodiment, after the
audio processing module 41 obtains the video label corresponding to the audio fingerprint data, the video label and the corresponding time parameter are sent to the video processing module 43. The video processing module 43 queries the video copyright storage module 44 using the video label. The video copyright storage module 44 records a key frame picture of video data having copyright, a video label, a time parameter corresponding to the key frame picture, and the like. The video processing module 43 queries the video copyright storage module 44, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extracts from the first video data a second picture that satisfies the time parameter. In a process of matching picture content of video data, the video copyright storage module 44 is queried according to the video label obtained by matching the audio fingerprint data, to obtain a picture collection corresponding to the video label. A picture (that is, the first picture) satisfying the time parameter obtained by matching the audio fingerprint data is read from the picture collection. Correspondingly, for the first video data, a picture (that is, the second picture) that satisfies the time parameter obtained by matching the audio fingerprint data is extracted from the first video data. - In this embodiment, the first feature parameter of the first picture and the second feature parameter of the second picture are extracted. The first feature parameter is compared with the second feature parameter to determine whether the first video data is consistent with video data in a copyright database. The
video processing module 43 is configured to extract the first feature parameter of the first picture and the second feature parameter of the second picture by at least one of the following methods: a Scale-Invariant Feature Transform (SIFT) method and a Histogram of Oriented Gradient (HOG) method. Further, the video processing module 43 is configured to calculate the similarity between the first feature parameter and the second feature parameter by using the SIFT method or the HOG method; and determine that the first video data is consistent with the video in the video copyright library if the similarity reaches a preset threshold. - According to the invention, before the video
copyright storage module 44 is queried using the video label, in other words, in a process of establishing the video copyright storage module 44, the video processing module 43 is further configured to: extract a picture from second video data having copyright, and store the extracted picture, a time parameter corresponding to the picture, and a video label corresponding to the picture in the video copyright storage module 44. - The
video processing module 43 is configured to: perform scene recognition on the second video data, recognize and filter out a first picture collection representing scene switching in the second video data, to obtain a second picture collection; analyze a picture in the second picture collection, to obtain edge feature information of the picture; and extract a picture of which a quantity of the edge feature information reaches a preset threshold. - In the process of establishing the video
copyright storage module 44, key frame picture extraction is performed on all video data having copyright. As shown in FIG. 4, the video data having copyright is input and the video processing module 43 extracts a key frame picture of the input video data. A process of extracting the key frame picture of the video data includes: first, performing scene switching detection on the video data. The scene switching detection may be performed by detecting a foreground and/or a background in the picture. If it is detected that the foregrounds and/or backgrounds of two pictures are inconsistent, it may be determined that a time point between the time points corresponding to the two pictures is a scene switching time point. In other words, the earlier picture is the last picture of the previous scene, and the later picture is the first picture of the next scene. In this embodiment, scene recognition is performed on the input video data, and a first picture collection representing scene switching is recognized. In the process of extracting the key frame picture, first, the first picture collection is filtered out from the picture collection included in the video data, to prevent an extracted key frame picture from being at a scene switching position, which would reduce the accuracy of subsequent picture content matching. Further, the video processing module 43 analyzes the remaining pictures, which form the second picture collection. The complexity of the pictures is analyzed, to extract a picture with high complexity as a key frame picture. In an implementation, picture edge feature information may be analyzed in this embodiment of the present disclosure to find, in the second picture collection, a picture that has a large quantity of edge feature information as the key frame picture. This is because a larger quantity of edge feature information of a picture indicates higher complexity of the picture.
In a process of matching two pictures, more complex content leads to higher matching accuracy. - In an implementation, the
video processing module 43 is configured to: generate a third picture collection of pictures each of which the quantity of the edge feature information reaches the preset threshold; and extract pictures from the third picture collection at a preset time interval. - In this embodiment, there may be many pictures of which the quantity of the edge feature information is large. To reduce picture storage space and a calculation amount of feature matching, the
video processing module 43 may extract a picture as a key frame picture at the preset time interval. For example, one picture is extracted every K seconds. This may greatly reduce the quantity of extracted key frame pictures, greatly reduce the quantity of stored pictures and the calculation amount of feature matching, and reduce the calculation resources consumed by a server. - In this embodiment, in an actual application, the
audio processing module 41 and the video processing module 43 may be implemented by a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA). In an actual application, the audio fingerprint storage module 42 and the video copyright storage module 44 may be implemented by a memory in the server. - In the technical solution of this embodiment of the present disclosure, by combining audio content detection and video content detection, i.e., by primarily using audio content detection and secondarily using video content detection, as compared with the case that only video content detection is used, required calculation capabilities are greatly decreased, calculation resource consumption is reduced, and a shortcoming of copyright protection merely using an audio fingerprint is overcome, so as to meet a video copyright protection requirement.
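The classification and filtering process described earlier, in which matched hashes are counted per time difference and low-density candidates are discarded, can be sketched as an offset-voting scheme. The library layout and the density threshold are illustrative assumptions:

```python
from collections import Counter

def match_by_offset(query_hashes, library, density_threshold=4):
    # `library` maps a fingerprint hash key to a list of (video_label, t_library).
    # Each common hash votes for (video, t_library - t_query); a genuine match
    # accumulates a dense cluster of votes at one consistent time difference,
    # while sparse (low-density) candidates are filtered out.
    votes = Counter()
    for key, t_query in query_hashes:
        for video, t_lib in library.get(key, ()):
            votes[(video, t_lib - t_query)] += 1
    survivors = {k: v for k, v in votes.items() if v >= density_threshold}
    if not survivors:
        return None
    return max(survivors, key=survivors.get)   # densest (video, offset) wins
```

Keying the votes on the time difference is what makes the match robust: hashes that collide by chance scatter across many offsets, while a true copy aligns at a single offset.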
- An example of hardware of the server in this embodiment is shown in
FIG. 8. The apparatus includes a processor 61, a storage medium 62, and at least one external communications interface 63. The processor 61, the storage medium 62, and the external communications interface 63 are connected to each other by using a bus 64. - It needs to be noted that, the foregoing descriptions related to the server are similar to the foregoing method descriptions. Descriptions of a beneficial effect of the server are similar to that of the method, which are not repeated here. For technical details that are not disclosed in the server embodiments of the present disclosure, one may refer to the method embodiments of the present disclosure.
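To make the picture feature comparison performed by the video processing module 43 concrete, the following is a drastically simplified, numpy-only stand-in for a HOG-style descriptor with cosine similarity. A real system would use a full SIFT or HOG implementation with cell/block normalization; this sketch only illustrates the compare-against-a-threshold step:

```python
import numpy as np

def hog_descriptor(img, bins=8):
    # Minimal global histogram of oriented gradients: bin the unsigned gradient
    # orientations, weighted by gradient magnitude, then L2-normalize.
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def picture_similarity(img_a, img_b):
    # Cosine similarity of the two descriptors; the module would compare this
    # value against a preset threshold to decide consistency.
    return float(np.dot(hog_descriptor(img_a), hog_descriptor(img_b)))
```

For example, two copies of the same frame score near 1.0, while frames with orthogonal gradient structure score near 0.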
- A person skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may be implemented in a form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may be implemented in a form of a computer program product that is implemented on one or more computer-usable storage media (including but not limited to a magnetic disk memory, an optical memory, and the like) that include computer-usable program code.
- The present disclosure is described with reference to flowcharts and/or block diagrams of the method, the device (the system), and the computer program product in the embodiments of the present disclosure. It should be understood that computer program instructions may be used to implement each process and/or each block in the flowcharts and/or the block diagrams and a combination of a process and/or a block in the flowcharts and/or the block diagrams. These computer program instructions may be provided for a general-purpose computer, a dedicated computer, an embedded processor, or a processor of any other programmable data processing device to generate a machine, so that the instructions executed by a computer or a processor of any other programmable data processing device generate an apparatus for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
- These computer program instructions may also be stored in a computer-readable memory that can instruct the computer or any other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory generate an artifact that includes an instruction apparatus. The instruction apparatus implements a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
- These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operations and steps are performed on the computer or the another programmable device, thereby generating computer-implemented processing. Therefore, the instructions executed on the computer or the another programmable device provide steps for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
- The foregoing descriptions are merely preferred embodiments of the present disclosure, but are not used to limit the protection scope of the present invention, the scope of protection being defined by the appended claims.
Claims (8)
- A video copyright detection method, comprising: obtaining (101), by a server, first video data to be detected, and decoding, by the server, the first video data, to obtain audio data of the first video data; analyzing (102), by the server, the audio data, to obtain audio fingerprint data corresponding to the audio data; querying (103), by the server based on the audio fingerprint data, an audio fingerprint library; obtaining (104), by the server, a video label and a time parameter corresponding to the audio fingerprint data if the audio fingerprint library comprises the audio fingerprint data; querying (105), by the server, a video copyright library using the video label, to obtain a first picture that is corresponding to the video label and that satisfies the time parameter, and extracting from the first video data a second picture that satisfies the time parameter; extracting (105), by the server, a first feature parameter of the first picture and a second feature parameter of the second picture; and comparing (106), by the server, the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library, wherein before the querying, by the server, a video copyright library using the video label, the method comprises: extracting, by the server, pictures from second video data having copyright, and, for each of the extracted pictures, storing the extracted picture, a time parameter corresponding to the picture, and a video label corresponding to the picture in the video copyright library, wherein the extracting, by the server, pictures from second video data having copyright comprises: performing, by the server, scene recognition on the second video data, and recognizing and filtering out a first picture collection representing scene switching in the second video data, to obtain a second picture collection; analyzing, by the server, pictures in the second picture collection, to obtain edge feature information of the pictures in the second picture collection; and extracting, by the server, pictures of which a quantity of the edge feature information reaches a preset threshold.
- The method according to claim 1, wherein the extracting, by the server, pictures of which a quantity of the edge feature information reaches a preset threshold comprises: generating, by the server, a third picture collection of pictures each of which the quantity of the edge feature information reaches the preset threshold; and extracting, by the server, pictures from the third picture collection at a preset time interval.
- The method according to claim 1, wherein the analyzing, by the server, the audio data, to obtain audio fingerprint data corresponding to the audio data comprises:
extracting, by the server, a feature parameter of the audio data, and obtaining, based on the feature parameter, the audio fingerprint data corresponding to the audio data. - The method according to claim 1, wherein the extracting, by the server, a first feature parameter of the first picture and a second feature parameter of the second picture comprises: extracting, by the server, the first feature parameter of the first picture and the second feature parameter of the second picture by at least one of the following methods:
a scale-invariant feature transform (SIFT) method and a histogram of oriented gradient (HOG) method. - The method according to claim 1, wherein the comparing, by the server, the first feature parameter with the second feature parameter, and determining, based on a comparison result, whether the first video data is consistent with a video in the video copyright library comprises: calculating, by the server, similarity between the first feature parameter and the second feature parameter; and determining that the first video data is consistent with the video in the video copyright library if the similarity reaches a preset threshold.
- The method according to claim 1, wherein the audio fingerprint data is quantized data representing a feature parameter of the audio data.
- A server, comprising at least one processor and a memory storing a processor-executable instruction, the instruction, when executed by the at least one processor, causing the server to perform the method according to any one of claims 1 to 6.
- A non-volatile storage medium, storing one or more computer programs, the computer programs comprising an instruction that is executable by a processor comprising one or more memories, the instruction, when executed by a computer, causing the computer to perform the method according to any one of claims 1 to 6.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610457780.1A CN106126617B (en) | 2016-06-22 | 2016-06-22 | A kind of video detecting method and server |
PCT/CN2017/088240 WO2017219900A1 (en) | 2016-06-22 | 2017-06-14 | Video detection method, server and storage medium |
Publications (3)
Publication Number | Publication Date |
---|---|
EP3477506A1 EP3477506A1 (en) | 2019-05-01 |
EP3477506A4 EP3477506A4 (en) | 2020-02-26 |
EP3477506B1 true EP3477506B1 (en) | 2024-05-01 |
Family
ID=57268521
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP17814638.7A Active EP3477506B1 (en) | 2016-06-22 | 2017-06-14 | Video detection method, server and storage medium |
Country Status (4)
Country | Link |
---|---|
US (1) | US11132555B2 (en) |
EP (1) | EP3477506B1 (en) |
CN (1) | CN106126617B (en) |
WO (1) | WO2017219900A1 (en) |
Families Citing this family (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106126617B (en) * | 2016-06-22 | 2018-11-23 | 腾讯科技(深圳)有限公司 | A kind of video detecting method and server |
US20170371963A1 (en) | 2016-06-27 | 2017-12-28 | Facebook, Inc. | Systems and methods for identifying matching content |
CN107257338B (en) * | 2017-06-16 | 2018-09-28 | 腾讯科技(深圳)有限公司 | media data processing method, device and storage medium |
CN107633078B (en) * | 2017-09-25 | 2019-02-22 | 北京达佳互联信息技术有限公司 | Audio-frequency fingerprint extracting method, audio-video detection method, device and terminal |
CN107832384A (en) * | 2017-10-28 | 2018-03-23 | 北京安妮全版权科技发展有限公司 | Infringement detection method, device, storage medium and electronic equipment |
CN110889010A (en) * | 2018-09-10 | 2020-03-17 | 杭州网易云音乐科技有限公司 | Audio matching method, device, medium and electronic equipment |
CN111402935B (en) * | 2019-01-03 | 2022-09-13 | 北京图音数码科技有限公司 | Method for playing audio and video data |
CN111694970A (en) * | 2019-03-13 | 2020-09-22 | 阿里巴巴集团控股有限公司 | Data processing method, device and system |
CN110110500B (en) * | 2019-06-04 | 2023-04-07 | 施建锋 | Decentralized image copyright protection system and method with immediate infringement detection function |
KR20200142787A (en) * | 2019-06-13 | 2020-12-23 | 네이버 주식회사 | Electronic apparatus for recognition multimedia signal and operating method of the same |
CN110275988A (en) * | 2019-06-14 | 2019-09-24 | 秒针信息技术有限公司 | Obtain the method and device of picture |
CN110390352A (en) * | 2019-06-26 | 2019-10-29 | 华中科技大学 | A kind of dark data value appraisal procedure of image based on similitude Hash |
CN110620905A (en) * | 2019-09-06 | 2019-12-27 | 平安医疗健康管理股份有限公司 | Video monitoring method and device, computer equipment and storage medium |
CN111046345A (en) * | 2019-10-23 | 2020-04-21 | 上海突进网络科技有限公司 | Picture verification and anti-theft method and system |
CN111177466B (en) * | 2019-12-23 | 2024-03-26 | 联想(北京)有限公司 | Clustering method and device |
CN111241928B (en) * | 2019-12-30 | 2024-02-06 | 新大陆数字技术股份有限公司 | Face recognition base optimization method, system, equipment and readable storage medium |
CN111325144A (en) * | 2020-02-19 | 2020-06-23 | 上海眼控科技股份有限公司 | Behavior detection method and apparatus, computer device and computer-readable storage medium |
CN111753735B (en) * | 2020-06-24 | 2023-06-06 | 北京奇艺世纪科技有限公司 | Video clip detection method and device, electronic equipment and storage medium |
CN112104892B (en) * | 2020-09-11 | 2021-12-10 | 腾讯科技(深圳)有限公司 | Multimedia information processing method and device, electronic equipment and storage medium |
CN112215812B (en) * | 2020-09-30 | 2023-12-19 | 大方众智创意广告(珠海)有限公司 | Image detection method, device, electronic equipment and readable storage medium |
CN112633204A (en) * | 2020-12-29 | 2021-04-09 | 厦门瑞为信息技术有限公司 | Accurate passenger flow statistical method, device, equipment and medium |
CN112866800A (en) * | 2020-12-31 | 2021-05-28 | 四川金熊猫新媒体有限公司 | Video content similarity detection method, device, equipment and storage medium |
CN113190404B (en) * | 2021-04-23 | 2023-01-03 | Oppo广东移动通信有限公司 | Scene recognition method and device, electronic equipment and computer-readable storage medium |
CN113254706A (en) * | 2021-05-12 | 2021-08-13 | 北京百度网讯科技有限公司 | Video matching method, video processing device, electronic equipment and medium |
CN113360709B (en) * | 2021-05-28 | 2023-02-17 | 维沃移动通信(杭州)有限公司 | Method and device for detecting short video infringement risk and electronic equipment |
CN113569719B (en) * | 2021-07-26 | 2023-12-29 | 上海艾策通讯科技股份有限公司 | Video infringement judging method and device, storage medium and electronic equipment |
CN114363672A (en) * | 2021-12-17 | 2022-04-15 | 北京快乐茄信息技术有限公司 | Similar video determination method, device, terminal and storage medium |
CN114218429B (en) * | 2021-12-17 | 2022-11-15 | 天翼爱音乐文化科技有限公司 | Video color ring setting method, system, device and storage medium |
CN114928764A (en) * | 2022-04-12 | 2022-08-19 | 广州阿凡提电子科技有限公司 | Original short video AI intelligent detection method, system and device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150365722A1 (en) * | 2014-06-12 | 2015-12-17 | Google Inc. | Systems and Methods for Locally Detecting Consumed Video Content |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102890778A (en) * | 2011-07-21 | 2013-01-23 | 北京新岸线网络技术有限公司 | Content-based video detection method and device |
US8805827B2 (en) * | 2011-08-23 | 2014-08-12 | Dialogic (Us) Inc. | Content identification using fingerprint matching |
US8717499B2 (en) * | 2011-09-02 | 2014-05-06 | Dialogic Corporation | Audio video offset detector |
CN103051925A (en) * | 2012-12-31 | 2013-04-17 | 传聚互动(北京)科技有限公司 | Fast video detection method and device based on video fingerprints |
US20140199050A1 (en) * | 2013-01-17 | 2014-07-17 | Spherical, Inc. | Systems and methods for compiling and storing video with static panoramic background |
US11523090B2 (en) * | 2015-03-23 | 2022-12-06 | The Chamberlain Group Llc | Motion data extraction and vectorization |
CN105554570B (en) * | 2015-12-31 | 2019-04-12 | 北京奇艺世纪科技有限公司 | A kind of copyright video monitoring method and device |
CN106126617B (en) * | 2016-06-22 | 2018-11-23 | 腾讯科技(深圳)有限公司 | A kind of video detecting method and server |
-
2016
- 2016-06-22 CN CN201610457780.1A patent/CN106126617B/en active Active
-
2017
- 2017-06-14 EP EP17814638.7A patent/EP3477506B1/en active Active
- 2017-06-14 WO PCT/CN2017/088240 patent/WO2017219900A1/en unknown
-
2018
- 2018-11-13 US US16/190,035 patent/US11132555B2/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150365722A1 (en) * | 2014-06-12 | 2015-12-17 | Google Inc. | Systems and Methods for Locally Detecting Consumed Video Content |
Also Published As
Publication number | Publication date |
---|---|
EP3477506A4 (en) | 2020-02-26 |
US20190080177A1 (en) | 2019-03-14 |
CN106126617A (en) | 2016-11-16 |
CN106126617B (en) | 2018-11-23 |
WO2017219900A1 (en) | 2017-12-28 |
US11132555B2 (en) | 2021-09-28 |
EP3477506A1 (en) | 2019-05-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3477506B1 (en) | Video detection method, server and storage medium | |
EP2641401B1 (en) | Method and system for video summarization | |
US10438050B2 (en) | Image analysis device, image analysis system, and image analysis method | |
US10108709B1 (en) | Systems and methods for queryable graph representations of videos | |
US8477836B2 (en) | System and method for comparing an input digital video to digital videos using extracted and candidate video features | |
US20140245463A1 (en) | System and method for accessing multimedia content | |
WO2012141655A1 (en) | In-video product annotation with web information mining | |
US8175392B2 (en) | Time segment representative feature vector generation device | |
Chamasemani et al. | Video abstraction using density-based clustering algorithm | |
Ji et al. | News videos anchor person detection by shot clustering | |
Souza et al. | A unified approach to content-based indexing and retrieval of digital videos from television archives. | |
Ciaparrone et al. | A comparison of deep learning models for end-to-end face-based video retrieval in unconstrained videos | |
Chivadshetti et al. | Content based video retrieval using integrated feature extraction and personalization of results | |
Roopalakshmi et al. | A novel approach to video copy detection using audio fingerprints and PCA | |
Chou et al. | Multimodal video-to-near-scene annotation | |
JP2011248671A (en) | Image retrieval device, program, and method for retrieving image among multiple reference images using image for retrieval key | |
Tseytlin et al. | Content based video retrieval system for distorted video queries | |
Guru et al. | Histogram based split and merge framework for shot boundary detection | |
Mishra et al. | Parameter free clustering approach for event summarization in videos | |
Chamasemani et al. | A study on surveillance video abstraction techniques | |
CN111291224A (en) | Video stream data processing method, device, server and storage medium | |
Liu et al. | Book page identification using convolutional neural networks trained by task-unrelated dataset | |
Sa et al. | Automatic video shot boundary detection using k-means clustering and improved adaptive dual threshold comparison | |
Rodríguez | Codebook-Based Near-Duplicate Video Detection |
Sharma et al. | Color difference histogram for feature extraction in video retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20190122 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
A4 | Supplementary search report drawn up and despatched |
Effective date: 20200123 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G06F 16/00 20190101ALI20200117BHEP Ipc: G10L 25/54 20130101ALI20200117BHEP Ipc: G06K 9/00 20060101ALI20200117BHEP Ipc: G06K 9/46 20060101AFI20200117BHEP Ipc: G06K 9/62 20060101ALI20200117BHEP |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20210520 |
|
RAP3 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R079 Ref document number: 602017081613 Country of ref document: DE Free format text: PREVIOUS MAIN CLASS: G06F0017300000 Ipc: G06V0020400000 |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 25/54 20130101ALN20231222BHEP Ipc: G06V 10/74 20220101ALI20231222BHEP Ipc: G06V 10/46 20220101ALI20231222BHEP Ipc: G06V 10/44 20220101ALI20231222BHEP Ipc: G06F 18/22 20230101ALI20231222BHEP Ipc: G06V 20/40 20220101AFI20231222BHEP |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 25/54 20130101ALN20240108BHEP Ipc: G06V 10/74 20220101ALI20240108BHEP Ipc: G06V 10/46 20220101ALI20240108BHEP Ipc: G06V 10/44 20220101ALI20240108BHEP Ipc: G06F 18/22 20230101ALI20240108BHEP Ipc: G06V 20/40 20220101AFI20240108BHEP |
|
INTG | Intention to grant announced |
Effective date: 20240122 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |