CN106601243B - Video file identification method and device - Google Patents

Info

Publication number
CN106601243B
Authority
CN
China
Prior art keywords
audio
matching
video
video file
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510683009.1A
Other languages
Chinese (zh)
Other versions
CN106601243A (en)
Inventor
谷长信
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510683009.1A
Priority to PCT/CN2016/101733 (WO2017067400A1)
Publication of CN106601243A
Application granted
Publication of CN106601243B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Abstract

The invention discloses a video file identification method and device. The method first obtains audio information from a video file to be identified, segments the audio information and extracts audio fingerprints, and performs audio matching against training samples to judge whether the video file is a target video; suspicious video files that cannot be confirmed are then further identified through image matching. The device comprises an audio preprocessing module, an audio fingerprint matching module, an audio judging module, an image preprocessing module and a comprehensive judging module. The method and the device have high processing efficiency and a high recognition rate.

Description

Video file identification method and device
Technical Field
The invention belongs to the technical field of computer data processing, and particularly relates to a video file identification method and device.
Background
With the spread of the internet, more and more users store personal video files on cloud servers provided by internet service providers, some of which also allow users to upload video files to share with other users on the network. However, the law imposes strict review requirements on video files transmitted over the internet, which must not involve pornographic or violent content. Internet service providers therefore have the responsibility and obligation to review and supervise, in accordance with national standards, the video files that users upload to their services.
In the prior art, video files are audited on the basis of the video images, by capturing picture frames from the video, which has the following problems:
low processing efficiency: the range of the video from which frames should be captured cannot be located effectively, so a comprehensive audit requires capturing an extremely large number of frames;
a single identification means and a low recognition rate: relying on picture identification alone leads to a high probability of missed and false identifications.
Disclosure of Invention
The invention aims to provide a video file identification method and device that use audio fingerprint identification to guide video frame capture for further picture identification before giving a final identification result, thereby effectively improving processing efficiency.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a video file identification method is used for auditing a video file to be identified, and comprises the following steps:
acquiring audio information from a video file to be identified;
segmenting the acquired audio information, and performing fingerprint extraction on the segmented audio segment to obtain an audio fingerprint of the audio segment;
carrying out audio matching on the audio fingerprints of the obtained audio segments and the trained training samples, and recording an audio matching result;
judging whether the video file to be identified is a target video according to the audio matching result, terminating identification when the video file is judged either to be or not to be the target video, and entering the next step to continue identification when it is judged to be a suspicious video file;
according to the audio matching result, starting to capture frames of the video file from the starting time of the successfully matched audio segment, capturing video images, performing image matching on the captured video images, and recording the image matching result;
and judging whether the video file to be identified is the target video or not according to the image matching result or the image matching result and the audio matching result.
The invention discloses an implementation mode for segmenting acquired audio information, which comprises the following steps:
finding out all volume peak points exceeding a specified threshold value from the audio information in a time domain;
and sampling the audio segments according to the fixed time length from each peak point in sequence to obtain each audio segment.
Another implementation manner of segmenting the acquired audio information according to the present invention includes:
and sampling the audio information according to a fixed time length to obtain each audio segment.
Further, the audio matching result includes: the number of times of successful matching, the starting time of the successfully matched audio segments and the labeling information of the training samples matched with the successfully matched audio segments; the labeling information includes: sample duration, content rating, and manual classification labels.
Further, the determining whether the video file to be identified is the target video according to the audio matching result includes:
when the number of successful matches is larger than a first threshold value, judging that the video file to be identified is a target video;
when the number of successful matches is smaller than a second threshold value, judging that the video file to be identified is not the target video;
and when the number of successful matches is between the first threshold and the second threshold, calculating the audio matching probability corresponding to the current matching result, and when the calculated matching probability is greater than a set third threshold, judging that the video file to be identified is a target video, otherwise regarding the video file to be identified as a suspicious video file.
Wherein, the calculating the audio matching probability corresponding to the matching result of this time includes:
according to the number X of successful matches and the total number Z of all the audio segments, calculating the ratio P1 of the two as:
P1=X/Z
and calculating the audio matching probability R1 corresponding to the current matching result, with the calculation formula:
R1=P1*P(Y)
wherein R1 is the audio matching probability corresponding to the current matching result, and P(Y) is the sum of the weights corresponding to the content levels of all the training samples matching the audio fingerprints of the audio segments.
Further, the determining whether the video file to be identified is the target video according to the image matching result or the image matching result and the audio matching result includes:
calculating the video matching probability R2 according to the image matching result, R2 being the ratio of the number of successfully matched captured video images to the total number of captured video images;
calculating the comprehensive matching probability R′ of the current matching according to the video matching probability R2 and the audio matching probability R1; if the comprehensive matching probability exceeds a fourth threshold value, judging that the video file to be identified is a target video, and otherwise judging that it is a normal video;
wherein the calculation formula of the comprehensive matching probability R′ is as follows:
R′=R1*α+R2*β
where α and β are the weights of the audio matching probability and the video matching probability, respectively.
The invention also provides a video file identification device, which is used for checking the video file to be identified, and the device comprises:
the audio preprocessing module is used for acquiring audio information from a video file to be identified, segmenting the acquired audio information, and performing fingerprint extraction on the segmented audio segment to obtain an audio fingerprint of the audio segment;
the audio fingerprint matching module is used for performing audio matching on the obtained audio fingerprints of the audio segments and the trained training samples and recording audio matching results;
the audio judging module is used for judging whether the video file to be identified is a target video according to the audio matching result, terminating identification when the video file is judged either to be or not to be the target video, and handing the video file to the image preprocessing module for further processing when it is judged to be a suspicious video file;
the image preprocessing module is used for capturing frames of the video file from the starting time of the successfully matched audio segment according to the audio matching result and capturing video images;
the image matching module is used for carrying out image matching on the captured video images and recording image matching results;
and the comprehensive judgment module is used for judging whether the video file to be identified is the target video according to the image matching result or the image matching result and the audio matching result.
The invention provides a video file identification method and device in which the audio of a video file is quickly identified by means of audio fingerprint identification and the start time points of the matches are recorded; frames are then captured at intervals within the range of those start time points for further picture identification, and finally an identification result is given. The method features high processing efficiency and a high recognition rate.
Drawings
FIG. 1 is a flow chart of a video file identification method of the present invention;
FIG. 2 is a schematic structural diagram of a video file identification device of the present invention.
Detailed Description
The technical solutions of the present invention are further described in detail below with reference to the drawings and examples, which should not be construed as limiting the present invention.
The current popular formats of video files include AVI format, MOV format, MPEG format, RM format, ASF format, etc., and a complete video file includes both video image and audio information. The general idea of the invention is to extract audio information from a video file, identify the extracted audio information, then capture frames of video images according to the identification result, and further identify the captured video images.
The following description takes the identification of videos containing pornographic or violent content as an example; the same applies to other types of video files. As shown in FIG. 1, a video file identification method includes the following steps:
Step S1: acquire audio information from the video file to be identified.
In this embodiment, the audio information can be obtained by decoding the video file directly and extracting the audio, or it can be extracted with third-party software. Audio extraction is a mature technology and is not described further here.
Step S2: segment the acquired audio information, and perform fingerprint extraction on the segmented audio segments to obtain the audio fingerprint of each audio segment.
The acquired audio information is segmented, and fingerprint extraction is performed on each audio segment to obtain the audio fingerprint corresponding to that segment.
The audio information identification of the invention is based on audio fingerprinting. An audio fingerprint is a compact, content-based digital signature that represents the important acoustic characteristics of a piece of sound. Its main purpose is to provide an effective mechanism for comparing the perceptual auditory quality of two audio files, and it can be used in applications such as audio identification and content integrity verification.
After the audio information is stripped from the video file, the total playing duration T (milliseconds) of the audio information and the total length L (bytes) of the extracted audio information can be obtained. The audio information is then segmented into a number of audio segments, a fingerprint is extracted from each audio segment, and the extracted audio fingerprints are compared with the training samples. The training samples are likewise obtained by segmenting audio with the same method and then training.
The following two embodiments illustrate specific audio information segmentation methods:
Method one: segmentation according to volume in the time domain.
In the time domain, the volume of the audio information varies along the time axis, so the audio information appears as a fluctuating waveform. A volume threshold is set, and all volume peak points exceeding the specified threshold are found in the audio information in the time domain. The peaks are denoted (k1, k2, k3, ..., kn), and the coordinate on the time axis corresponding to each peak point is recorded; this coordinate is the time offset p of the peak point in the audio information.
Then, starting from each peak point in turn, an audio segment of fixed duration w is sampled and its audio fingerprint is extracted, so that n audio fingerprints are obtained for comparison with the training samples.
It is easy to understand that the starting point of each audio segment is the time corresponding to its peak point, which can be calculated as: T*(p/L).
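A minimal Python sketch of method one follows, assuming the audio has already been decoded into a list of amplitude samples at a known sample rate. The names `find_peaks`, `segments_from_peaks`, `threshold` and `sample_rate` are illustrative and do not come from the patent; a peak is taken here to be a local maximum of the absolute amplitude above the threshold, which is only one possible reading of "volume peak point".

```python
def find_peaks(samples, threshold):
    """Return indices of local maxima of |amplitude| that exceed threshold."""
    peaks = []
    for i in range(1, len(samples) - 1):
        v = abs(samples[i])
        if v > threshold and v >= abs(samples[i - 1]) and v > abs(samples[i + 1]):
            peaks.append(i)
    return peaks


def segments_from_peaks(samples, sample_rate, threshold, w_seconds):
    """Cut one segment of fixed duration w starting at each peak.

    Returns a list of (start_time_ms, segment_samples) pairs.
    """
    width = int(w_seconds * sample_rate)
    result = []
    for p in find_peaks(samples, threshold):
        start_ms = 1000.0 * p / sample_rate  # position of the peak on the time axis
        result.append((start_ms, samples[p:p + width]))
    return result


# Toy input (assumed values): 8 samples per second, two peaks above threshold 5.
samples = [0, 1, 9, 2, 0, 0, 7, 1, 0, 0, 0, 0, 0, 0, 0, 0]
segs = segments_from_peaks(samples, sample_rate=8, threshold=5, w_seconds=0.5)
for start_ms, seg in segs:
    print(start_ms, seg)
```

In this sketch the start time is derived from the sample index and the sample rate rather than from a byte offset; both express the same position on the time axis.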
Method two: segmentation at fixed intervals.
The audio information is sampled according to a fixed duration w to obtain m audio segments f1, f2, f3, ..., fm, and an audio fingerprint is extracted from each for comparison with the training samples.
It is easy to understand that the start of each audio segment can be calculated from the fixed duration; the starting time of the i-th audio segment is: T*(fi-1)/L, where i ranges from 1 to m.
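Fixed-interval segmentation can be sketched in a few lines of Python. The function name `fixed_segments` and the choice to keep a shorter final segment are assumptions of this sketch; the patent does not say how a remainder shorter than w is handled.

```python
def fixed_segments(total_ms, w_ms):
    """Return (start_ms, end_ms) pairs for consecutive segments of length w_ms.

    The last segment is shortened when total_ms is not a multiple of w_ms
    (an assumption of this sketch).
    """
    if w_ms <= 0:
        raise ValueError("w_ms must be positive")
    segments = []
    start = 0
    while start < total_ms:
        segments.append((start, min(start + w_ms, total_ms)))
        start += w_ms
    return segments


# A 3.5 second track cut into 1 second segments.
segs = fixed_segments(total_ms=3500, w_ms=1000)
print(segs)
```

With a 1 second duration, the i-th segment starts at (i-1) seconds, so the start times needed later for frame capture follow directly from the segment index.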
It is easy to understand that the fixed duration w is consistent with the duration of the training samples in the training sample library, for example 1 second. For video files containing pornographic or violent content, the video images corresponding to higher volume are often the ones that need the most attention. Preferably, therefore, method one is adopted so that such video files are identified more easily and quickly: the peak points are sorted by volume, and the audio at the highest peaks is segmented first.
Specifically, fingerprint extraction is performed on each audio segment using an extraction algorithm such as one based on the fast Fourier transform, which is not described in detail here. The audio fingerprint corresponding to each audio segment is obtained so that it can be compared with the trained training samples in the subsequent steps.
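Since the patent leaves the extraction algorithm open, the following Python sketch shows only one hypothetical way to turn a segment into a bit fingerprint: compute the power spectrum, sum it into a few frequency bands, and emit one bit per pair of adjacent bands. The band-comparison scheme, the function names and the default of 9 bands are all assumptions, and a naive O(n²) discrete Fourier transform stands in for a real FFT routine to keep the example free of dependencies.

```python
import cmath
import math


def band_energies(samples, n_bands):
    """Energy of the segment in n_bands equal-width frequency bands (naive DFT)."""
    n = len(samples)
    power = []
    for k in range(1, n // 2 + 1):  # positive frequencies only
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        power.append(abs(acc) ** 2)
    size = max(1, len(power) // n_bands)
    return [sum(power[b * size:(b + 1) * size]) for b in range(n_bands)]


def fingerprint(samples, n_bands=9):
    """Bit i is 1 when band i holds more energy than band i + 1."""
    e = band_energies(samples, n_bands)
    return [1 if e[i] > e[i + 1] else 0 for i in range(n_bands - 1)]


# A pure low-frequency tone: the lowest band dominates, so the first bit is 1.
tone = [math.sin(2 * math.pi * 2 * t / 32) for t in range(32)]
fp = fingerprint(tone, n_bands=4)
print(fp)
```

A production system would more likely compute fingerprints over many short overlapping frames with an FFT library; the point here is only that each segment is reduced to a compact signature that can be compared cheaply.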
Step S3: perform audio matching between the obtained audio fingerprints of the audio segments and the trained training samples, and record the audio matching result.
In this embodiment, the training samples are obtained by training on a large number of video and audio files of various kinds containing pornographic or violent content, and labeling information is added to each training sample. The labeling information mainly comprises the sample duration, the content level, manual classification labels and the like; in this embodiment, the content level is the degree of pornographic or violent content.
The audio fingerprint of each audio segment is matched against the training samples; if the recognition similarity between the audio fingerprint and a training sample is greater than a set audio similarity threshold, the match is regarded as successful. All audio segments are traversed and the audio matching result is recorded, which comprises: the number of successful matches, the start times of the successfully matched audio segments, and the labeling information of the training samples matched by those audio segments.
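The traversal and the recorded result can be sketched as follows in Python. The similarity measure (fraction of equal bits), the dictionary layout and all field names such as `content_level` are assumptions made for illustration; the patent does not prescribe a similarity function or a data format.

```python
def similarity(fp_a, fp_b):
    """Fraction of equal bits between two equal-length bit fingerprints."""
    same = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return same / len(fp_a)


def match_segments(segments, training_samples, sim_threshold):
    """Match every segment against the training samples and record the result.

    segments: list of (start_ms, fingerprint) pairs.
    training_samples: non-empty list of dicts with 'fingerprint' and 'label'.
    """
    matches = []
    for start_ms, fp in segments:
        best = max(training_samples,
                   key=lambda s: similarity(fp, s["fingerprint"]))
        if similarity(fp, best["fingerprint"]) > sim_threshold:
            matches.append({"start_ms": start_ms, "label": best["label"]})
    return {
        "match_count": len(matches),        # number of successful matches
        "total_segments": len(segments),    # total number of audio segments
        "matches": matches,                 # start times and labeling information
    }


training = [{
    "fingerprint": [1, 0, 1, 1],
    "label": {"duration_s": 1, "content_level": 2, "tag": "A"},
}]
segments = [(0, [1, 0, 1, 1]), (1000, [0, 1, 0, 0]), (2000, [1, 0, 1, 0])]
result = match_segments(segments, training, sim_threshold=0.7)
print(result["match_count"], [m["start_ms"] for m in result["matches"]])
```

Keeping the start time alongside each match is what later allows frame capture to begin at the right position in the video.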
Step S4: judge whether the video file to be identified is the target video according to the audio matching result; terminate identification when the video file is judged either to be or not to be the target video, and proceed to the next step to continue identification when it is judged to be a suspicious video file.
Specifically, this embodiment judges whether the video file to be identified is the target video as follows:
when the number of successful matches is greater than a first threshold (for example, 20), the video file to be identified is judged to be the target video, and identification is terminated;
when the number of successful matches is smaller than a second threshold (for example, 2), the video file to be identified is judged not to be the target video, and identification is terminated;
when the number of successful matches is between the first threshold and the second threshold, the audio matching probability corresponding to the current matching result is calculated; when the calculated matching probability is greater than a set third threshold (for example, a specific value T), the video file to be identified is judged to be the target video; otherwise it is regarded as a suspicious video file and identification continues.
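The three-way decision can be written as a small Python function. This is a sketch under stated assumptions: the example thresholds 20 and 2 come from the text, while the third threshold of 0.5, the string constants and the treatment of counts exactly equal to a threshold (handled as "between") are choices made here for illustration.

```python
TARGET, NOT_TARGET, SUSPICIOUS = "target", "not_target", "suspicious"


def audio_decision(match_count, first_threshold, second_threshold,
                   audio_probability, third_threshold):
    """Classify a video from its audio matching result."""
    if match_count > first_threshold:
        return TARGET          # many matches: stop, target video
    if match_count < second_threshold:
        return NOT_TARGET      # almost no matches: stop, not a target video
    # In between: fall back to the audio matching probability.
    if audio_probability > third_threshold:
        return TARGET
    return SUSPICIOUS          # continue with image matching


print(audio_decision(25, 20, 2, 0.0, 0.5))
print(audio_decision(1, 20, 2, 0.9, 0.5))
print(audio_decision(10, 20, 2, 0.6, 0.5))
print(audio_decision(10, 20, 2, 0.3, 0.5))
```

Only the last outcome leads to frame capture, which is how the audio stage saves work for the image stage.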
Assuming that the number of successful matches is X and the total number of audio segments to be matched is Z, the ratio P1 of the number of successful matches to the total number of audio segments is:
P1=X/Z
This embodiment calculates the audio matching probability R1 corresponding to the current matching result with the following formula:
R1=P1*P(Y)
where R1 is the audio matching probability corresponding to the current matching result, P1 is the ratio of the number of successful matches to the total number of audio segments, and P(Y) is the sum of the weights corresponding to the content levels of all training samples matching the audio fingerprints of the audio segments.
Specifically, for an audio segment whose matched training sample has pornography or violence level Yi, the corresponding weight is P(Yi), and P(Y)=ΣP(Yi).
After the audio matching probability R1 corresponding to the current matching result is obtained through calculation, R1 is compared with the set third threshold. If R1 is higher than the third threshold, the video file is judged to be the target video; otherwise, its video images are judged further.
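As a worked example of the audio matching probability, the Python sketch below multiplies the match ratio by the summed content-level weights. The weight table is invented for illustration; the patent does not state how the weights are chosen or scaled, so the resulting value is only meaningful relative to a threshold tuned for the same weights.

```python
def audio_match_probability(matched_levels, total_segments, level_weights):
    """Compute R1 = P1 * P(Y).

    matched_levels: content level of the training sample matched by each
                    successfully matched segment (one entry per match).
    level_weights:  hypothetical mapping from content level to weight.
    """
    if total_segments <= 0:
        raise ValueError("total_segments must be positive")
    p1 = len(matched_levels) / total_segments                 # P1 = X / Z
    p_y = sum(level_weights[lv] for lv in matched_levels)     # P(Y) = sum of P(Yi)
    return p1 * p_y


weights = {1: 0.05, 2: 0.10, 3: 0.20}   # assumed values
r1 = audio_match_probability([3, 3, 1, 2], total_segments=10,
                             level_weights=weights)
print(round(r1, 4))   # P1 = 0.4 and P(Y) = 0.55, so R1 is about 0.22
```

Because P(Y) is a sum over matches, both more matches and higher content levels push R1 up.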
The above-mentioned determination steps are only a specific embodiment, wherein the first threshold, the second threshold, and the third threshold may be adjusted to make the determination result more accurate. An intermediate threshold value can be further set between the first threshold value and the second threshold value, for example, 10 times, when the number of times of successful matching is greater than the intermediate threshold value, the audio matching probability corresponding to the matching result of this time is calculated, and the judgment is performed according to the calculated audio matching probability; if the matching success frequency is smaller than the intermediate threshold and larger than the second threshold, the audio matching probability corresponding to the matching result is not calculated, and the next step is directly carried out, so that the video image needs to be further judged. The present invention is not limited to the specific determination steps, and will not be described in detail below.
Step S5: according to the matching result of the audio segments, capture frames of the video file starting from the start time of each successfully matched audio segment to obtain video images, perform image matching on the captured video images, and record the image matching result.
The matching in step S3 shows which audio segments were matched successfully. The start time of each successfully matched audio segment recorded in the matching result is located to the corresponding time point in the video file, and frames are captured from that time point to obtain video images; the frame-capture interval can be set according to the actual situation.
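The timestamps at which frames would be captured can be planned with a short Python helper. The function name, the capture window length and the interval are assumptions; the text only says that capture starts at the matched start time and that the interval is chosen according to the actual situation.

```python
def capture_times(match_starts_ms, window_ms, interval_ms, video_length_ms):
    """Return sorted, de-duplicated timestamps (ms) at which to grab frames.

    For each matched start time, frames are taken every interval_ms within
    a window of window_ms, clipped to the end of the video.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    times = set()
    for start in match_starts_ms:
        end = min(start + window_ms, video_length_ms)
        t = start
        while t < end:
            times.add(t)
            t += interval_ms
    return sorted(times)


# Two matched segments at 0 ms and 2000 ms in a 2.5 second video.
plan = capture_times([0, 2000], window_ms=1000, interval_ms=400,
                     video_length_ms=2500)
print(plan)
```

The resulting list could then be passed to whatever decoder is used to extract the frames; only the regions flagged by audio matching are sampled, not the whole video.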
The captured video images are then identified; in this embodiment, this means determining whether they contain pornographic or violent content. The identification can be done by human eyes or by a computer. If a computer is used, a large number of video images of various kinds containing pornographic or violent content must likewise be trained to obtain training samples. The captured video images are matched against the training samples to obtain the recognition similarity of each video image; if the recognition similarity is greater than a set image similarity threshold, the match is regarded as successful, and the image matching result, namely the number of successful image matches, is recorded.
Step S6: judge whether the video file to be identified is the target video according to the image matching result, or according to the image matching result and the audio matching result.
After image matching is finished, the video matching probability R2 can be calculated from the number of successful matches; R2 is the ratio of the number of successfully matched captured video images to the total number of captured video images.
According to the video matching probability R2 and the audio matching probability R1, the comprehensive matching probability R′ of the current matching is calculated. If the comprehensive matching probability exceeds a fourth threshold, the video file to be identified is judged to be the target video; otherwise it is judged to be a normal video.
The calculation formula of the comprehensive matching probability R' is as follows:
R′=R1*α+R2*β
where α and β are the weights of the audio matching probability and the video matching probability, respectively.
And judging according to the obtained comprehensive matching probability, if the comprehensive matching probability exceeds an identification threshold, judging that the video file to be identified is a target video, and otherwise, judging that the video file to be identified is a normal video.
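The final judgment is a weighted sum of the audio and video matching probabilities, with α and β as their respective weights. In the Python sketch below, the weights 0.4 and 0.6, the thresholds and the function names are illustrative assumptions, not values from the patent.

```python
def video_match_probability(matched_images, total_images):
    """R2: share of captured images that matched a training sample."""
    if total_images == 0:
        return 0.0
    return matched_images / total_images


def combined_probability(r1, r2, alpha, beta):
    """R' as the weighted sum of audio (r1) and video (r2) probabilities."""
    return r1 * alpha + r2 * beta


def final_decision(r1, r2, alpha, beta, fourth_threshold):
    r = combined_probability(r1, r2, alpha, beta)
    return "target" if r > fourth_threshold else "normal"


r2 = video_match_probability(6, 20)                       # 6 of 20 frames matched
r = combined_probability(0.22, r2, alpha=0.4, beta=0.6)   # 0.088 + 0.18
print(round(r, 3), final_decision(0.22, r2, 0.4, 0.6, fourth_threshold=0.25))
```

Raising β relative to α makes the image evidence dominate, which may suit cases where the audio stage was inconclusive.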
Alternatively, whether the video file to be identified contains pornographic or violent content can be judged directly from the number of successful image matches, or from the video matching probability R2; for example, if the number of successful image matches or the video matching probability R2 is greater than a set threshold, the video file is judged to contain pornographic or violent content. The present invention is not limited to specific judgment conditions.
It should be noted that matching the audio fingerprints of the audio segments with the training samples to calculate their identification similarities, or matching the video images with the training samples to calculate their identification similarities are all currently mature technologies, and for example, the calculation may be performed by a maximum likelihood estimation method, which is not described herein again.
Fig. 2 shows a video file identification apparatus corresponding to the above method, comprising:
the audio preprocessing module is used for acquiring audio information from a video file to be identified, segmenting the acquired audio information, and performing fingerprint extraction on the segmented audio segment to obtain an audio fingerprint of the audio segment;
the audio fingerprint matching module is used for performing audio matching on the obtained audio fingerprints of the audio segments and the trained training samples and recording audio matching results;
the audio judging module is used for judging whether the video file to be identified is a target video according to the audio matching result, terminating identification when the video file is judged either to be or not to be the target video, and handing the video file to the image preprocessing module for further processing when it is judged to be a suspicious video file;
the image preprocessing module is used for capturing frames of the video file from the starting time of the successfully matched audio segment according to the audio matching result and capturing video images;
the image matching module is used for carrying out image matching on the captured video images and recording image matching results;
and the comprehensive judgment module is used for judging whether the video file to be identified is the target video according to the image matching result or the image matching result and the audio matching result.
The audio preprocessing module segments the acquired audio information, and may segment the audio information according to the volume level in the time domain or according to a fixed interval, which corresponds to the specific audio segmentation method in the method and is not described herein again.
Similarly, the operations performed by the audio determining module and the comprehensive determining module in the specific determination correspond to the specific steps of step S4 and step S6, and are not described herein again.
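How the six modules hand work to one another can be sketched as a small Python class in which each module is an injected callable. The class name `VideoIdentifier`, the field names, the verdict strings and the stub lambdas are all hypothetical; the sketch shows only the control flow, in particular that the image modules run only for suspicious files.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class VideoIdentifier:
    audio_preprocess: Callable[[Any], list]        # video -> [(start_ms, fingerprint)]
    audio_match: Callable[[list], dict]            # segments -> audio matching result
    audio_judge: Callable[[dict], str]             # -> "target" | "not_target" | "suspicious"
    image_preprocess: Callable[[Any, dict], list]  # video, audio result -> captured images
    image_match: Callable[[list], dict]            # images -> image matching result
    final_judge: Callable[[dict, dict], str]       # image result, audio result -> verdict

    def identify(self, video):
        segments = self.audio_preprocess(video)
        audio_result = self.audio_match(segments)
        verdict = self.audio_judge(audio_result)
        if verdict != "suspicious":
            return verdict                         # audio alone was decisive
        images = self.image_preprocess(video, audio_result)
        image_result = self.image_match(images)
        return self.final_judge(image_result, audio_result)


# Stub modules, just to exercise the flow.
identifier = VideoIdentifier(
    audio_preprocess=lambda video: [(0, [1, 0, 1])],
    audio_match=lambda segs: {"match_count": 5, "total_segments": len(segs)},
    audio_judge=lambda res: "suspicious",
    image_preprocess=lambda video, res: ["frame0", "frame1"],
    image_match=lambda imgs: {"match_count": 0, "total": len(imgs)},
    final_judge=lambda img_res, aud_res: "normal",
)
print(identifier.identify("example.mp4"))
```

Injecting the modules as callables keeps the orchestration independent of any particular fingerprinting or image-recognition implementation.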
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and those skilled in the art can make various corresponding changes and modifications according to the present invention without departing from the spirit and the essence of the present invention, but these corresponding changes and modifications should fall within the protection scope of the appended claims.

Claims (10)

1. A video file identification method is used for auditing a video file to be identified, and is characterized by comprising the following steps:
acquiring audio information from a video file to be identified;
segmenting the acquired audio information, and performing fingerprint extraction on the segmented audio segment to obtain an audio fingerprint of the audio segment;
carrying out audio matching on the audio fingerprints of the obtained audio segments and the trained training samples, and recording an audio matching result;
judging whether the video file to be identified is a target video according to the audio matching result, terminating identification when the video file is judged either to be or not to be the target video, and entering the next step to continue identification when it is judged to be a suspicious video file;
according to the audio matching result, starting to capture frames of the video file from the starting time of the successfully matched audio segment, capturing video images, performing image matching on the captured video images, and recording the image matching result;
judging whether the video file to be identified is a target video or not according to the image matching result or the image matching result and the audio matching result;
wherein the audio matching result comprises: the number of times of successful matching, the starting time of the successfully matched audio segments and the labeling information of the training samples matched with the successfully matched audio segments;
the labeling information includes: sample duration, content level and manual classification label;
the judging whether the video file to be identified is the target video according to the audio matching result comprises the following steps:
when the number of successful matches is larger than a first threshold value, judging that the video file to be identified is a target video;
when the number of successful matches is smaller than a second threshold value, judging that the video file to be identified is not the target video;
and when the number of successful matches is between the first threshold and the second threshold, calculating the audio matching probability corresponding to the current matching result, and when the calculated matching probability is greater than a set third threshold, judging that the video file to be identified is a target video, otherwise regarding the video file to be identified as a suspicious video file.
2. The method for identifying a video file according to claim 1, wherein said segmenting the acquired audio information comprises:
finding out all volume peak points exceeding a specified threshold value from the audio information in a time domain;
and sampling the audio segments according to the fixed time length from each peak point in sequence to obtain each audio segment.
3. The method for identifying a video file according to claim 1, wherein said segmenting the acquired audio information comprises:
and sampling the audio information according to a fixed time length to obtain each audio segment.
4. The method of claim 1, wherein the calculating the audio matching probability corresponding to the current matching result comprises:
calculating the ratio $P_1$ of the number $X$ of successful matches to the total number $Z$ of all audio segments as:
$$P_1 = \frac{X}{Z}$$
calculating the audio matching probability $R_1$ corresponding to the matching result by the formula:
$$R_1 = P_1 \cdot P(Y)$$
wherein $R_1$ is the audio matching probability corresponding to the current matching result, $P(Y)$ is the sum of the weights corresponding to the content grades of all the training samples matched with the audio fingerprints of the audio segments, and $Y$ is the content grade.
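As a worked illustration of claim 4, the sketch below computes the ratio of successful matches X to total segments Z, multiplies it by the summed content-grade weights P(Y), and returns the audio matching probability R1. The grade names, the weight table, and the assumption that each successful match contributes exactly one matched training sample are hypothetical. The claim does not bound P(Y), so whether R1 stays within [0, 1] depends on how the weights are chosen.

```python
from typing import Mapping, Sequence


def audio_match_probability(
    matched_grades: Sequence[str],
    total_segments: int,
    grade_weights: Mapping[str, float],
) -> float:
    """Compute R1 = P1 * P(Y).

    matched_grades holds the content grade Y of the training sample matched
    by each successfully matched segment, so X = len(matched_grades).
    """
    if total_segments <= 0:
        raise ValueError("total_segments must be positive")
    p1 = len(matched_grades) / total_segments            # P1 = X / Z
    p_y = sum(grade_weights[g] for g in matched_grades)  # P(Y): summed weights
    return p1 * p_y
```

For example, with 8 segments, 2 successful matches and hypothetical weights `{"high": 0.5, "low": 0.25}` for one match of each grade, P1 = 0.25 and P(Y) = 0.75, giving R1 = 0.1875.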
5. The video file identification method according to claim 4, wherein the determining whether the video file to be identified is the target video according to the image matching result or the image matching result and the audio matching result comprises:
calculating the video matching probability $R_2$ according to the image matching result, $R_2$ being the ratio of the number of successfully matched captured video images to the total number of all captured video images;
calculating the comprehensive matching probability $R'$ of the current matching according to the video matching probability $R_2$ and the audio matching probability $R_1$; if the comprehensive matching probability exceeds a fourth threshold value, judging the video file to be identified to be a target video, and otherwise judging the video file to be identified to be a normal video;
wherein the calculation formula of the comprehensive matching probability $R'$ is as follows:
$$R' = R_1 \cdot \alpha + R_2 \cdot \beta$$
where $\alpha$ and $\beta$ are the weights of the audio matching probability and the video matching probability, respectively.
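The combined judgment of claim 5 can be sketched as a weighted sum, taking the formula as R′ = R1*α + R2*β, which is the reading implied by the definition of α and β as the audio and video weights. The function names and the example numbers are assumptions; the claim gives no values for the weights or for the fourth threshold.

```python
def video_match_probability(matched_images: int, total_images: int) -> float:
    """R2: share of captured video images that matched successfully."""
    if total_images <= 0:
        raise ValueError("total_images must be positive")
    return matched_images / total_images


def comprehensive_probability(
    r1: float, r2: float, alpha: float, beta: float
) -> float:
    """R' = R1 * alpha + R2 * beta (audio weight alpha, video weight beta)."""
    return r1 * alpha + r2 * beta


def is_target_video(
    r1: float, r2: float, alpha: float, beta: float, fourth_threshold: float
) -> bool:
    """True if R' strictly exceeds the fourth threshold, else a normal video."""
    return comprehensive_probability(r1, r2, alpha, beta) > fourth_threshold
```

With 3 of 4 captured images matching, R2 = 0.75; combined with R1 = 0.5 and equal hypothetical weights of 0.5, R′ = 0.625.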
6. A video file identification apparatus for auditing video files to be identified, the apparatus comprising:
the audio preprocessing module is used for acquiring audio information from a video file to be identified, segmenting the acquired audio information, and performing fingerprint extraction on the segmented audio segment to obtain an audio fingerprint of the audio segment;
the audio fingerprint matching module is used for performing audio matching on the obtained audio fingerprints of the audio segments and the trained training samples and recording audio matching results;
the audio judging module is used for judging whether the video file to be identified is a target video or not according to the audio matching result, terminating identification when the video file to be identified is judged to be the target video or judged not to be the target video, and continuing processing by the image preprocessing module when the video file to be identified is judged to be a suspicious video file;
the image preprocessing module is used for capturing frames of the video file from the starting time of the successfully matched audio segment according to the audio matching result and capturing video images;
the image matching module is used for carrying out image matching on the captured video images and recording image matching results;
the comprehensive judgment module is used for judging whether the video file to be identified is the target video or not according to the image matching result or the image matching result and the audio matching result;
wherein the audio matching result comprises: the number of times of successful matching, the starting time of the successfully matched audio segments and the labeling information of the training samples matched with the successfully matched audio segments; the labeling information includes: sample duration, content level and manual classification label;
the audio judging module judges whether the video file to be identified is a target video according to the audio matching result, and executes the following operations:
when the number of successful matches is larger than a first threshold value, judging that the video file to be identified is a target video;
when the number of successful matches is smaller than a second threshold value, judging that the video file to be identified is not the target video;
and when the number of successful matches is between the first threshold and the second threshold, calculating the audio matching probability corresponding to the matching result; when the calculated audio matching probability is greater than a set third threshold, judging that the video file to be identified is a target video, and otherwise regarding the video file to be identified as a suspicious video file.
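How the modules of claim 6 hand work to one another can be sketched as a small pipeline. Everything named here is hypothetical: `AudioMatchResult`, `identify_video`, and the four injected callables stand in for the audio modules, the image preprocessing module, the image matching module, and the comprehensive judgment module. The sketch assumes the comprehensive judgment uses both the audio and the image results, and treats an empty frame list as an image match ratio of 0.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class AudioMatchResult:
    """Outcome of the audio stage (field names are assumptions)."""
    match_count: int
    start_times: List[float]  # start times of successfully matched segments
    verdict: str              # "target", "not_target" or "suspicious"
    r1: float                 # audio matching probability


def identify_video(
    path: str,
    audio_stage: Callable[[str], AudioMatchResult],
    grab_frames: Callable[[str, Sequence[float]], List[Any]],
    match_image: Callable[[Any], bool],
    final_judgment: Callable[[float, float], bool],
) -> str:
    """Run the audio stage first; fall back to images only if suspicious."""
    audio = audio_stage(path)
    if audio.verdict != "suspicious":
        # Identification terminates after the audio judgment.
        return audio.verdict
    # Capture frames starting from the matched audio segments' start times.
    frames = grab_frames(path, audio.start_times)
    matched = sum(1 for frame in frames if match_image(frame))
    r2 = matched / len(frames) if frames else 0.0
    return "target" if final_judgment(audio.r1, r2) else "normal"
```

The point of the ordering is that frame capture and image matching, which are the costlier steps, run only for files the audio stage could not settle.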
7. The apparatus according to claim 6, wherein the audio preprocessing module segments the acquired audio information, and specifically performs the following operations:
finding out all volume peak points exceeding a specified threshold value from the audio information in a time domain;
and sampling the audio segments according to the fixed time length from each peak point in sequence to obtain each audio segment.
8. The apparatus according to claim 6, wherein the audio preprocessing module segments the acquired audio information, and specifically performs the following operations:
and sampling the audio information according to a fixed time length to obtain each audio segment.
9. The apparatus according to claim 6, wherein said calculating the audio matching probability corresponding to the current matching result comprises:
calculating the ratio $P_1$ of the number $X$ of successful matches to the total number $Z$ of all audio segments as:
$$P_1 = \frac{X}{Z}$$
calculating the audio matching probability $R_1$ corresponding to the matching result by the formula:
$$R_1 = P_1 \cdot P(Y)$$
wherein $R_1$ is the audio matching probability corresponding to the current matching result, $P(Y)$ is the sum of the weights corresponding to the content grades of all the training samples matched with the audio fingerprints of the audio segments, and $Y$ is the content grade.
10. The video file identification device of claim 9, wherein the comprehensive judgment module judges whether the video file to be identified is the target video according to the image matching result or the image matching result and the audio matching result, and executes the following operations:
calculating the video matching probability $R_2$ according to the image matching result, $R_2$ being the ratio of the number of successfully matched captured video images to the total number of all captured video images;
calculating the comprehensive matching probability $R'$ of the current matching according to the video matching probability $R_2$ and the audio matching probability $R_1$; if the comprehensive matching probability exceeds a fourth threshold value, judging the video file to be identified to be a target video, and otherwise judging the video file to be identified to be a normal video;
wherein the calculation formula of the comprehensive matching probability $R'$ is as follows:
$$R' = R_1 \cdot \alpha + R_2 \cdot \beta$$
where $\alpha$ and $\beta$ are the weights of the audio matching probability and the video matching probability, respectively.
CN201510683009.1A 2015-10-20 2015-10-20 Video file identification method and device Active CN106601243B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510683009.1A CN106601243B (en) 2015-10-20 2015-10-20 Video file identification method and device
PCT/CN2016/101733 WO2017067400A1 (en) 2015-10-20 2016-10-11 Video file identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510683009.1A CN106601243B (en) 2015-10-20 2015-10-20 Video file identification method and device

Publications (2)

Publication Number Publication Date
CN106601243A (en) 2017-04-26
CN106601243B (en) 2020-11-06

Family

ID=58554949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510683009.1A Active CN106601243B (en) 2015-10-20 2015-10-20 Video file identification method and device

Country Status (2)

Country Link
CN (1) CN106601243B (en)
WO (1) WO2017067400A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967922A (en) * 2017-12-19 2018-04-27 成都嗨翻屋文化传播有限公司 A kind of music copyright recognition methods of feature based
CN108419124B (en) * 2018-05-08 2020-11-17 北京酷我科技有限公司 Audio processing method
CN108984665A (en) * 2018-06-29 2018-12-11 杭州当虹科技股份有限公司 A kind of efficient video content combination detection method
CN109389794A (en) * 2018-07-05 2019-02-26 北京中广通业信息科技股份有限公司 A kind of Intellectualized Video Monitoring method and system
CN109271126A (en) * 2018-08-02 2019-01-25 联想(北京)有限公司 A kind of data processing method and device
CN109344289B (en) * 2018-09-21 2020-12-11 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109982137A (en) * 2019-02-22 2019-07-05 北京奇艺世纪科技有限公司 Model generating method, video marker method, apparatus, terminal and storage medium
CN109887493B (en) * 2019-03-13 2021-08-31 安徽声讯信息技术有限公司 Character audio pushing method
CN111489757B (en) * 2020-03-26 2023-08-18 北京达佳互联信息技术有限公司 Audio processing method, device, electronic equipment and readable storage medium
CN113542820B (en) * 2021-06-30 2023-12-22 北京中科模识科技有限公司 Video cataloging method, system, electronic equipment and storage medium
CN114358643B (en) * 2022-01-13 2023-09-12 南京讯思雅信息科技有限公司 Multimedia content wind control management device and management method

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101470897A (en) * 2007-12-26 2009-07-01 中国科学院自动化研究所 Sensitive film detection method based on audio/video amalgamation policy
CN101819638A (en) * 2010-04-12 2010-09-01 中国科学院计算技术研究所 Establishment method of pornographic detection model and pornographic detection method
CN102222103A (en) * 2011-06-22 2011-10-19 央视国际网络有限公司 Method and device for processing matching relationship of video content
CN102509084A (en) * 2011-11-18 2012-06-20 中国科学院自动化研究所 Multi-examples-learning-based method for identifying horror video scene
CN102799605A (en) * 2012-05-02 2012-11-28 天脉聚源(北京)传媒科技有限公司 Method and system for monitoring advertisement broadcast
CN202602832U (en) * 2012-05-10 2012-12-12 青岛海尔电子有限公司 System for identifying programs played on television
CN102831537A (en) * 2012-07-09 2012-12-19 北京十分科技有限公司 Method and device for obtaining network advertisement information
CN102890778A (en) * 2011-07-21 2013-01-23 北京新岸线网络技术有限公司 Content-based video detection method and device
CN103533459A (en) * 2013-10-09 2014-01-22 北京中科模识科技有限公司 Method and system for splitting news video entry
CN103581705A (en) * 2012-11-07 2014-02-12 深圳新感易搜网络科技有限公司 Method and system for recognizing video program
CN103617263A (en) * 2013-11-29 2014-03-05 安徽大学 Automatic TV advertisement movie clip detection method based on multi-mode features
US8781154B1 (en) * 2012-01-21 2014-07-15 Google Inc. Systems and methods facilitating random number generation for hashes in video and audio applications
CN104866616A (en) * 2015-06-07 2015-08-26 中科院成都信息技术股份有限公司 Method for searching monitor video target

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7027990B2 (en) * 2001-10-12 2006-04-11 Lester Sussman System and method for integrating the visual display of text menus for interactive voice response systems
US20070288452A1 (en) * 2006-06-12 2007-12-13 D&S Consultants, Inc. System and Method for Rapidly Searching a Database
CN100461179C (en) * 2006-10-11 2009-02-11 北京新岸线网络技术有限公司 Audio analysis system based on content
CN101640057A (en) * 2009-05-31 2010-02-03 北京中星微电子有限公司 Audio and video matching method and device therefor
CN102014295B (en) * 2010-11-19 2012-11-28 嘉兴学院 Network sensitive video detection method
EP2608062A1 (en) * 2011-12-23 2013-06-26 Thomson Licensing Method of automatic management of images in a collection of images and corresponding device
EP2648418A1 (en) * 2012-04-05 2013-10-09 Thomson Licensing Synchronization of multimedia streams
US8484017B1 (en) * 2012-09-10 2013-07-09 Google Inc. Identifying media content
US8805865B2 (en) * 2012-10-15 2014-08-12 Juked, Inc. Efficient matching of data
CN104036280A (en) * 2014-06-23 2014-09-10 国家广播电影电视总局广播科学研究院 Video fingerprinting method based on region of interest and cluster combination

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
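Fast and accurate automatic music/speech segmentation method; Wan Yulong (万玉龙) et al.; Journal of Tsinghua University (《清华大学学报》); 2013-06-30; Vol. 53, No. 6; main text paragraph 1, Section 3, Fig. 4 *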

Also Published As

Publication number Publication date
CN106601243A (en) 2017-04-26
WO2017067400A1 (en) 2017-04-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant