WO2014079322A1 - Method and system for tracking audio streaming media, and storage medium - Google Patents

Method and system for tracking audio streaming media, and storage medium

Info

Publication number
WO2014079322A1
WO2014079322A1 PCT/CN2013/086665 CN2013086665W WO2014079322A1 WO 2014079322 A1 WO2014079322 A1 WO 2014079322A1 CN 2013086665 W CN2013086665 W CN 2013086665W WO 2014079322 A1 WO2014079322 A1 WO 2014079322A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
streaming media
matching
fingerprint
media information
Prior art date
Application number
PCT/CN2013/086665
Other languages
English (en)
French (fr)
Inventor
易立夫
张云
李深远
陈剑锋
马斌
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2014079322A1
Priority to US14/720,591 (US9612791B2)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G06F3/165 - Management of the audio stream, e.g. setting of volume, audio stream path
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 - Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 - Network streaming of media packets
    • H04L65/75 - Media network packet handling
    • H04L65/765 - Media network packet handling intermediate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/14 - Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/147 - Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 - Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 - Network streaming of media packets
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 - Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 - Network streaming of media packets
    • H04L65/61 - Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
    • H04L65/612 - Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio for unicast
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 - Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 - Network streaming of media packets
    • H04L65/70 - Media network packetisation

Definitions

  • The present invention relates to the field of audio processing technologies, and in particular to a method and system for tracking audio streaming media, and a storage medium.
  • Streaming media can be tracked using audio fingerprint technology.
  • An audio fingerprint is a compact, content-based digital signature that represents the important acoustic features of a piece of music.
  • Audio fingerprinting technology usually consists of two parts: a fingerprint extraction algorithm that computes perceptually important audio features, and a fingerprint matching algorithm that searches the fingerprint database efficiently.
  • To identify an unknown audio, its audio features are first computed by the fingerprint extraction algorithm, and the fingerprint matching algorithm then compares them with the large number of audio fingerprints stored in the fingerprint database to identify the corresponding audio.
  • An effective audio fingerprinting technology can correctly identify, in the database, the original version of an unknown audio that may have been distorted by various kinds of signal processing.
  • An audio fingerprinting system identifies a given audio by receiving an audio signal and searching for the corresponding audio in a pre-built audio fingerprint database.
  • Depending on the application, audio fingerprinting systems have been used for broadcast monitoring, CF recognition, and file filtering. To be useful in these applications, such a system needs a high recognition rate and a fast search speed even under various kinds of distortion. In particular, filtering files in the P2P or UCC domain requires quickly and accurately searching audio fingerprint data built from hundreds of thousands of audio files, each with its own copyright. For real-time processing in broadcast monitoring and file filtering on a large-capacity audio fingerprint database, recognition speed is one of the most important factors.
  • In the prior art, tracking streaming media with audio fingerprint technology proceeds as follows. First, the audio signal of an audio segment is framed, key frames are determined with a starting point detection algorithm, the audio fingerprint of each key frame is extracted, and the key frame fingerprints and the corresponding streaming media information are stored in a hash table.
  • The user then inputs an audio segment for audio fingerprint retrieval. An audio fingerprint is obtained from the audio signal of that segment, and the corresponding streaming media information is matched from the hash table according to the fingerprint. This yields the streaming media information that contains the audio segment and thereby identifies the streaming media.
  • Audio fingerprint matching has to be performed continuously during playback until playback of the streaming media ends. In this tracking mode, the time-consuming matching calculation continues even after the streaming media has been identified, which consumes a large amount of computing and memory resources.
  • The response time of each search is also relatively long (for example, 1 second). In addition, when match calculation continues and the results of two consecutive matches differ slightly (for example, because duplicate entries of the same streaming media differ slightly in title or artist name), the complexity of identifying the streaming media increases (for example, the results have to be sorted).
  • An object of the present invention is to provide a streaming media tracking method and system that solve the technical problem that prior-art streaming media tracking methods continue to perform audio fingerprint matching after the streaming media has been identified, which wastes computing and memory resources and increases the complexity of streaming media recognition.
  • An embodiment of the present invention provides a method for tracking audio streaming media, including:
  • the audio fingerprint being a content-based digital signature that represents acoustic features of the audio streaming media;
  • using the second streaming media information as the matching streaming media information of the currently played audio stream segment, and replacing the first streaming media information displayed on the interface with the second streaming media information.
  • An embodiment of the present invention provides a method for tracking audio streaming media, including:
  • using the second streaming media information as the matching streaming media information of the currently played audio stream segment.
  • An embodiment of the present invention provides a tracking system for audio streaming media, including:
  • an audio processing module, configured to segment the audio stream according to a time interval to form at least two audio stream segments;
  • an information matching module, configured to match, by audio fingerprint, the first streaming media information corresponding to the currently played audio stream segment, and to use the first streaming media information as the matching streaming media information of the currently played audio stream segment;
  • a matching degree determining module, configured to determine whether the matching degree between the next audio stream segment and the first streaming media information is greater than a preset threshold;
  • the information matching module being further configured to match the second streaming media information corresponding to the next audio stream segment when the matching degree determining module determines that the matching degree is less than the preset threshold; and
  • a result returning module, configured to use the second streaming media information as the matching streaming media information of the currently played audio stream segment when the matching degree determining module determines that the matching degree is greater than the preset threshold.
  • An embodiment of the present invention provides a storage medium storing processor-executable instructions, wherein the processor-executable instructions cause a processor to:
  • use the second streaming media information as the matching streaming media information of the currently played audio stream segment.
  • The method and system for tracking audio streaming media determine whether the current streaming media is the previously matched streaming media. If so, the result can be returned directly; otherwise, streaming media matching is performed again. Because only the fingerprints of the current and the previous streaming media need to be compared, memory usage is small and calculation is faster. This greatly reduces the computational complexity of streaming media matching and yields stable matching results, which avoids displaying unstable matching results to the user and improves matching accuracy. It also reduces the influence of external audio on matching accuracy during playback and improves the user experience.
  • FIG. 1 is a schematic diagram of an operating environment of an audio streaming media tracking method according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for tracking audio streaming media according to a first embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for tracking audio streaming media according to a second embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for matching streaming media information of an audio stream segment according to the present invention.
  • FIG. 5 is a working principle diagram of a method for matching streaming media information of an audio stream segment according to the present invention.
  • FIG. 6 is a schematic diagram of the state in which no streaming media has been matched, according to the present invention.
  • FIG. 7 is a schematic diagram of the state in which streaming media information has been matched, according to the present invention.
  • FIG. 8 is a schematic structural diagram of an audio streaming media tracking system according to a first embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of an audio streaming media tracking system according to a second embodiment of the present invention.
  • The principles of the present invention can operate in many other general-purpose or special-purpose computing or communication environments or configurations.
  • Examples of well-known computing systems, environments, and configurations suitable for use with the present invention include, but are not limited to, hand-held phones, personal computers, servers, multi-processor systems, microcomputer-based systems, mainframe computers, and distributed computing environments that include any of the above systems or devices.
  • The term "module" as used herein may refer to a software object that is executed on the computing system.
  • The different components, modules, engines, and services described herein can be regarded as objects implemented on the computing system.
  • The apparatus and method described herein are preferably implemented in software, but may of course also be implemented in hardware, all of which are within the scope of the present invention.
  • FIG. 1 is a schematic diagram of an operating environment of an audio streaming media tracking method, including a streaming media acquiring device 11, a cutting device 12, a server 13, and a playback device 14.
  • The streaming media acquiring device 11 is configured to obtain audio streaming media.
  • The streaming media acquiring device 11 is, for example, a microphone, and can obtain audio streaming media from a radio station.
  • The streaming media acquiring device 11 transmits the acquired audio streaming media to the cutting device 12.
  • The cutting device 12 segments the received streaming media to form at least two audio stream segments.
  • The server 13 receives and stores the HLS protocol-based streaming media sent by the cutting device 12.
  • The playback device 14 continuously downloads and plays audio stream segments from the server 13.
  • The playback device 14 may be a desktop computer, a notebook computer, a workstation, a palmtop computer, an ultra mobile personal computer (UMPC), a tablet PC, a personal digital assistant (PDA), a web pad, a portable telephone, or any other terminal that has a storage unit, is equipped with a microprocessor, and can play streaming media.
  • FIG. 2 is a flowchart of a method for tracking audio streaming media according to a first embodiment of the present invention.
  • The method for tracking audio streaming media according to the first embodiment of the present invention includes the following steps:
  • Step S100: segment the audio stream played by the radio station according to a time interval to form at least two audio stream segments, and obtain, by audio fingerprint matching, the first streaming media information of the currently played audio stream segment;
  • The interval of the audio stream segments may be set according to the actual application; in this embodiment of the present invention, the interval is 10 seconds.
  • The method for matching the first streaming media information of the currently played audio stream segment includes: framing the audio signal of the currently played audio stream segment to obtain framed spectrograms; detecting, with a starting point detection algorithm, whether each framed spectrogram is a key frame, retaining the framed spectrograms of key frames and discarding those of non-key frames; and obtaining the audio fingerprints of the key frames, computing, from the streaming media information corresponding to those fingerprints, the first streaming media information that contains the currently played audio stream segment, and returning a matching result.
  • Step S110: determine whether the matching degree between the next audio stream segment and the first streaming media information is greater than a preset threshold. If the matching degree is less than the preset threshold, stop displaying the first streaming media information and perform step S100 again to match the second streaming media information of the next audio stream segment; if the matching degree is greater than the preset threshold, perform step S120;
  • In step S110, once a piece of streaming media has been matched, subsequent tracking only needs to determine whether the current streaming media is the previously matched streaming media. If so, the result can be returned directly; otherwise, streaming media matching is performed again. Because only the fingerprints of the current and the previous streaming media need to be compared, memory usage is small and calculation is faster. This greatly reduces the computational complexity of streaming media matching and yields stable matching results, which avoids displaying unstable results to the user (for example, matching results that change frequently while the same streaming media is playing) and improves matching accuracy. It also reduces the impact of external audio on matching accuracy during playback, such as occasional talk or short advertisements.
  • Specifically, whether the matching degree between the next audio stream segment and the first streaming media information of the currently played audio stream segment is greater than the preset threshold is determined by calculating the Hamming distance between the audio fingerprint of the next audio stream segment and the audio fingerprint of the first streaming media information corresponding to the currently played audio stream segment, which gives the matching degree between the next audio stream segment and that first streaming media information.
  • Step S120: set the second streaming media information of the next audio stream segment as the streaming media information that matches the currently played audio stream segment.
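The Hamming-distance check used in step S110 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source only says that the matching degree is obtained from the Hamming distance, so the normalisation `1 - distance / length`, the function names, and the threshold value of 0.8 are all assumptions made for the example.

```python
def hamming_distance(fp_a, fp_b):
    """Number of positions at which two equal-length fingerprints differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(a != b for a, b in zip(fp_a, fp_b))


def matching_degree(fp_a, fp_b):
    """Assumed mapping from Hamming distance to a score in [0, 1]."""
    if not fp_a:
        raise ValueError("fingerprints must not be empty")
    return 1.0 - hamming_distance(fp_a, fp_b) / len(fp_a)


def is_same_media(next_fp, reference_fp, threshold=0.8):
    """Step S110 decision: True means the previous result can be reused."""
    return matching_degree(next_fp, reference_fp) > threshold


# Hypothetical 8-bit fingerprints: one differing bit out of eight.
reference = [1, 0, 1, 1, 0, 0, 1, 0]
next_segment = [1, 0, 1, 1, 0, 1, 1, 0]
reuse_previous_result = is_same_media(next_segment, reference)
```

With these toy values the distance is 1 and the assumed matching degree is 0.875, so the previously matched streaming media information would be reused rather than re-running a full database match.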
  • FIG. 3 is a flowchart of a method for tracking audio streaming media according to a second embodiment of the present invention.
  • The method for tracking audio streaming media according to the second embodiment of the present invention includes the following steps:
  • Step S200: slice the audio stream of the radio station at a certain time interval to form at least two audio stream segments;
  • The interval of the audio stream segments may be set according to the actual application; in this embodiment of the present invention, the interval is 10 seconds.
  • Step S210: perform streaming media information matching on the currently played audio stream segment by audio fingerprint, to obtain the first streaming media information corresponding to the currently played audio stream segment.
  • For step S210, refer to FIG. 4, which is a flowchart of the method for matching the streaming media information of an audio stream segment in the present invention. The method includes the following steps:
  • Step S211: randomly extract, in steps of d/N milliseconds, spectrum images with a window length of 11.6*w milliseconds from the audio signal of the currently played audio stream segment, to obtain framed spectrograms;
  • Step S212: detect, with a starting point detection algorithm, whether each framed spectrogram is a key frame, retain the framed spectrograms of key frames, and discard the framed spectrograms of non-key frames;
  • In step S212, detecting whether each framed spectrogram corresponds to a key frame is specifically: performing an FFT (Fast Fourier Transform) and an LPC (linear predictive coding) transform on each frame obtained after the framing processing, to determine the key frames among the frames.
  • Step S213: perform a short-time DCT (Discrete Cosine Transform) on the key frames and retain the main DCT coefficients;
  • Step S214: represent the retained DCT coefficients in binary;
  • Step S215: convert the binary DCT coefficients into audio fingerprints using a minimum hash (MinHash) algorithm;
  • In step S215, the random permutations used by the minimum hash algorithm are the same when audio fingerprints are stored and when they are queried.
  • Step S216: using the LSH (Locality Sensitive Hashing) method, divide the audio fingerprint into a predetermined number of audio sub-fingerprints (for example, b blocks, or bins) with corresponding hash sub-tables, store the b blocks of audio sub-fingerprints in the hash sub-tables, and find similar matching audio sub-fingerprints by counting the number of occurrences of each audio sub-fingerprint;
  • In step S216, "ABCDEFGHIJKLMNOPQRSTUVWXY" as shown in FIG. 4 represents one extracted audio fingerprint, and "ABCDE", "EFGHI", ..., "UVWXY" each represent an audio sub-fingerprint obtained by dividing that audio fingerprint;
  • Step S217: discard the audio sub-fingerprints whose number of occurrences is less than the matching threshold;
  • In step S217, as shown in FIG. 5, the numbers of occurrences in the hash sub-tables of the audio sub-fingerprints of audio file information 7, 12, 50, 92, 102, and 302 are 1, 1, 1, 3, 2, and 1, respectively. Assuming the current preset matching threshold is 2, the audio sub-fingerprints corresponding to audio file information 92 and 102 are the similar matching audio sub-fingerprints.
  • Step S218: compare the audio fingerprint of the currently played audio stream segment with the retained audio sub-fingerprints, and calculate the matching error from the Hamming distance between each retained audio sub-fingerprint and the audio fingerprint of the audio stream segment, to obtain the exactly matching audio sub-fingerprints;
  • Step S219: merge, on the time axis, the streaming media information corresponding to the exactly matching audio sub-fingerprints using a dynamic programming algorithm or a line detection algorithm, to obtain the first streaming media information that contains the currently played audio stream segment, and return a matching result.
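Steps S215 to S217 can be illustrated with a small, self-contained sketch. It is only an approximation under stated assumptions: the names `make_permutations`, `minhash`, `split_bands`, and `LshIndex` are hypothetical, the fixed seed stands in for the requirement that storage and query use the same random permutations, and the toy index uses character strings as stand-ins for real fingerprints so that the vote counts reproduce the FIG. 5 example (audio file information 7, 12, 50, 92, 102, 302 with a matching threshold of 2).

```python
import random
from collections import Counter


def make_permutations(n_bits, n_hashes, seed=0):
    """Fixed random permutations; reuse the same seed to store and to query."""
    rng = random.Random(seed)
    perms = []
    for _ in range(n_hashes):
        perm = list(range(n_bits))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def minhash(bits, perms):
    """Per permutation, the rank of the first 1-bit (len(bits) if there is none)."""
    return tuple(
        next((rank for rank, idx in enumerate(perm) if bits[idx]), len(bits))
        for perm in perms
    )


def split_bands(fingerprint, b):
    """Cut a fingerprint into b equal-length sub-fingerprints (any tail is dropped)."""
    size = len(fingerprint) // b
    return [tuple(fingerprint[i * size:(i + 1) * size]) for i in range(b)]


class LshIndex:
    """One hash sub-table per band, mapping a sub-fingerprint to media ids."""

    def __init__(self, b):
        self.b = b
        self.tables = [{} for _ in range(b)]

    def add(self, media_id, fingerprint):
        for table, band in zip(self.tables, split_bands(fingerprint, self.b)):
            table.setdefault(band, []).append(media_id)

    def votes(self, fingerprint):
        """Count, per media id, how many sub-tables contain a matching band."""
        counts = Counter()
        for table, band in zip(self.tables, split_bands(fingerprint, self.b)):
            counts.update(table.get(band, ()))
        return counts

    def candidates(self, fingerprint, min_votes):
        """Discard ids whose vote count is below the matching threshold."""
        counts = self.votes(fingerprint)
        return sorted(m for m, n in counts.items() if n >= min_votes)


# Toy index (hypothetical data); "-" marks bands that do not match the query.
query = "ABCDEFGHIJKLMNOPQRSTUVWXY"
index = LshIndex(b=5)
index.add(7, "ABCDE" + "-" * 20)
index.add(12, "-----FGHIJ" + "-" * 15)
index.add(50, "-" * 10 + "KLMNO" + "-" * 10)
index.add(92, "ABCDEFGHIJKLMNO" + "-" * 10)
index.add(102, "-" * 15 + "PQRSTUVWXY")
index.add(302, "-" * 20 + "UVWXY")
similar = index.candidates(query, min_votes=2)
```

Here `similar` contains only ids 92 and 102, which received 3 and 2 votes. In a fuller pipeline these candidates would then go through the Hamming-distance check of step S218, which this sketch does not cover.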
  • Step S220: switch the information displayed on the interface according to the matching result, and display the first streaming media information and its status on the interface;
  • In step S220, the specific display effect is shown in FIG. 6 and FIG. 7. FIG. 6 shows the state in which no streaming media has been matched, and FIG. 7 shows the state in which streaming media information (such as the first or the second streaming media information) has been matched. The change in a local area of the playback interface lets the user quickly see whether there is a matching result, which improves the user experience.
  • Step S230: calculate the Hamming distance between the audio fingerprint of the next audio stream segment and the audio fingerprint of the first streaming media information of the currently played audio stream segment, to obtain the matching degree between the next audio stream segment and the first streaming media information, and determine whether the matching degree is greater than a preset threshold. If the matching degree is greater than the preset threshold, perform step S240; if the matching degree is less than the preset threshold, stop displaying the first streaming media information and perform step S210 again;
  • In step S230, once a piece of streaming media has been matched, subsequent tracking only needs to determine whether the current streaming media is the previously matched streaming media. If so, the result can be returned directly; otherwise, streaming media matching is performed again. Because only the fingerprints of the current and the previous streaming media need to be compared, memory usage is small and calculation is faster. This greatly reduces the computational complexity of streaming media matching and yields stable matching results, which avoids displaying unstable results to the user (for example, matching results that change frequently while the same piece of streaming media is playing) and improves matching accuracy. It also reduces the effect of external audio on matching accuracy during playback, such as occasional conversations or short advertisements.
  • Step S240: set the second streaming media information of the next audio stream segment as the streaming media information of the currently played audio stream segment, and update the matching streaming media information shown on the interface.
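The control flow of steps S210 to S240 can be summarised in a short sketch. It is a simplified illustration under assumptions: the fingerprints are hypothetical 8-bit tuples, the full database match of step S210 is replaced by a brute-force search over a tiny dictionary, the matching degree is assumed to be `1 - HammingDistance / length`, and the threshold of 0.8 is arbitrary.

```python
def degree(fp_a, fp_b):
    """Assumed matching degree: 1 minus the normalised Hamming distance."""
    return 1.0 - sum(a != b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def track(segment_fps, database, threshold=0.8):
    """Return (media info per segment, number of full database matches run)."""
    current = None  # (media_info, reference_fingerprint) of the last match
    full_matches = 0
    results = []
    for fp in segment_fps:
        # Step S230: cheap check against the previously matched media only.
        if current is not None and degree(fp, current[1]) > threshold:
            results.append(current[0])  # step S240: reuse the previous result
            continue
        # Step S210: fall back to a full match (brute force in this toy version).
        full_matches += 1
        info, ref = max(database.items(), key=lambda kv: degree(fp, kv[1]))
        current = (info, ref) if degree(fp, ref) > threshold else None
        results.append(info if current else None)
    return results, full_matches


# Hypothetical reference fingerprints and a three-segment stream.
database = {
    "Song A": (1, 1, 1, 1, 0, 0, 0, 0),
    "Song B": (0, 0, 0, 0, 1, 1, 1, 1),
}
segments = [
    (1, 1, 1, 1, 0, 0, 0, 0),  # Song A
    (1, 1, 1, 1, 0, 0, 0, 1),  # Song A with one noisy bit
    (0, 0, 0, 0, 1, 1, 1, 1),  # Song B starts
]
results, full_matches = track(segments, database)
```

In this toy run the second segment is resolved by the cheap comparison alone, so only two full matches are needed for three segments, which is the saving the embodiment describes.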
  • FIG. 8 is a schematic structural diagram of an audio streaming media tracking system according to a first embodiment of the present invention.
  • The audio streaming media tracking system of the first embodiment of the present invention includes:
  • The audio processing module 81 is configured to slice the audio stream of the radio station to form at least two audio stream segments. The interval of the audio stream segments may be set according to the actual application; in this embodiment of the present invention, the interval is 10 seconds.
  • The information matching module 82 is configured to perform streaming media information matching on the currently played audio stream segment by audio fingerprint, to obtain the corresponding first streaming media information. The first streaming media information of the currently played audio stream segment is matched as follows: framing the audio signal of the currently played audio stream segment to obtain framed spectrograms; detecting, with a starting point detection algorithm, whether each framed spectrogram is a key frame, retaining the framed spectrograms of key frames and discarding those of non-key frames; and obtaining the audio fingerprints of the key frames, computing, from the streaming media information corresponding to those fingerprints, the first streaming media information that contains the currently played audio stream segment, and returning a matching result.
  • The matching degree judging module 83 is configured to determine whether the matching degree between the next audio stream segment and the first streaming media information of the currently played audio stream segment is greater than a preset threshold. If the matching degree is less than the preset threshold, display of the first streaming media information ends and the information matching module 82 matches the streaming media information again; if the matching degree is greater than the preset threshold, the result returning module 84 returns the matching streaming media information of the next audio stream segment as the streaming media information that matches the currently played audio stream segment.
  • Once a piece of streaming media has been matched, subsequent tracking only needs to determine whether the current streaming media is the previously matched streaming media. If so, the result can be returned directly; otherwise, streaming media matching is performed again. Because only the fingerprints of the current and the previous streaming media need to be compared, memory usage is small and calculation is faster. This greatly reduces the computational complexity of streaming media matching and yields stable matching results, which avoids displaying unstable results to the user (for example, matching results that change frequently while the same piece of streaming media is playing) and improves matching accuracy. It also reduces the impact of external audio on matching accuracy during playback, such as the host's occasional conversation or a short advertisement.
  • Whether the matching degree between the next audio stream segment and the matching streaming media information of the previous audio stream segment is greater than the preset threshold is determined by calculating the Hamming distance between the audio fingerprint of the next audio stream segment and the audio fingerprint of the matching streaming media information of the previous audio stream segment, which gives the matching degree between the two.
  • The result returning module 84 is configured to set the second streaming media information (the matching streaming media information) of the next audio stream segment as the matching streaming media information of the currently played audio stream segment.
  • FIG. 9 is a schematic structural diagram of an audio streaming media tracking system according to a second embodiment of the present invention.
  • the audio streaming media tracking system of the second embodiment of the present invention includes an audio processing module 91, an information matching module 92, an information display module 93, a matching degree judging module 94, and a result returning module 95, wherein
  • the audio processing module 91 is configured to perform a slicing process on the audio stream of the radio station to form at least two audio stream segments.
  • the interval time of the audio stream segments may be set according to an actual application, and is implemented in the present invention. In an example, the interval of the audio stream segment is 10 seconds;
  • the information matching module 92 is configured to perform streaming media information matching on the currently played audio stream segment by using an audio fingerprint to obtain corresponding first streaming media information. Specifically, the information matching module 92 further includes:
  • the spectrogram extracting unit 921 is configured to randomly extract, on average every d/N milliseconds, a spectrogram with a window length of 11.6*w milliseconds from the audio signal of the current audio stream segment, to obtain framed spectrograms;
  • the key frame detecting unit 922 is configured to detect, with an onset (starting point) detection algorithm, whether each framed spectrogram is a key frame, to retain the framed spectrograms of key frames, and to discard those of non-key frames; detecting whether each framed spectrogram corresponds to a key frame specifically means applying an FFT (Fast Fourier Transform) and an LPC (linear predictive coding) transform to each frame obtained from the framing, to determine the key frames among the frames.
  • the discrete cosine transform unit 923 is configured to apply a short-time DCT (Discrete Cosine Transform) to the key frames, retain the main DCT coefficients, and represent the retained DCT coefficients in binary;
  • the fingerprint conversion unit 924 is configured to convert the binary-represented DCT coefficients into an audio fingerprint using a min-hash algorithm; the random permutations of the min-hash algorithm are the same when audio fingerprints are stored and when they are queried.
  • the fingerprint matching unit 925 is configured to use the LSH (Locality Sensitive Hashing) method to divide the audio fingerprint into b blocks (bins) of audio sub-fingerprints and l hash sub-tables, store the b blocks of audio sub-fingerprints in the hash sub-tables, find closely matching audio sub-fingerprints by counting the occurrences of each audio sub-fingerprint, and discard the audio sub-fingerprints whose number of occurrences is less than the matching threshold; here "ABCDEFGHIJKLMNOPQRSTUVWXY" as shown in FIG. 4 represents one extracted audio fingerprint, and "ABCDE", "EFGHI", ..., "UVWXY" represent the audio sub-fingerprints obtained by dividing it; as shown in FIG. 4, in the hash sub-table the audio sub-fingerprints occur 1, 1, 1, 3, 2, and 1 times in audio file information 7, 12, 50, 92, 102, and 302, respectively. If the current preset matching threshold is 2, the audio sub-fingerprints corresponding to audio file information 92 and 102 are the closely matching audio sub-fingerprints.
  • the fingerprint determining unit 926 is configured to compare the fingerprint of the current audio stream segment with the retained audio sub-fingerprints, and to compute the matching error from the Hamming distance between the retained audio sub-fingerprints and the fingerprint of the audio stream segment, to obtain the exactly matching audio sub-fingerprints;
  • the information matching unit 927 is configured to merge, on the time axis, the streaming media information corresponding to the exactly matching audio sub-fingerprints using a dynamic programming algorithm or a line detection algorithm, obtain the matching streaming media information containing the current audio stream segment, and output the matching result.
  • the information display module 93 is configured to switch the interface display information according to the matching result, and display the streaming media information and the status in the interface.
  • FIG. 6 is a schematic diagram of the state in which no streaming media has been matched.
  • FIG. 7 is a schematic diagram showing the state of matching streaming media information.
  • the matching degree judging module 94 is configured to calculate the Hamming distance between the fingerprint of the next audio stream segment and the fingerprint of the previously matched streaming media, to obtain the matching degree between the next audio stream segment and the previously matched streaming media, and to determine whether the matching degree is greater than the preset threshold. If the matching degree is greater than the preset threshold, the result returning module sets the matching streaming media of the next audio stream segment to the previously matched streaming media; if the matching degree is less than the preset threshold, the display of the previous streaming media ends and streaming media matching is performed again through the information matching module.
  • once a piece of streaming media has been matched, subsequent tracking only needs to determine whether the current streaming media is the previously matched one. If so, the result can be returned directly; otherwise, streaming media matching is performed again. Because only the fingerprints of the current streaming media and the previous streaming media need to be compared, memory usage is low and computation is faster. This not only greatly reduces the computational complexity of streaming media matching but also yields stable matching results, effectively avoiding unstable results being shown to the user (for example, frequent changes between successive matching results for the same piece of streaming media) and improving retrieval accuracy. It also effectively reduces the influence of external audio on matching accuracy during playback, for example occasional talk by the host or short advertisements.
  • the result returning module 95 is configured to set the second streaming media information of the next audio stream segment to the streaming media information matched by the currently played audio stream segment.
  • after identifying the streaming media, the method and system for tracking audio streaming media of the present invention determine whether the current streaming media is the previously matched streaming media. If so, the result can be returned directly; otherwise, streaming media matching is performed again.
  • because only the fingerprints of the current streaming media and the previous streaming media need to be compared, memory usage is low and computation is faster. This not only greatly reduces the computational complexity of streaming media matching but also yields stable matching results, effectively avoiding unstable matching results being shown to the user and improving matching accuracy.
  • the influence of external audio on matching accuracy during playback is also effectively reduced, which improves the user experience.
  • the tracking system of the audio streaming media provided by the embodiment of the present invention is formed in a terminal, such as a computer, a tablet computer, or a mobile phone with a touch function. The tracking system and the tracking method of the audio streaming media in the above embodiments belong to the same concept: any method provided in the tracking method embodiments may run on the tracking system, and the specific implementation process is described in the tracking method embodiments and is not repeated here.
  • all or part of the flow of the tracking method of the audio streaming media of the embodiment of the present invention may be carried out by a computer program controlling the relevant hardware.
  • the computer program may be stored in a computer-readable storage medium, such as a memory of the terminal, and executed by at least one processor in the terminal; during execution it may include the flows of the method embodiments described above.
  • the storage medium may be a magnetic disk, an optical disk, a read only memory (ROM), or a random access memory (RAM).
  • each functional module may be integrated into one processing chip, or each module may exist physically separately, or two or more modules may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated module, if implemented in the form of a software functional module and sold or used as a standalone product, may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Discrete Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

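The present invention provides a method and system for tracking audio streaming media. After the matching streaming media information of the currently played audio segment has been identified, it is determined whether the matching streaming media information of the next audio segment is that of the currently played audio segment. If so, the result is returned directly; otherwise, streaming media information matching is performed again. This uses little memory, increases computation speed, and reduces the computational complexity of streaming media information matching.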

Description

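Method and System for Tracking Audio Streaming Media, and Storage Medium
Technical Field
The present invention relates to the field of audio processing technology, and in particular to a method and system for tracking audio streaming media, and a storage medium.
Background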
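At present, when a user listens to streaming media (for example, songs on a radio station), playing one complete piece of streaming media takes some time (for example, 3 to 4 minutes), so it is particularly important to keep tracking the streaming media during playback in order to display accurate streaming media information in real time. In the prior art, audio fingerprint technology can be used to track streaming media. An audio fingerprint is a compact, content-based digital signature that represents the important acoustic features of a piece of music. Audio fingerprint technology generally comprises two parts: a fingerprint extraction algorithm that computes perceptually important features, and a fingerprint comparison algorithm that searches a fingerprint database efficiently. To identify an unknown piece of audio, its audio features are first computed with the fingerprint extraction algorithm and then compared, using the fingerprint comparison algorithm, against the large number of audio fingerprints stored in the fingerprint database to identify the corresponding audio. An effective audio fingerprint technology can correctly identify, in the database, the original version of unknown audio that may have undergone various kinds of signal processing and distortion.
The goal of an audio fingerprint system is to identify a given piece of audio by receiving an audio signal and searching a pre-built audio fingerprint database for the corresponding audio. Depending on the application field, audio fingerprint systems have been used for broadcast monitoring, CF recognition, and file filtering. To use an audio fingerprint system effectively in these fields, a high recognition rate and a fast search speed are required even under various kinds of distortion. Specifically, to filter files in the P2P or UCC field, audio fingerprint data formed from hundreds of thousands of audio files, each with its own copyright, must be searched quickly and accurately. For real-time processing in broadcast monitoring and file filtering, which operate on large-capacity audio fingerprint databases, recognition speed is one of the most important factors.
In the prior art, tracking streaming media with audio fingerprint technology works as follows. The audio signal of an audio segment is first divided into frames, key frames are determined with an onset detection algorithm, the audio fingerprints of the key frames are extracted, and the key-frame audio fingerprints are stored in a hash table together with the corresponding streaming media information. When a user inputs an audio segment for audio fingerprint retrieval, an audio fingerprint is obtained from the audio signal of that segment, and the corresponding streaming media information is matched from the hash table according to that fingerprint, yielding the streaming media information that contains the audio segment and thereby identifying the streaming media. This audio fingerprint matching must be carried out continuously during playback until the streaming media finishes playing. With this tracking approach, the time-consuming and laborious audio fingerprint matching continues even after the streaming media has been identified, which consumes a great deal of computing and memory resources, and the retrieval response time is usually rather long (for example, 1 second). Moreover, when this matching is performed continuously, if the results of two successive matches differ slightly (because duplicate streaming media may exist whose titles and singer names differ slightly), the complexity of streaming media identification (for example, result ranking) also increases.
Therefore, a new technical solution is needed to solve the technical problem that the above tracking approach continues audio fingerprint matching after the streaming media has been identified, wasting computing and memory resources and increasing the complexity of streaming media identification.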
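Technical Problem
One object of the present invention is to provide a streaming media tracking method and system, aiming to solve the technical problem that prior-art streaming media tracking continues audio fingerprint matching after the streaming media has been identified, wasting computing and memory resources and increasing the complexity of streaming media identification.
Technical Solution
To achieve the above object, an embodiment of the present invention provides a method for tracking audio streaming media, comprising:
dividing an audio stream at time intervals to form at least two audio stream segments;
matching, by means of an audio fingerprint, first streaming media information corresponding to the currently played audio stream segment, and using the first streaming media information as the matching streaming media information of the currently played audio stream segment, wherein the audio fingerprint is a content-based digital signature representing acoustic features of the audio streaming media;
displaying the matched first streaming media information;
calculating the Hamming distance between the audio fingerprint of the next audio stream segment and the audio fingerprint of the first streaming media information, to obtain the matching degree between the next audio stream segment and the first streaming media information;
determining whether the matching degree is greater than a preset threshold, and if the matching degree is less than the preset threshold, performing the step of matching second streaming media information corresponding to the next audio stream segment; and
if the matching degree is greater than the preset threshold, using the second streaming media information as the matching streaming media information of the currently played audio stream segment, and replacing the first streaming media information displayed in the interface with the second streaming media information.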
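To achieve the above object, an embodiment of the present invention provides a method for tracking audio streaming media, comprising:
dividing an audio stream at time intervals to form at least two audio stream segments;
matching, by means of an audio fingerprint, first streaming media information corresponding to the currently played audio stream segment, and using the first streaming media information as the matching streaming media information of the currently played audio stream segment;
determining whether the matching degree between the next audio stream segment and the first streaming media information is greater than a preset threshold;
if the matching degree is less than the preset threshold, performing the step of matching second streaming media information corresponding to the next audio stream segment; and
if the matching degree is greater than the preset threshold, using the second streaming media information as the matching streaming media information of the currently played audio stream segment.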
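To achieve the above object, an embodiment of the present invention provides a system for tracking audio streaming media, comprising:
an audio processing module, configured to divide an audio stream at time intervals to form at least two audio stream segments;
an information matching module, configured to match, by means of an audio fingerprint, first streaming media information corresponding to the currently played audio stream segment, and to use the first streaming media information as the matching streaming media information of the currently played audio stream segment;
a matching degree judging module, configured to determine whether the matching degree between the next audio stream segment and the first streaming media information is greater than a preset threshold;
the information matching module being further configured to match second streaming media information corresponding to the next audio stream segment when the matching degree judging module determines that the matching degree is less than the preset threshold; and
a result returning module, configured to use the second streaming media information as the matching streaming media information of the currently played audio stream segment when the matching degree judging module determines that the matching degree is greater than the preset threshold.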
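To achieve the above object, an embodiment of the present invention provides a storage medium storing processor-executable instructions, wherein the processor-executable instructions cause a processor to perform the following operations:
dividing an audio stream at time intervals to form at least two audio stream segments;
matching, by means of an audio fingerprint, first streaming media information corresponding to the currently played audio stream segment, and using the first streaming media information as the matching streaming media information of the currently played audio stream segment;
determining whether the matching degree between the next audio stream segment and the first streaming media information is greater than a preset threshold;
if the matching degree is less than the preset threshold, performing the step of matching second streaming media information corresponding to the next audio stream segment; and
if the matching degree is greater than the preset threshold, using the second streaming media information as the matching streaming media information of the currently played audio stream segment.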
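Advantageous Effects
After identifying the streaming media, the method and system for tracking audio streaming media of the embodiments of the present invention determine whether the current streaming media is the previously matched streaming media. If so, the result can be returned directly; otherwise, streaming media matching is performed again. Because only the fingerprints of the current streaming media and the previous streaming media need to be compared, memory usage is low and computation is faster. This not only greatly reduces the computational complexity of streaming media matching but also yields stable matching results, effectively avoiding unstable matching results being shown to the user and improving matching accuracy. It also effectively reduces the influence of external audio on matching accuracy during playback, improving the user experience.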
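Brief Description of the Drawings
FIG. 1 is a schematic diagram of the operating environment of the method for tracking audio streaming media according to an embodiment of the present invention;
FIG. 2 is a flowchart of the method for tracking audio streaming media according to the first embodiment of the present invention;
FIG. 3 is a flowchart of the method for tracking audio streaming media according to the second embodiment of the present invention;
FIG. 4 is a flowchart of the streaming media information matching procedure for an audio stream segment according to the present invention;
FIG. 5 is a diagram of the working principle of the streaming media information matching procedure for an audio stream segment according to the present invention;
FIG. 6 is a schematic diagram of the state in which no streaming media has been matched according to the present invention;
FIG. 7 is a schematic diagram of the state in which matching streaming media information is displayed according to the present invention;
FIG. 8 is a schematic structural diagram of the system for tracking audio streaming media according to the first embodiment of the present invention;
FIG. 9 is a schematic structural diagram of the system for tracking audio streaming media according to the second embodiment of the present invention.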
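Referring to the drawings, in which like reference numerals denote like components, the principles of the present invention are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the invention and should not be construed as limiting other specific embodiments not detailed herein.
In the following description, specific embodiments of the present invention are described with reference to steps and symbols of operations performed by one or more computers, unless otherwise stated. Accordingly, these steps and operations are at times referred to as being computer-executed; computer execution as referred to herein includes the manipulation, by the computer's processing unit, of electronic signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the data format. However, although the principles of the invention are described in the above terms, this is not meant as a limitation; those skilled in the art will appreciate that the various steps and operations described below may also be implemented in hardware.
The principles of the present invention are operational with numerous other general-purpose or special-purpose computing or communication environments or configurations. Examples of well-known computing systems, environments, and configurations suitable for use with the invention include (but are not limited to) handheld phones, personal computers, servers, multiprocessor systems, microcomputer-based systems, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The term "module" as used herein may be regarded as a software object executing on the computing system. The different components, modules, engines, and services described herein may be regarded as objects implemented on the computing system. The apparatus and methods described herein are preferably implemented in software, but may of course also be implemented in hardware, all of which fall within the scope of protection of the present invention.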
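Referring to FIG. 1, which is a schematic diagram of the operating environment of the method for tracking audio streaming media in an embodiment of the present invention, the environment includes a streaming media acquisition device 11, a cutting device 12, a server 13, and a playback device 14. The streaming media acquisition device 11 is used to acquire audio streaming media; it is, for example, a microphone, and may acquire the audio streaming media from a radio station. The streaming media acquisition device 11 transmits the acquired audio streaming media to the cutting device 12. The cutting device 12 divides the received streaming media to form at least two audio stream segments. The server 13 receives and stores the HLS-protocol-based streaming media sent by the cutting device 12. The playback device 14 continuously downloads audio stream segments from the server 13 and plays them.
The playback device 14 may be a desktop computer, or any other terminal that has a storage unit, is equipped with a microprocessor, and can play streaming media, such as a notebook computer, a workstation, a palmtop computer, an ultra mobile personal computer (UMPC), a tablet PC, a personal digital assistant (PDA), a web pad, or a portable telephone.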
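Refer to FIG. 2, which is a flowchart of the method for tracking audio streaming media according to the first embodiment of the present invention. The method of the first embodiment includes the following steps:
Step S100: divide the audio stream played by the radio station at time intervals to form at least two audio stream segments, and obtain the first streaming media information of the currently played audio stream segment by means of an audio fingerprint;
In step S100, the interval of the audio stream segments may be set according to the actual application; in this embodiment of the present invention, the interval is 10 seconds. The first streaming media information of the currently played audio stream segment is matched as follows: the audio signal of the currently played audio stream segment is divided into frames to obtain framed spectrograms; an onset detection algorithm detects whether each framed spectrogram is a key frame, the framed spectrograms of key frames are retained, and those of non-key frames are discarded; the audio fingerprints of the key frames are obtained, the first streaming media information containing the currently played audio stream segment is computed from the streaming media information corresponding to the key-frame audio fingerprints, and the matching result is returned.
Step S110: determine whether the matching degree between the next audio stream segment and the first streaming media information is greater than a preset threshold. If the matching degree is less than the preset threshold, stop displaying the first streaming media information and perform step S100 again to match the second streaming media information of the next audio stream segment; if the matching degree is greater than the preset threshold, perform step S120;
In step S110, once a piece of streaming media has been matched, subsequent tracking only needs to determine whether the current streaming media is the previously matched one. If so, the result can be returned directly; otherwise, streaming media matching is performed again. Because only the fingerprints of the current streaming media and the previous streaming media need to be compared, memory usage is low and computation is faster. This not only greatly reduces the computational complexity of streaming media matching but also yields stable matching results, effectively avoiding unstable results being shown to the user (for example, frequent changes between successive matching results for the same piece of streaming media) and improving matching accuracy. It also effectively reduces the influence of external audio on matching accuracy during playback, for example occasional talk by the host or short advertisements. Whether the matching degree between the next audio stream segment and the first streaming media information of the currently played audio stream segment is greater than the preset threshold is determined as follows: the Hamming distance between the audio fingerprint of the next audio stream segment and the audio fingerprint of the first streaming media information corresponding to the currently played audio stream segment is calculated, giving the matching degree between the next audio stream segment and that first streaming media information.
Step S120: set the second streaming media information of the next audio stream segment to the streaming media information matched by the currently played audio stream segment.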
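The text derives the matching degree from a Hamming distance but does not give the formula. The following is a minimal sketch under stated assumptions: fingerprints are modelled as fixed-width integers, and the matching degree is taken to be one minus the normalized Hamming distance. The function names, the normalization, and the threshold value used below are illustrative and are not taken from the patent.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def matching_degree(a: int, b: int, n_bits: int) -> float:
    """Assumed mapping: 1.0 for identical fingerprints, 0.0 when every bit differs."""
    return 1.0 - hamming_distance(a, b) / n_bits


def same_media(next_fp: int, first_fp: int, n_bits: int, threshold: float) -> bool:
    """Step S110 decision: True means the previous result is reused (step S120)."""
    return matching_degree(next_fp, first_fp, n_bits) > threshold
```

With this convention a larger matching degree means a smaller Hamming distance, which is consistent with the "greater than the preset threshold" test in step S110.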
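Refer to FIG. 3, which is a flowchart of the method for tracking audio streaming media according to the second embodiment of the present invention. The method of the second embodiment includes the following steps:
Step S200: slice the audio stream of the radio station at a certain time interval to form at least two audio stream segments;
In step S200, the interval of the audio stream segments may be set according to the actual application; in this embodiment of the present invention, the interval is 10 seconds.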
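As an illustration of step S200, the sketch below slices a sequence of samples into fixed-length segments. It assumes the stream is already available as an in-memory sequence of samples at a known sample rate; the function name and parameters are hypothetical, and a real HLS pipeline would slice on the server side instead.

```python
from typing import List, Sequence


def slice_stream(samples: Sequence[float], sample_rate: int,
                 interval_s: float = 10.0) -> List[Sequence[float]]:
    """Split samples into consecutive segments of interval_s seconds each.

    The last segment may be shorter than the others.
    """
    seg_len = int(sample_rate * interval_s)
    if seg_len <= 0:
        raise ValueError("sample_rate * interval_s must be at least one sample")
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
```

The 10-second default mirrors the interval named in the embodiment; any other interval can be passed in.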
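Step S210: perform streaming media information matching on the currently played audio stream segment by means of an audio fingerprint, to obtain the first streaming media information corresponding to the currently played audio stream segment;
In step S210, refer to FIG. 4, which is a flowchart of the streaming media information matching procedure for an audio stream segment in the present invention. The procedure includes the following steps:
Step S211: from the audio signal of the currently played audio stream segment, randomly extract, on average every d/N milliseconds, a spectrogram with a window length of 11.6*w milliseconds, to obtain framed spectrograms;
Step S212: detect, with an onset detection algorithm, whether each framed spectrogram is a key frame; retain the framed spectrograms of key frames and discard those of non-key frames;
In step S212, detecting with the onset detection algorithm whether each framed spectrogram corresponds to a key frame specifically means applying an FFT (Fast Fourier Transform) and an LPC (linear predictive coding) transform to each frame obtained from the framing, to determine the key frames among the frames.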
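The text names FFT plus LPC for key frame detection but gives no detail, so the sketch below swaps in a simpler, well-known onset detector based on spectral flux (the summed increase in spectral magnitude from one frame to the next). It is not the patent's FFT+LPC procedure. The regular hop, the `ratio` parameter, and all function names are assumptions made for illustration, and the naive DFT is only suitable for short frames.

```python
import cmath
import math
from typing import List, Sequence


def frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Cut the signal into frames of frame_len samples, advancing by hop samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..n/2, computed with a naive O(n^2) DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def key_frame_indices(signal: Sequence[float], frame_len: int, hop: int,
                      ratio: float = 2.0) -> List[int]:
    """Indices of frames whose spectral flux exceeds ratio times the mean flux."""
    spectra = [magnitude_spectrum(f) for f in frames(signal, frame_len, hop)]
    flux = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(spectra, spectra[1:]):
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    mean_flux = sum(flux) / len(flux)
    return [i for i, f in enumerate(flux) if f > 0.0 and f > ratio * mean_flux]
```

Frames that are not returned would be discarded, matching the "retain key frames, discard non-key frames" rule of step S212.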
Step S213: apply a short-time DCT (Discrete Cosine Transform) to the key frames and retain the main DCT coefficients;
Step S214: represent the retained DCT coefficients in binary;
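Steps S213 and S214 can be sketched as follows. The text does not say which coefficients count as "main" or how they are binarized, so this sketch assumes the lowest-order `keep` coefficients are retained and that each one is encoded by its sign; both choices, and the function names, are illustrative only.

```python
import math
from typing import List, Sequence


def dct_ii(x: Sequence[float]) -> List[float]:
    """Unnormalized DCT-II of a real sequence."""
    n = len(x)
    return [sum(v * math.cos(math.pi * (t + 0.5) * k / n)
                for t, v in enumerate(x))
            for k in range(n)]


def binarize_main_coefficients(x: Sequence[float], keep: int) -> List[int]:
    """Keep the first `keep` DCT coefficients and encode each as 1 (positive) or 0."""
    return [1 if c > 0 else 0 for c in dct_ii(x)[:keep]]
```

The resulting bit list is the kind of binary vector that the min-hash step (S215) can consume.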
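Step S215: convert the binary-represented DCT coefficients into an audio fingerprint using a min-hash algorithm;
In step S215, the random permutations of the min-hash algorithm are the same when audio fingerprints are stored and when they are queried.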
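The requirement that storage and query use the same random permutations can be met by generating the permutations from a fixed seed. The sketch below is one common formulation of min-hash over a bit vector (for each permutation, record the rank of the first set bit); the seed handling, the signature format, and the function names are assumptions rather than details from the patent.

```python
import random
from typing import List, Sequence


def make_permutations(n_bits: int, n_hashes: int, seed: int = 0) -> List[List[int]]:
    """Generate n_hashes permutations of range(n_bits), reproducible from the seed."""
    rng = random.Random(seed)
    perms = []
    for _ in range(n_hashes):
        p = list(range(n_bits))
        rng.shuffle(p)
        perms.append(p)
    return perms


def minhash(bits: Sequence[int], perms: Sequence[Sequence[int]]) -> List[int]:
    """For each permutation, the rank of the first 1 bit in permuted order.

    An all-zero vector yields len(bits) for every permutation.
    """
    return [next((rank for rank, idx in enumerate(p) if bits[idx]), len(bits))
            for p in perms]
```

Because `make_permutations` is deterministic for a given seed, the indexing side and the query side can each rebuild identical permutations instead of sharing them.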
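Step S216: use the LSH (Locality Sensitive Hashing) method to divide the audio fingerprint into a predetermined number of audio sub-fingerprints, for example b blocks (bins), and one hash sub-table, store the b blocks of audio sub-fingerprints in the hash sub-table, and find closely matching audio sub-fingerprints by counting the occurrences of each audio sub-fingerprint;
In step S216, "ABCDEFGHIJKLMNOPQRSTUVWXY" as shown in FIG. 4 represents one extracted audio fingerprint, and "ABCDE", "EFGHI", ..., "UVWXY" represent the audio sub-fingerprints obtained by dividing that audio fingerprint;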
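The example sub-fingerprints "ABCDE", "EFGHI", and "UVWXY" are consistent with blocks of five symbols that overlap by one symbol. The sketch below splits a fingerprint that way; the width and step values are inferred from the example and the intermediate blocks it produces are not listed in the text, so treat them as an assumption.

```python
from typing import List


def split_fingerprint(fp: str, width: int, step: int) -> List[str]:
    """Cut fp into blocks of `width` symbols, starting a new block every `step` symbols."""
    return [fp[i:i + width] for i in range(0, len(fp) - width + 1, step)]
```

Calling `split_fingerprint("ABCDEFGHIJKLMNOPQRSTUVWXY", 5, 4)` yields six blocks that begin with "ABCDE" and "EFGHI" and end with "UVWXY".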
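Step S217: discard the audio sub-fingerprints whose number of occurrences is less than the matching threshold;
In step S217, as shown in FIG. 5, in the hash sub-table the audio sub-fingerprints occur 1, 1, 1, 3, 2, and 1 times in audio file information 7, 12, 50, 92, 102, and 302, respectively. Assuming the current preset matching threshold is 2, the audio sub-fingerprints corresponding to audio file information 92 and 102 are the closely matching audio sub-fingerprints.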
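The occurrence counting of steps S216 and S217 can be sketched as a vote over a hash table. The sketch assumes the hash sub-table maps each sub-fingerprint to the audio file IDs that contain it; the table layout, the function names, and the toy database in the usage note are hypothetical and chosen only to reproduce the counts quoted in the text.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def build_table(db: Dict[int, Iterable[str]]) -> Dict[str, List[int]]:
    """Index: sub-fingerprint -> list of audio file IDs containing it."""
    table: Dict[str, List[int]] = defaultdict(list)
    for file_id, subs in db.items():
        for s in subs:
            table[s].append(file_id)
    return table


def candidates(table: Dict[str, List[int]], query_subs: Iterable[str],
               threshold: int) -> List[int]:
    """File IDs hit at least `threshold` times by the query's sub-fingerprints."""
    votes: Counter = Counter()
    for s in query_subs:
        votes.update(table.get(s, []))
    return sorted(f for f, n in votes.items() if n >= threshold)
```

With a database in which files 7, 12, 50, 92, 102, and 302 are hit 1, 1, 1, 3, 2, and 1 times and a threshold of 2, only files 92 and 102 survive, as in the example above.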
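Step S218: compare the audio fingerprint of the currently played audio stream segment with the retained audio sub-fingerprints, and compute the matching error from the Hamming distance between the retained audio sub-fingerprints and the audio fingerprint of the audio stream segment, to obtain the exactly matching audio sub-fingerprints;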
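One way to read step S218 is as a verification pass over the candidates retained by step S217: each candidate is scored by its normalized Hamming distance to the query, and only candidates under an error bound are kept. The sketch below follows that reading; the error bound `max_error`, the string representation of fingerprints, and the function names are assumptions.

```python
from typing import Dict, List


def match_error(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def exact_matches(query: str, retained: Dict[int, str], max_error: float) -> List[int]:
    """IDs of retained candidates whose error is at most max_error, best first."""
    scored = sorted((match_error(query, fp), cid) for cid, fp in retained.items())
    return [cid for err, cid in scored if err <= max_error]
```

Running the cheap vote first and the Hamming comparison second keeps the bitwise comparison limited to a handful of candidates.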
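Step S219: merge, on the time axis, the streaming media information corresponding to the exactly matching audio sub-fingerprints using a dynamic programming algorithm or a line detection algorithm, obtain the first streaming media information containing the currently played audio stream segment, and return the matching result.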
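The text does not describe the line detection algorithm. A common simplification, sketched here as an assumption, is an offset histogram: matches that lie on a slope-one line in the (query time, media time) plane all share the same time offset, so the most frequent (media, offset) pair identifies the media and where the segment sits in it. The tuple layout and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def best_alignment(pairs: Iterable[Tuple[int, int, int]]) -> Optional[Tuple[int, int, int]]:
    """pairs holds (media_id, query_time, media_time) for each exact sub-fingerprint match.

    Returns (media_id, offset, votes) for the most common offset, or None if empty.
    """
    votes = Counter((mid, mt - qt) for mid, qt, mt in pairs)
    if not votes:
        return None
    (mid, offset), n = votes.most_common(1)[0]
    return mid, offset, n
```

Scattered false matches rarely agree on an offset, so they contribute isolated votes and do not affect the winner.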
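Step S220: switch the information displayed in the interface according to the matching result, and display the first streaming media information and the status in the interface;
In step S220, see FIG. 6 and FIG. 7 for the display effect. FIG. 6 is a schematic diagram of the state in which no streaming media has been matched; FIG. 7 is a schematic diagram of the state in which matching streaming media information (for example, the first or the second streaming media information) is displayed. Changing a local region of the playback interface lets the user quickly tell whether there is currently a matching result, improving the experience.
Step S230: calculate the Hamming distance between the audio fingerprint of the next audio stream segment and the audio fingerprint of the first streaming media information of the currently played audio stream segment, to obtain the matching degree between the next audio stream segment and the first streaming media information, and determine whether the matching degree is greater than the preset threshold. If the matching degree is greater than the preset threshold, perform step S240; if it is less than the preset threshold, stop displaying the first streaming media information and perform step S210 again;
In step S230, once a piece of streaming media has been matched, subsequent tracking only needs to determine whether the current streaming media is the previously matched one. If so, the result can be returned directly; otherwise, streaming media matching is performed again. Because only the fingerprints of the current streaming media and the previous streaming media need to be compared, memory usage is low and computation is faster. This not only greatly reduces the computational complexity of streaming media matching but also yields stable matching results, effectively avoiding unstable results being shown to the user (for example, frequent changes between successive matching results for the same piece of streaming media) and improving matching accuracy. It also effectively reduces the influence of external audio on matching accuracy during playback, for example occasional talk by the host or short advertisements.
Step S240: set the second streaming media information of the next audio stream segment to the streaming media information of the currently played audio stream segment, and switch the matching streaming media information in the interface.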
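The control flow of steps S210 to S240 can be summarized as a loop that falls back to a full database match only when the cheap comparison fails. The sketch below takes the expensive matcher and the similarity measure as injected callables; `full_match`, `similarity`, the return format, and the toy data in the test are all hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Match = Tuple[str, str]  # (media information, reference fingerprint)


def track(segment_fps: Iterable[str],
          full_match: Callable[[str], Optional[Match]],
          similarity: Callable[[str, str], float],
          threshold: float) -> List[Tuple[int, Optional[str], bool]]:
    """Return (segment index, media info, used_full_match) for every segment."""
    current: Optional[Match] = None
    results = []
    for i, fp in enumerate(segment_fps):
        if current is not None and similarity(fp, current[1]) > threshold:
            # S230 -> S240: same media as before, reuse the previous result.
            results.append((i, current[0], False))
            continue
        # S230 -> S210: below threshold (or nothing matched yet), search again.
        current = full_match(fp)
        results.append((i, current[0] if current else None, True))
    return results
```

A media info of `None` corresponds to the "no streaming media matched" state of FIG. 6; the boolean shows how often the expensive search was actually needed.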
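Refer to FIG. 8, which is a schematic structural diagram of the system for tracking audio streaming media according to the first embodiment of the present invention. The system of the first embodiment includes:
Audio processing module 81: configured to slice the audio stream of the radio station at time intervals to form at least two audio stream segments. The interval of the audio stream segments may be set according to the actual application; in this embodiment of the present invention, the interval is 10 seconds.
Information matching module 82: configured to perform streaming media information matching on the currently played audio stream segment by means of an audio fingerprint, to obtain the corresponding first streaming media information. The first streaming media information of the currently played audio stream segment is matched as follows: the audio signal of the currently played audio stream segment is divided into frames to obtain framed spectrograms; an onset detection algorithm detects whether each framed spectrogram is a key frame, the framed spectrograms of key frames are retained, and those of non-key frames are discarded; the audio fingerprints of the key frames are obtained, the first streaming media information containing the currently played audio stream segment is computed from the streaming media information corresponding to the key-frame audio fingerprints, and the matching result is returned.
Matching degree judging module 83: configured to determine whether the matching degree between the next audio stream segment and the first streaming media information of the currently played audio stream segment is greater than a preset threshold. If the matching degree is less than the preset threshold, the display of the first streaming media information ends and the streaming media information is matched again through the information matching module 82; if the matching degree is greater than the preset threshold, the result returning module 84 returns, as the matching streaming media of the next audio stream segment, the streaming media information matched by the currently played audio stream segment. Once a piece of streaming media has been matched, subsequent tracking only needs to determine whether the current streaming media is the previously matched one. If so, the result can be returned directly; otherwise, streaming media matching is performed again. Because only the fingerprints of the current streaming media and the previous streaming media need to be compared, memory usage is low and computation is faster. This not only greatly reduces the computational complexity of streaming media matching but also yields stable matching results, effectively avoiding unstable results being shown to the user (for example, frequent changes between successive matching results for the same piece of streaming media) and improving matching accuracy. It also effectively reduces the influence of external audio on matching accuracy during playback, for example occasional talk by the host or short advertisements. Whether the matching degree between the next audio stream segment and the matching streaming media information of the previous audio stream segment is greater than the preset threshold is determined as follows: the Hamming distance between the fingerprint of the next audio stream segment and the audio fingerprint of the matching streaming media information of the previous audio stream segment is calculated, giving the matching degree between the next audio stream segment and that matching streaming media information.
Result returning module 84: configured to set the second streaming media information (the matching streaming media information) of the next audio stream segment to the matching streaming media information of the currently played audio stream segment.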
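Refer to FIG. 9, which is a schematic structural diagram of the system for tracking audio streaming media according to the second embodiment of the present invention. The system of the second embodiment includes an audio processing module 91, an information matching module 92, an information display module 93, a matching degree judging module 94, and a result returning module 95, wherein:
Audio processing module 91: configured to slice the audio stream of the radio station at a certain time interval to form at least two audio stream segments. The interval of the audio stream segments may be set according to the actual application; in this embodiment of the present invention, the interval is 10 seconds;
The information matching module 92 is configured to perform streaming media information matching on the currently played audio stream segment by means of an audio fingerprint, to obtain the corresponding first streaming media information. Specifically, the information matching module 92 further includes:
Spectrogram extracting unit 921: configured to randomly extract, on average every d/N milliseconds, a spectrogram with a window length of 11.6*w milliseconds from the audio signal of the current audio stream segment, to obtain framed spectrograms;
Key frame detecting unit 922: configured to detect, with an onset detection algorithm, whether each framed spectrogram is a key frame, to retain the framed spectrograms of key frames, and to discard those of non-key frames. Detecting with the onset detection algorithm whether each framed spectrogram corresponds to a key frame specifically means applying an FFT (Fast Fourier Transform) and an LPC (linear predictive coding) transform to each frame obtained from the framing, to determine the key frames among the frames.
Discrete cosine transform unit 923: configured to apply a short-time DCT (Discrete Cosine Transform) to the key frames, retain the main DCT coefficients, and represent the retained DCT coefficients in binary;
Fingerprint conversion unit 924: configured to convert the binary-represented DCT coefficients into an audio fingerprint using a min-hash algorithm. The random permutations of the min-hash algorithm are the same when audio fingerprints are stored and when they are queried.
Fingerprint matching unit 925: configured to use the LSH (Locality Sensitive Hashing) method to divide the audio fingerprint into b blocks (bins) of audio sub-fingerprints and l hash sub-tables, store the b blocks of audio sub-fingerprints in the hash sub-tables, find closely matching audio sub-fingerprints by counting the occurrences of each audio sub-fingerprint, and discard the audio sub-fingerprints whose number of occurrences is less than the matching threshold. Here "ABCDEFGHIJKLMNOPQRSTUVWXY" as shown in FIG. 4 represents one extracted audio fingerprint, and "ABCDE", "EFGHI", ..., "UVWXY" represent the audio sub-fingerprints obtained by dividing that audio fingerprint. As shown in FIG. 4, in the hash sub-table the audio sub-fingerprints occur 1, 1, 1, 3, 2, and 1 times in audio file information 7, 12, 50, 92, 102, and 302, respectively. Assuming the current preset matching threshold is 2, the audio sub-fingerprints corresponding to audio file information 92 and 102 are the closely matching audio sub-fingerprints.
Fingerprint determining unit 926: configured to compare the fingerprint of the current audio stream segment with the retained audio sub-fingerprints, and to compute the matching error from the Hamming distance between the retained audio sub-fingerprints and the fingerprint of the audio stream segment, to obtain the exactly matching audio sub-fingerprints;
Information matching unit 927: configured to merge, on the time axis, the streaming media information corresponding to the exactly matching audio sub-fingerprints using a dynamic programming algorithm or a line detection algorithm, obtain the matching streaming media information containing the current audio stream segment, and output the matching result.
Information display module 93: configured to switch the information displayed in the interface according to the matching result, and to display the streaming media information and the status in the interface. See FIG. 6 and FIG. 7 for the display effect: FIG. 6 is a schematic diagram of the state in which no streaming media has been matched, and FIG. 7 is a schematic diagram of the state in which matching streaming media information is displayed. Changing a local region of the playback interface lets the user quickly tell whether there is currently a matching result, improving the experience.
Matching degree judging module 94: configured to calculate the Hamming distance between the fingerprint of the next audio stream segment and the fingerprint of the previously matched streaming media, to obtain the matching degree between the next audio stream segment and the previously matched streaming media, and to determine whether the matching degree is greater than the preset threshold. If the matching degree is greater than the preset threshold, the result returning module sets the matching streaming media of the next audio stream segment to the previously matched streaming media; if the matching degree is less than the preset threshold, the display of the previous streaming media ends and streaming media matching is performed again through the information matching module. Once a piece of streaming media has been matched, subsequent tracking only needs to determine whether the current streaming media is the previously matched one. If so, the result can be returned directly; otherwise, streaming media matching is performed again. Because only the fingerprints of the current streaming media and the previous streaming media need to be compared, memory usage is low and computation is faster. This not only greatly reduces the computational complexity of streaming media matching but also yields stable matching results, effectively avoiding unstable results being shown to the user (for example, frequent changes between successive matching results for the same piece of streaming media) and improving retrieval accuracy. It also effectively reduces the influence of external audio on matching accuracy during playback, for example occasional talk by the host or short advertisements.
Result returning module 95: configured to set the second streaming media information of the next audio stream segment to the streaming media information matched by the currently played audio stream segment.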
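After identifying the streaming media, the method and system for tracking audio streaming media of the present invention determine whether the current streaming media is the previously matched streaming media. If so, the result can be returned directly; otherwise, streaming media matching is performed again. Because only the fingerprints of the current streaming media and the previous streaming media need to be compared, memory usage is low and computation is faster. This not only greatly reduces the computational complexity of streaming media matching but also yields stable matching results, effectively avoiding unstable matching results being shown to the user and improving matching accuracy. It also effectively reduces the influence of external audio on matching accuracy during playback, improving the user experience.
The system for tracking audio streaming media provided by the embodiment of the present invention is formed in a terminal, for example a computer, a tablet computer, or a mobile phone with a touch function. The system and the method for tracking audio streaming media in the above embodiments belong to the same concept: any method provided in the method embodiments may run on the system, and the specific implementation process is described in the method embodiments and is not repeated here.
It should be noted that, for the method for tracking audio streaming media of the embodiments of the present invention, those of ordinary skill in the art will understand that all or part of the flow of the method may be carried out by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, such as a memory of the terminal, and executed by at least one processor in the terminal; during execution it may include the flows of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
For the system for tracking audio streaming media of the embodiments of the present invention, the functional modules may be integrated in one processing chip, each module may exist physically on its own, or two or more modules may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as a standalone product, it may also be stored in a computer-readable storage medium, for example a read-only memory, a magnetic disk, or an optical disk.
In summary, although the present invention has been disclosed above by way of preferred embodiments, these preferred embodiments are not intended to limit the invention. Those of ordinary skill in the art may make various changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the claims.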

Claims (21)

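  1. A method for tracking audio streaming media, comprising:
    dividing an audio stream at time intervals to form at least two audio stream segments;
    matching, by means of an audio fingerprint, first streaming media information corresponding to the currently played audio stream segment, and using the first streaming media information as the matching streaming media information of the currently played audio stream segment, wherein the audio fingerprint is a content-based digital signature representing acoustic features of the audio streaming media;
    displaying the matched first streaming media information;
    calculating the Hamming distance between the audio fingerprint of the next audio stream segment and the audio fingerprint of the first streaming media information, to obtain the matching degree between the next audio stream segment and the first streaming media information;
    determining whether the matching degree is greater than a preset threshold, and if the matching degree is less than the preset threshold, performing the step of matching second streaming media information corresponding to the next audio stream segment; and
    if the matching degree is greater than the preset threshold, using the second streaming media information as the matching streaming media information of the currently played audio stream segment, and replacing the first streaming media information displayed in the interface with the second streaming media information.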
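  2. The method for tracking audio streaming media according to claim 1, wherein the step of matching the first streaming media information by means of the audio fingerprint comprises:
    framing the audio signal of the currently played audio stream segment to obtain framed spectrograms;
    detecting key frames in the framed spectrograms with an onset detection algorithm, and retaining the framed spectrograms corresponding to the key frames; and
    obtaining the audio fingerprints of the key frames, and generating the first streaming media information according to the streaming media information corresponding to the audio fingerprints of the key frames, wherein the first streaming media information contains the currently played audio stream segment.
  3. The method for tracking audio streaming media according to claim 2, further comprising:
    after retaining the framed spectrograms of the key frames, applying a discrete cosine transform to the key frames and retaining discrete cosine transform coefficients; and
    representing the retained discrete cosine transform coefficients in binary, and converting the binary-represented discrete cosine transform coefficients into the audio fingerprint using a min-hash algorithm.
  4. The method for tracking audio streaming media according to claim 2, wherein the step of matching the first streaming media information comprises:
    dividing the audio fingerprint into a hash sub-table and a predetermined number of audio sub-fingerprints, and storing the predetermined number of audio sub-fingerprints in the hash sub-table;
    counting the occurrences of the audio sub-fingerprints, and deleting the audio sub-fingerprints whose number of occurrences is less than a matching threshold; and
    comparing the audio fingerprint of the currently played audio stream segment with the retained audio sub-fingerprints, and generating a matching error from the Hamming distance between the retained audio sub-fingerprints and the audio fingerprint of the audio stream segment, to obtain an exactly matching audio sub-fingerprint.
  5. The method for tracking audio streaming media according to claim 4, further comprising: after obtaining the exactly matching audio sub-fingerprint, merging, on the time axis, the streaming media information corresponding to the exactly matching audio sub-fingerprint using a dynamic programming algorithm or a line detection algorithm, to generate the first streaming media information containing the currently played audio stream segment.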
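  6. A method for tracking audio streaming media, comprising:
    dividing an audio stream at time intervals to form at least two audio stream segments;
    matching, by means of an audio fingerprint, first streaming media information corresponding to the currently played audio stream segment, and using the first streaming media information as the matching streaming media information of the currently played audio stream segment;
    determining whether the matching degree between the next audio stream segment and the first streaming media information is greater than a preset threshold;
    if the matching degree is less than the preset threshold, performing the step of matching second streaming media information corresponding to the next audio stream segment; and
    if the matching degree is greater than the preset threshold, using the second streaming media information as the matching streaming media information of the currently played audio stream segment.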
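  7. The method for tracking audio streaming media according to claim 6, wherein the audio fingerprint is a content-based digital signature representing important acoustic features of the audio streaming media.
  8. The method for tracking audio streaming media according to claim 7, wherein the step of matching the first streaming media information by means of the audio fingerprint comprises:
    framing the audio signal of the currently played audio stream segment to obtain framed spectrograms;
    detecting key frames in the framed spectrograms with an onset detection algorithm, and retaining the framed spectrograms corresponding to the key frames; and
    obtaining the audio fingerprints of the key frames, and generating the first streaming media information according to the streaming media information corresponding to the audio fingerprints of the key frames, wherein the first streaming media information contains the currently played audio stream segment.
  9. The method for tracking audio streaming media according to claim 8, further comprising:
    after retaining the framed spectrograms of the key frames, applying a discrete cosine transform to the key frames and retaining discrete cosine transform coefficients; and
    representing the retained discrete cosine transform coefficients in binary, and converting the binary-represented discrete cosine transform coefficients into the audio fingerprint using a min-hash algorithm.
  10. The method for tracking audio streaming media according to claim 8, wherein the step of matching the first streaming media information comprises:
    dividing the audio fingerprint into a hash sub-table and a predetermined number of audio sub-fingerprints, and storing the predetermined number of audio sub-fingerprints in the hash sub-table;
    counting the occurrences of the audio sub-fingerprints, and deleting the audio sub-fingerprints whose number of occurrences is less than a matching threshold; and
    comparing the audio fingerprint of the currently played audio stream segment with the retained audio sub-fingerprints, and generating a matching error from the Hamming distance between the retained audio sub-fingerprints and the audio fingerprint of the audio stream segment, to obtain an exactly matching audio sub-fingerprint.
  11. The method for tracking audio streaming media according to claim 10, further comprising: after obtaining the exactly matching audio sub-fingerprint, merging, on the time axis, the streaming media information corresponding to the exactly matching audio sub-fingerprint using a dynamic programming algorithm or a line detection algorithm, to generate the first streaming media information containing the currently played audio stream segment.
  12. The method for tracking audio streaming media according to claim 6, further comprising: after obtaining the first streaming media information, displaying the first streaming media information.
  13. The method for tracking audio streaming media according to claim 7, further comprising: before comparing the matching degree with the preset threshold, first calculating the Hamming distance between the audio fingerprint of the next audio stream segment and the audio fingerprint of the first streaming media information, to generate the matching degree between the next audio stream segment and the first streaming media information.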
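  14. A system for tracking audio streaming media, comprising:
    an audio processing module, configured to divide an audio stream at time intervals to form at least two audio stream segments;
    an information matching module, configured to match, by means of an audio fingerprint, first streaming media information corresponding to the currently played audio stream segment, and to use the first streaming media information as the matching streaming media information of the currently played audio stream segment;
    a matching degree judging module, configured to determine whether the matching degree between the next audio stream segment and the first streaming media information is greater than a preset threshold, wherein the information matching module is further configured to match second streaming media information corresponding to the next audio stream segment when the matching degree judging module determines that the matching degree is less than the preset threshold; and
    a result returning module, configured to use the second streaming media information as the matching streaming media information of the currently played audio stream segment when the matching degree judging module determines that the matching degree is greater than the preset threshold.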
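  15. The system for tracking audio streaming media according to claim 14, wherein the information matching module comprises:
    a spectrogram extracting unit, configured to frame the audio signal of the currently played audio stream segment to obtain framed spectrograms; and
    a key frame detecting unit, configured to detect key frames in the framed spectrograms with an onset detection algorithm, and to retain the framed spectrograms corresponding to the key frames.
  16. The system for tracking audio streaming media according to claim 15, wherein the information matching module further comprises:
    a discrete cosine transform unit, configured to apply a discrete cosine transform to the key frames and retain discrete cosine transform coefficients; and
    a fingerprint conversion unit, configured to represent the retained discrete cosine transform coefficients in binary, and to convert the binary-represented discrete cosine transform coefficients into the audio fingerprint using a min-hash algorithm.
  17. The system for tracking audio streaming media according to claim 16, wherein the information matching module further comprises:
    a fingerprint matching unit, configured to divide the audio fingerprint into a hash sub-table and a predetermined number of audio sub-fingerprints, store the predetermined number of audio sub-fingerprints in the hash sub-table, count the occurrences of the audio sub-fingerprints, and delete the audio sub-fingerprints whose number of occurrences is less than a matching threshold; and
    a fingerprint determining unit, configured to compare the audio fingerprint of the currently played audio stream segment with the retained audio sub-fingerprints, and to generate a matching error from the Hamming distance between the retained audio sub-fingerprints and the audio fingerprint of the audio stream segment, to obtain an exactly matching audio sub-fingerprint.
  18. The system for tracking audio streaming media according to claim 17, wherein the information matching module further comprises:
    an information matching unit, configured to merge, on the time axis, the streaming media information corresponding to the exactly matching audio sub-fingerprint using a dynamic programming algorithm or a line detection algorithm, to generate the first streaming media information containing the currently played audio stream segment.
  19. The system for tracking audio streaming media according to claim 14, further comprising:
    an information display module, configured to display the first streaming media information after the first streaming media information is obtained, and to display the second streaming media information after the result returning module uses the second streaming media information as the streaming media information of the currently played audio stream segment.
  20. The system for tracking audio streaming media according to claim 14, wherein the matching degree judging module is further configured to calculate the Hamming distance between the audio fingerprint of the next audio stream segment and the audio fingerprint of the first streaming media information, to generate the matching degree between the next audio stream segment and the first streaming media information.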
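  21. A storage medium storing processor-executable instructions, wherein the processor-executable instructions cause a processor to perform the following operations:
    dividing an audio stream at time intervals to form at least two audio stream segments;
    matching, by means of an audio fingerprint, first streaming media information corresponding to the currently played audio stream segment, and using the first streaming media information as the matching streaming media information of the currently played audio stream segment;
    determining whether the matching degree between the next audio stream segment and the first streaming media information is greater than a preset threshold;
    if the matching degree is less than the preset threshold, performing the step of matching second streaming media information corresponding to the next audio stream segment; and
    if the matching degree is greater than the preset threshold, using the second streaming media information as the matching streaming media information of the currently played audio stream segment.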
PCT/CN2013/086665 2012-11-22 2013-11-07 Method and system for tracking audio streaming media, and storage medium WO2014079322A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/720,591 US9612791B2 (en) 2012-11-22 2015-05-22 Method, system and storage medium for monitoring audio streaming media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210477360.1A CN103021440B (zh) 2012-11-22 2012-11-22 Method and system for tracking audio streaming media
CN201210477360.1 2012-11-22

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/720,591 Continuation US9612791B2 (en) 2012-11-22 2015-05-22 Method, system and storage medium for monitoring audio streaming media

Publications (1)

Publication Number Publication Date
WO2014079322A1 true WO2014079322A1 (zh) 2014-05-30

Family

ID=47969959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/086665 WO2014079322A1 (zh) 2012-11-22 2013-11-07 Method and system for tracking audio streaming media, and storage medium

Country Status (3)

Country Link
US (1) US9612791B2 (zh)
CN (1) CN103021440B (zh)
WO (1) WO2014079322A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105307054A (zh) * 2015-10-28 2016-02-03 成都三零凯天通信实业有限公司 一种地面数字电视防插播方法

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103021440B (zh) * 2012-11-22 2015-04-22 腾讯科技(深圳)有限公司 一种音频流媒体的跟踪方法及系统
US9635417B2 (en) * 2013-04-05 2017-04-25 Dolby Laboratories Licensing Corporation Acquisition, recovery, and matching of unique information from file-based media for automated file detection
CN104125509B (zh) 2013-04-28 2015-09-30 腾讯科技(深圳)有限公司 节目识别方法、装置及服务器
CN104092654B (zh) * 2014-01-22 2016-03-02 腾讯科技(深圳)有限公司 媒体播放方法、客户端及系统
US9438940B2 (en) * 2014-04-07 2016-09-06 The Nielsen Company (Us), Llc Methods and apparatus to identify media using hash keys
CN104900239B (zh) * 2015-05-14 2018-08-21 电子科技大学 一种基于沃尔什-哈达码变换的音频实时比对方法
CN104915403B (zh) * 2015-06-01 2018-07-27 腾讯科技(北京)有限公司 一种信息处理方法及服务器
US9918141B2 (en) * 2015-08-05 2018-03-13 Surewaves Mediatech Private Limited System and method for monitoring and detecting television ads in real-time using content databases (ADEX reporter)
CN105550257B (zh) * 2015-12-10 2019-05-03 杭州当虹科技股份有限公司 一种音视频指纹识别方法及一种基于音视频指纹流媒体的防篡改系统
CN105847878A (zh) * 2016-03-23 2016-08-10 乐视网信息技术(北京)股份有限公司 数据推荐方法及装置
CN105975568B (zh) * 2016-04-29 2020-04-03 腾讯科技(深圳)有限公司 一种音频处理方法及装置
US10373179B2 (en) * 2016-08-08 2019-08-06 International Business Machines Corporation Determining streaming content user consumption
CN108234433A (zh) * 2016-12-22 2018-06-29 华为技术有限公司 用于处理视频业务的方法和装置
CN108198573B (zh) * 2017-12-29 2021-04-30 北京奇艺世纪科技有限公司 音频识别方法及装置、存储介质及电子设备
CN108648733B (zh) * 2018-03-15 2020-07-03 北京雷石天地电子技术有限公司 一种迪曲生成方法及系统
US11527265B2 (en) 2018-11-02 2022-12-13 BriefCam Ltd. Method and system for automatic object-aware video or audio redaction
CN110223709B (zh) * 2019-05-31 2021-08-27 维沃移动通信有限公司 一种录音频谱显示方法及终端设备
CN110880315A (zh) * 2019-10-17 2020-03-13 深圳市声希科技有限公司 一种基于音素后验概率的个性化语音和视频生成系统
CN111081276B (zh) * 2019-12-04 2023-06-27 广州酷狗计算机科技有限公司 音频段的匹配方法、装置、设备及可读存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101159834A (zh) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 一种重复性视频音频节目片段的检测方法和系统
CN101615417A (zh) * 2009-07-24 2009-12-30 北京海尔集成电路设计有限公司 一种精确到字的中文同步显示歌词方法
CN101651694A (zh) * 2009-09-18 2010-02-17 北京亮点时间科技有限公司 提供音频相关信息的方法、系统、客户端及服务器
CN101807208A (zh) * 2010-03-26 2010-08-18 上海全土豆网络科技有限公司 视频指纹快速检索方法
CN101847158A (zh) * 2009-03-24 2010-09-29 索尼株式会社 基于上下文的视频查找器
CN102024033A (zh) * 2010-12-01 2011-04-20 北京邮电大学 一种自动检测音频模板并对视频分章的方法
CN103021440A (zh) * 2012-11-22 2013-04-03 腾讯科技(深圳)有限公司 一种音频流媒体的跟踪方法及系统

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1708758A (zh) * 2002-11-01 2005-12-14 皇家飞利浦电子股份有限公司 改进的音频数据指纹搜索
WO2006012241A2 (en) * 2004-06-24 2006-02-02 Landmark Digital Services Llc Method of characterizing the overlap of two media segments
EP2168061A1 (en) * 2007-06-06 2010-03-31 Dolby Laboratories Licensing Corporation Improving audio/video fingerprint search accuracy using multiple search combining
AU2007214319A1 (en) * 2007-08-30 2009-03-19 Canon Kabushiki Kaisha Improvements for Spatial Wyner Ziv Coding
US8831760B2 (en) * 2009-10-01 2014-09-09 (CRIM) Centre de Recherche Informatique de Montreal Content based audio copy detection
CN102237084A (zh) * 2010-04-22 2011-11-09 松下电器产业株式会社 声音空间基准模型的在线自适应调节方法及装置和设备
CN102622353B (zh) * 2011-01-27 2013-10-16 天脉聚源(北京)传媒科技有限公司 一种固定音频检索方法
CN102193995B (zh) * 2011-04-26 2014-05-28 深圳市迅雷网络技术有限公司 一种建立多媒体数据索引、检索的方法及装置
ES2459391T3 (es) * 2011-06-06 2014-05-09 Bridge Mediatech, S.L. Método y sistema para conseguir hashing de audio invariante al canal
US9418669B2 (en) * 2012-05-13 2016-08-16 Harry E. Emerson, III Discovery of music artist and title for syndicated content played by radio stations
US20150193199A1 (en) * 2014-01-07 2015-07-09 Qualcomm Incorporated Tracking music in audio stream

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105307054A (zh) * 2015-10-28 2016-02-03 成都三零凯天通信实业有限公司 一种地面数字电视防插播方法
CN105307054B (zh) * 2015-10-28 2018-06-08 成都三零凯天通信实业有限公司 一种地面数字电视防插播方法

Also Published As

Publication number Publication date
US20150286464A1 (en) 2015-10-08
US9612791B2 (en) 2017-04-04
CN103021440B (zh) 2015-04-22
CN103021440A (zh) 2013-04-03

Similar Documents

Publication Publication Date Title
WO2014079322A1 (zh) 音频流媒体的跟踪方法及系统、存储介质
RU2422891C2 (ru) Система и способ для ускорения поисков в базе данных для множественных синхронизированных потоков данных
US20210193167A1 (en) Audio recognition method, device and server
US10540993B2 (en) Audio fingerprinting based on audio energy characteristics
WO2020238209A1 (zh) 音频处理的方法、系统及相关设备
CN111460153B (zh) 热点话题提取方法、装置、终端设备及存储介质
US20140161263A1 (en) Facilitating recognition of real-time content
JP6901798B2 (ja) オーディオエネルギー特性に基づくオーディオフィンガープリンティング
WO2023169258A1 (zh) 音频检测方法、装置、存储介质及电子设备
CN113596579B (zh) 视频生成方法、装置、介质及电子设备
CN104091596A (zh) 一种乐曲识别方法、系统和装置
WO2019196238A1 (zh) 一种语音识别方法、终端设备及计算机可读存储介质
CN111326146A (zh) 语音唤醒模板的获取方法、装置、电子设备及计算机可读存储介质
WO2018149081A1 (zh) 回访语音信息的处理方法、装置、终端和存储介质
WO2021103594A1 (zh) 一种默契度检测方法、设备、服务器及可读存储介质
CN106782612B (zh) 一种逆向爆音检测方法及其装置
CN111210817A (zh) 数据处理方法及装置
WO2022194277A1 (zh) 音频指纹的处理方法、装置、计算机设备和存储介质
CN114595361A (zh) 一种音乐热度的预测方法、装置、存储介质及电子设备
CN110400578B (zh) 哈希码的生成及其匹配方法、装置、电子设备和存储介质
Qian et al. A novel algorithm for audio information retrieval based on audio fingerprint
Herley Accurate repeat finding and object skipping using fingerprints
CN114756901B (zh) 操作性风险监控方法及装置
TW202219816A (zh) 新聞事件主題自動生成方法、裝置、電子設備及存儲介質
CN117807564A (zh) 音频数据的侵权识别方法、装置、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13856928

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.10.2015)

122 Ep: pct application non-entry in european phase

Ref document number: 13856928

Country of ref document: EP

Kind code of ref document: A1