CN108205550A

CN108205550A - Audio fingerprint generation method and device

Info

Publication number: CN108205550A
Application number: CN201611173755.7A
Authority: CN
Inventors: 吴岩
Original assignee: Beijing Kuwo Technology Co Ltd
Current assignee: Beijing Kuwo Technology Co Ltd
Priority date: 2016-12-16
Filing date: 2016-12-16
Publication date: 2018-06-26
Anticipated expiration: 2036-12-16
Also published as: CN108205550B

Abstract

The embodiments of the present invention relate to an audio fingerprint generation method and device. The method includes: extracting, from a first audio file, a PCM-encoded second audio file, the second audio file being the segment captured from the start of the first audio file up to a first time; obtaining multiple sub-fingerprints from the second audio file; and taking, starting at a second time, a set number of the multiple sub-fingerprints as the audio fingerprint of the first audio file. For audio files in various formats, a string of identifiers can be extracted and computed as the audio fingerprint of the file and used to identify the song. Even if information such as the singer name or album name of the song is changed, the audio fingerprint does not change.

Description

Audio fingerprint generation method and device

Technical field

The present invention relates to the field of audio data processing technology, and in particular to an audio fingerprint generation method and device.

Background

An audio file generally contains a data segment that stores identification information such as the singer, title, album name, year, and genre. For example, in an MP3 audio file this identification information is generally stored in the ID3 tag. When an audio file is played, the identification information is usually read from this data segment and shown in the playback interface to the user.
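To illustrate how fragile this metadata is, the sketch below reads the simplest ID3 variant, an ID3v1 tag, which occupies the last 128 bytes of an MP3 file. ID3v2 tags, stored at the start of the file, are not handled. The function name `read_id3v1` is hypothetical and not part of the patent.

```python
from typing import Optional


def read_id3v1(data: bytes) -> Optional[dict]:
    """Parse an ID3v1 tag from the last 128 bytes of an MP3 file, if present."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":  # ID3v1 tags start with the ASCII marker "TAG"
        return None

    def text(field: bytes) -> str:
        # Fixed-width fields, padded with NUL bytes or spaces.
        return field.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),     # 30 bytes
        "artist": text(tag[33:63]),   # 30 bytes
        "album": text(tag[63:93]),    # 30 bytes
        "year": text(tag[93:97]),     # 4 bytes
        "comment": text(tag[97:127]), # 30 bytes
        "genre": tag[127],            # 1 byte: index into the ID3v1 genre list
    }
```

Cutting off those 128 bytes removes all of the metadata and leaves the audio frames untouched. The next paragraph describes this weakness.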

However, as technology advances, the data segment that stores the identification information can easily be modified or deleted, for example to evade copyright. When such an audio file is played, the song cannot be identified correctly, and this degrades the listening experience.

Summary of the invention

Embodiments of the present invention provide an audio fingerprint generation method and device. A string of identifiers is extracted from an audio file and computed as its audio fingerprint, and this fingerprint is used to identify the song. As a result, the song can still be identified correctly after information such as the ID3 tag has been altered.

In one aspect, an embodiment of the present invention provides an audio fingerprint generation method, including:

extracting, from a first audio file, a second audio file encoded with pulse code modulation (PCM), the second audio file being the segment captured from the start of the first audio file up to a first time;

obtaining multiple sub-fingerprints from the second audio file;

taking, starting at a second time, a set number of the multiple sub-fingerprints as the audio fingerprint of the first audio file.

Optionally, the method further includes:

determining a source audio file, and converting the source audio file into the first audio file.

Optionally, the first time is 45 seconds.

Optionally, the second time is greater than 32 seconds and less than the first time.

Optionally, the set number is 512.

In another aspect, an embodiment of the present invention provides a method for adding audio fingerprints to an audio file database. The audio file database includes multiple audio files, and the method includes:

determining, among the multiple audio files, at least one audio file that has no audio fingerprint;

calculating multiple sub-fingerprints for each of the at least one audio file;

generating audio fingerprints for those of the at least one audio file that are longer than a first time, the audio fingerprint of such a file being a set number of sub-fingerprints taken starting at the first time;

generating a database statement from the audio fingerprint, and adding the audio fingerprint to the database.

In yet another aspect, an embodiment of the present invention provides an audio fingerprint generation device, including:

an interception unit, configured to extract, from a first audio file, a PCM-encoded second audio file, the second audio file being the segment captured from the start of the first audio file up to a first time;

a sub-fingerprint generation unit, configured to obtain multiple sub-fingerprints from the second audio file;

an audio fingerprint generation unit, configured to take, starting at a second time, a set number of the multiple sub-fingerprints as the audio fingerprint of the first audio file.

Optionally, the device further includes:

a determination unit, configured to determine a source audio file and convert the source audio file into the first audio file.

Optionally, the second time is greater than 32 seconds and less than the first time.

In still another aspect, an embodiment of the present invention provides a device for adding audio fingerprints to an audio file database. The audio file database includes multiple audio files, and the device includes:

a determination unit, configured to determine, among the multiple audio files, at least one audio file that has no audio fingerprint;

a sub-fingerprint generation unit, configured to calculate multiple sub-fingerprints for each of the at least one audio file;

an audio fingerprint generation unit, configured to generate audio fingerprints for those of the at least one audio file that are longer than a first time, the audio fingerprint of such a file being a set number of sub-fingerprints taken starting at the first time;

an adding unit, configured to generate a database statement from the audio fingerprint and add the audio fingerprint to the database.

With the embodiments of the present invention, a string of identifiers can be extracted and computed from audio files in various formats to serve as the audio fingerprint, and this fingerprint is used to identify the song. Even if information such as the singer name or album name of the song is changed, the audio fingerprint does not change.

Description of the drawings

Fig. 1 is a flowchart of an audio fingerprint generation method provided in an embodiment of the present invention;

Fig. 2 is a flowchart of a method for adding audio fingerprints to an audio file database provided in an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an audio fingerprint generation device provided in an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a device for adding audio fingerprints to an audio file database provided in an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings, so that the objectives, technical solutions, and advantages of the embodiments are clearer. The described embodiments are some, not all, of the embodiments of the present invention. Any other embodiment obtained by a person of ordinary skill in the art from these embodiments without creative effort falls within the protection scope of the present invention.

In the embodiments of the present invention, different, arbitrary versions of an audio file are converted into a standard format, and the fingerprint is extracted from the standard-format file. This avoids the inconsistent sampling that multiple versions of one audio file would otherwise cause. During fingerprint calculation, only part of the song is fingerprinted, using downsampling and a Fourier transform. This keeps the fingerprints of different audio files unique and makes fingerprint-based identification of audio files more efficient.

The embodiments of the present invention are explained further below with reference to the drawings and specific embodiments. These embodiments do not limit the embodiments of the present invention.

Fig. 1 is a flowchart of an audio fingerprint generation method provided in an embodiment of the present invention. As shown in Fig. 1, the method specifically includes:

S110: extract, from the first audio file, a PCM-encoded second audio file, the second audio file being the segment captured from the start of the first audio file up to the first time.

The first audio file is an audio file in a standard format. Its format may be a common audio format such as WMA.

The source audio file is the audio file to be identified by its audio fingerprint. It may exist in multiple versions and formats, so it is first converted into an audio file in the standard format. This unifies sampling when the audio fingerprint is generated and improves its accuracy.

To generate the audio fingerprint from the first audio file, a portion of the file can be extracted. The audio fingerprint of this portion serves as the fingerprint of the source audio file. This portion is a PCM-encoded audio file.

Specifically, the first audio file is played with MPlayer, and the audio played during its first 45 seconds is captured as the second audio file. The second audio file is a PCM-encoded WAV file. Compared with an analog signal, PCM is less susceptible to noise and distortion in the transmission system and has a wide dynamic range, which gives good sound quality. The longer the second audio file, the higher the recognition accuracy. The 45-second length is only one example of the present invention and does not constitute a limitation.
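The patent captures the first 45 seconds by playing the file through MPlayer. If the first audio file has already been decoded to a PCM WAV stream, the same truncation can be done with Python's standard `wave` module. This is a minimal sketch under that assumption. `truncate_wav` is a hypothetical helper, not the patent's implementation.

```python
import io
import wave


def truncate_wav(src, dst, first_time_s: float = 45.0) -> int:
    """Copy the first `first_time_s` seconds of a PCM WAV stream into `dst`.

    `src` and `dst` may be file names or binary file objects.
    Returns the number of frames written.
    """
    with wave.open(src, "rb") as r:
        # Never request more frames than the source contains.
        n_frames = min(r.getnframes(), int(first_time_s * r.getframerate()))
        frames = r.readframes(n_frames)
        with wave.open(dst, "wb") as w:
            # Keep the channel count, sample width, and rate of the source.
            w.setnchannels(r.getnchannels())
            w.setsampwidth(r.getsampwidth())
            w.setframerate(r.getframerate())
            w.writeframes(frames)
    return n_frames
```

If the source is shorter than 45 seconds, the whole file is copied. The caller can then decide whether it is long enough to fingerprint.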

S120: obtain multiple sub-fingerprints from the second audio file.

The process of generating the sub-fingerprints is described in detail below:

The second audio file is downsampled, whatever its channel count and sample rate. A Hanning window is applied to suppress high-frequency interference and spectral leakage, and a Fourier transform is performed. The frequency-domain amplitudes are computed, and from them the energy of each frequency band. Energy differences are then computed. A difference greater than 0 is recorded into the fingerprint as a 1, which yields a sub-fingerprint.
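The text does not give the details of the downsampling step. As a sketch under stated assumptions, interleaved samples of any channel count can be averaged to mono and then reduced to a common rate. The 5512 Hz target and the block-averaging filter are assumptions. A production resampler would use a proper anti-aliasing low-pass filter.

```python
from typing import List, Sequence


def to_mono(interleaved: Sequence[float], channels: int) -> List[float]:
    """Average interleaved multi-channel samples (L, R, L, R, ...) into one channel."""
    if channels < 1:
        raise ValueError("channels must be >= 1")
    usable = len(interleaved) - len(interleaved) % channels  # drop an incomplete trailing frame
    return [sum(interleaved[i:i + channels]) / channels for i in range(0, usable, channels)]


def downsample(mono: Sequence[float], src_rate: int, dst_rate: int = 5512) -> List[float]:
    """Lower the sample rate by averaging each block of input samples (a crude box low-pass)."""
    if not 0 < dst_rate <= src_rate:
        raise ValueError("dst_rate must be in (0, src_rate]")
    n_out = len(mono) * dst_rate // src_rate
    out = []
    for k in range(n_out):
        # Integer block boundaries; end > start because src_rate >= dst_rate.
        start = k * src_rate // dst_rate
        end = (k + 1) * src_rate // dst_rate
        block = mono[start:end]
        out.append(sum(block) / len(block))
    return out
```

The block boundaries use integer arithmetic, so the output length is exact. For example, one second at 44100 Hz becomes exactly 5512 samples.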

Audio is essentially frequency information. Each sample point records the amplitude of the waveform at that point, and an audio file is characterized by its frequency content.

In one example, generating a sub-fingerprint specifically includes the following steps:

1. Extract one frame of audio from the downsampled second audio file.

2. Apply a Hanning window to suppress high-frequency interference and spectral leakage, then perform a Fourier transform.

3. Convert the amplitude information of the Fourier-transformed second audio file into energy information.

4. Take the absolute value of the energy information.

5. Map the frequencies between 300 and 2000 Hz to 9 frequency bands and calculate the energy of each band.

Using Bark-scale values, the 300 to 2000 Hz range is divided into 9 frequency bands, and the energy sum of each band is calculated.

6. Generate the sub-fingerprint by comparing with the energy values of the previous frame.

This yields 9 band energies $E(1), \dots, E(9)$. Let $E'(i) = E(i+1) - E(i)$ for $i = 1, \dots, 8$, and let $F(n, M)$ denote the value of $E'(M)$ in frame $n$.

Bit $M$ of the sub-fingerprint is 1 if $F(n, M) - F(n-1, M) > 0$ and 0 otherwise. Comparing two consecutive frames in this way generates the sub-fingerprint, with one bit for each of the 8 band differences.
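Steps 1 to 6 can be sketched end to end in dependency-free Python. The text gives none of the following values, so all of them are assumptions: the frame length (2048 samples), the hop (64 samples), the sample rate (5512 Hz), and the Bark conversion. The Bark conversion uses Traunmüller's approximation to space the 9 bands evenly on the Bark scale between 300 and 2000 Hz. The small recursive FFT stands in for a library FFT. With 9 bands there are 8 band differences, so each sub-fingerprint in this sketch is an 8-bit integer.

```python
import cmath
import math
from typing import List, Optional, Sequence

SAMPLE_RATE = 5512  # assumed rate after downsampling (not stated in the patent)
FRAME_SIZE = 2048   # assumed frame length; must be a power of two for fft()
HOP = 64            # assumed hop between consecutive frames, in samples
N_BANDS = 9         # from the text: 9 bands between 300 and 2000 Hz


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def bark_band_edges(lo: float = 300.0, hi: float = 2000.0, n: int = N_BANDS) -> List[float]:
    """n + 1 band edges in Hz, equally spaced on the Bark scale (Traunmueller's formula)."""
    to_bark = lambda f: 26.81 * f / (1960.0 + f) - 0.53
    to_hz = lambda z: 1960.0 * (z + 0.53) / (26.28 - z)
    z_lo, z_hi = to_bark(lo), to_bark(hi)
    return [to_hz(z_lo + (z_hi - z_lo) * i / n) for i in range(n + 1)]


def band_energies(frame: Sequence[float], edges: Sequence[float], rate: int = SAMPLE_RATE) -> List[float]:
    n = len(frame)
    # Step 2: Hann window to reduce spectral leakage, then Fourier transform.
    windowed = [s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]
    spectrum = fft(windowed)[: n // 2]
    # Steps 3-4: energy |X(k)|^2 = re^2 + im^2, non-negative by construction.
    energy = [c.real * c.real + c.imag * c.imag for c in spectrum]
    # Step 5: sum the energy of the FFT bins whose frequency k * rate / n lies in each band.
    bins = [math.ceil(f * n / rate) for f in edges]
    return [sum(energy[bins[b]:bins[b + 1]]) for b in range(len(edges) - 1)]


def sub_fingerprints(samples: Sequence[float], rate: int = SAMPLE_RATE,
                     frame_size: int = FRAME_SIZE, hop: int = HOP) -> List[int]:
    """One integer per frame after the first; bit m is set when F(n, m) - F(n-1, m) > 0."""
    edges = bark_band_edges()
    prev: Optional[List[float]] = None
    out: List[int] = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        # Step 1: take one frame, then steps 2-5 on it.
        e = band_energies(samples[start:start + frame_size], edges, rate)
        diff = [e[i + 1] - e[i] for i in range(len(e) - 1)]  # E'(i) = E(i+1) - E(i), 8 values
        if prev is not None:  # Step 6: compare with the previous frame
            bits = 0
            for m, (cur, old) in enumerate(zip(diff, prev)):
                if cur - old > 0:
                    bits |= 1 << m
            out.append(bits)
        prev = diff
    return out
```

The bits depend only on the signs of energy differences. A uniform change in volume scales every band energy by the same factor, so it leaves the sub-fingerprints unchanged.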

S130: take, starting at the second time, a set number of the multiple sub-fingerprints as the audio fingerprint of the first audio file.

From S110 and S120, the second audio file corresponds to multiple sub-fingerprints. A portion of these sub-fingerprints can be taken, and this portion, taken together, is the audio fingerprint of the first audio file (or of the source audio file).

Specifically, starting at the second time of the second audio file, a set number of its sub-fingerprints are taken as the audio fingerprint. The second time may be any time greater than 32 seconds and less than the first time. For example, if the first time is 45 seconds, the second time may be 32 or 35 seconds. This skips the intro of the audio file and increases the differences between the fingerprints of different songs. The set number may be 512 sub-fingerprints, which corresponds to about 6 seconds of audio.
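The relation between the set number of sub-fingerprints and the audio duration follows from the hop size. The text gives neither the sample rate nor the hop. As a sketch, assume a 5512 Hz sample rate and one sub-fingerprint every 64 samples; the selection in S130 then looks like the code below. The helper names are hypothetical.

```python
from typing import List, Sequence

SAMPLE_RATE = 5512  # assumed sample rate after downsampling
HOP = 64            # assumed: one sub-fingerprint per 64 samples


def sub_fingerprint_index(t_seconds: float, rate: int = SAMPLE_RATE, hop: int = HOP) -> int:
    """Approximate index of the sub-fingerprint at time t_seconds (one per hop)."""
    return int(t_seconds * rate / hop)


def select_fingerprint(subs: Sequence[int], second_time_s: float = 32.0, count: int = 512,
                       rate: int = SAMPLE_RATE, hop: int = HOP) -> List[int]:
    """Take `count` consecutive sub-fingerprints starting at the second time."""
    start = sub_fingerprint_index(second_time_s, rate, hop)
    window = list(subs[start:start + count])
    if len(window) < count:
        raise ValueError("not enough sub-fingerprints after the second time")
    return window


def window_seconds(count: int = 512, rate: int = SAMPLE_RATE, hop: int = HOP) -> float:
    """Approximate audio duration covered by `count` sub-fingerprints."""
    return count * hop / rate
```

Under these assumptions, 512 sub-fingerprints cover 512 × 64 / 5512 ≈ 5.94 s, which is consistent with "about 6 seconds" above. A window starting at 32 s ends near 37.9 s, inside the 45 s second audio file.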

Example of a generated fingerprint line: 5939cd89, 5d39dd8b, 5d39dda3, ... (508 sub-fingerprints omitted), a96a76ab.
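The example line suggests that each sub-fingerprint is stored as a 32-bit value, written as 8 lowercase hex digits and joined with commas. Assuming that format, serialization and parsing are straightforward. Both function names are hypothetical.

```python
from typing import List, Sequence


def to_fingerprint_line(subs: Sequence[int]) -> str:
    """Format 32-bit sub-fingerprints as comma-separated, zero-padded lowercase hex."""
    return ",".join(f"{v & 0xFFFFFFFF:08x}" for v in subs)


def parse_fingerprint_line(line: str) -> List[int]:
    """Inverse of to_fingerprint_line; tolerates spaces after the commas."""
    return [int(tok, 16) for tok in line.replace(" ", "").split(",") if tok]
```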

The sub-fingerprints of the second audio file are taken starting from the second time. A second time of 32 seconds is only one example provided in the embodiments of the present invention and does not constitute a limitation.

Likewise, taking the fingerprint corresponding to 6 seconds of audio is only an example and does not constitute a limitation. The longer the span used to calculate the fingerprint, the more accurate it is; the shorter the span, the higher the efficiency. Calculating only a 6-second fingerprint keeps recognition efficient, and the recognition rate can reach 95%.

With the embodiments of the present invention, a string of identifiers can be extracted and computed from audio files in various formats to serve as the audio fingerprint. The string corresponds to the audio file, and the probability that two files have the same audio fingerprint is very small, so the fingerprint can be used to identify the song. Even if information such as the singer name or album name of the song is changed, the audio fingerprint does not change.

Fig. 2 is a flowchart of a method for adding audio fingerprints to an audio file database provided in an embodiment of the present invention. As shown in Fig. 2, the audio file database includes multiple audio files, and the method specifically includes:

S210: determine, among the multiple audio files, at least one audio file that has no audio fingerprint.

An audio file database generally contains multiple audio files. Some of them may already have audio fingerprints and others may not. The database can be checked to determine whether an audio fingerprint has been computed for each audio file. Files without a fingerprint are added to a miss list.

The miss list generally contains at least one audio file, and no audio fingerprint has been calculated for any of its files.
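Below is a minimal sketch of building the miss list. It assumes a hypothetical `audio_file` table with a nullable `fingerprint` column, and Python's built-in `sqlite3` stands in for the production database.

```python
import sqlite3
from typing import List


def build_miss_list(conn: sqlite3.Connection) -> List[str]:
    """Return the IDs of audio files that have no fingerprint yet (hypothetical schema)."""
    rows = conn.execute(
        "SELECT audio_id FROM audio_file "
        "WHERE fingerprint IS NULL OR fingerprint = '' "
        "ORDER BY audio_id"
    )
    return [audio_id for (audio_id,) in rows]
```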

S220: calculate multiple sub-fingerprints for each of the at least one audio file.

An audio fingerprint is calculated for each audio file in the miss list.

First, the sub-fingerprints of each audio file in the miss list are calculated. The calculation is as described for S120 in the embodiment shown in Fig. 1 and is not repeated here.

S230: generate audio fingerprints for those of the at least one audio file that are longer than the first time, the audio fingerprint of such a file being a set number of sub-fingerprints taken starting at the first time.

The audio fingerprint is generated as described for S130 in the embodiment shown in Fig. 1.

In this embodiment, generating the audio fingerprint requires taking sub-fingerprints starting at the first time. The miss list may therefore include some audio files shorter than the first time and some longer than it. The first time here corresponds to the second time in the embodiment shown in Fig. 1 and may be, for example, 32 seconds.

Audio fingerprints are calculated for the audio files longer than the first time.

Calculating an audio fingerprint fails for audio files shorter than the first time. The identifiers of all audio files whose calculation failed are collected into one list.
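Steps S220 and S230 together can be sketched as a loop over the miss list. The loop skips files that are not longer than the first time and merges every failure into one list. The `compute` callback is a hypothetical stand-in for the fingerprint calculation of Fig. 1, and the 32-second default follows the example above.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def fingerprint_miss_list(
    durations: Dict[str, float],
    compute: Callable[[str], Optional[Sequence[int]]],
    first_time_s: float = 32.0,
) -> Tuple[Dict[str, List[int]], List[str]]:
    """Fingerprint files longer than first_time_s and collect the IDs of every failure."""
    done: Dict[str, List[int]] = {}
    failed: List[str] = []
    for audio_id, duration in durations.items():
        if duration <= first_time_s:
            failed.append(audio_id)  # too short to take sub-fingerprints from first_time_s on
            continue
        try:
            fp = compute(audio_id)
        except (OSError, ValueError):
            fp = None  # e.g. unreadable file or too few sub-fingerprints
        if fp:
            done[audio_id] = list(fp)
        else:
            failed.append(audio_id)
    return done, sorted(failed)
```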

S240: generate a database statement from the audio fingerprint, and add the audio fingerprint to the database.

For each audio file whose audio fingerprint was generated successfully, the file is identified by that fingerprint. A MySQL statement is created from the fingerprint, so that operations such as querying or deleting the audio file can be performed with it. The audio fingerprint is added to the database according to its correspondence with the audio file.
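The patent generates MySQL statements. To keep this sketch runnable without a server, Python's built-in `sqlite3` module is swapped in, and the table and column names are hypothetical. The statements are parameterized rather than built by string concatenation. Python MySQL drivers such as PyMySQL typically use `%s` placeholders instead of `?`.

```python
import sqlite3
from typing import Dict, List, Tuple


def store_fingerprints(conn: sqlite3.Connection,
                       fingerprints: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Insert one row per audio file; return (added IDs, failed IDs)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audio_fingerprint ("
        "audio_id TEXT PRIMARY KEY, fingerprint TEXT NOT NULL)"
    )
    added: List[str] = []
    failed: List[str] = []
    for audio_id, fp in fingerprints.items():
        try:
            with conn:  # commits on success, rolls back this insert on error
                conn.execute(
                    "INSERT INTO audio_fingerprint (audio_id, fingerprint) VALUES (?, ?)",
                    (audio_id, fp),
                )
            added.append(audio_id)
        except sqlite3.Error:  # e.g. IntegrityError on a duplicate audio_id
            failed.append(audio_id)
    return added, failed
```

The two returned lists give the numbers of song files added successfully and unsuccessfully, which the next paragraph describes.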

In this way, a song fingerprint can be added for each audio file in the database, and the song files that were added successfully and those that failed are counted.

Fig. 3 is a schematic structural diagram of an audio fingerprint generation device provided in an embodiment of the present invention. As shown in Fig. 3, the device includes:

an interception unit 301, configured to extract, from a first audio file, a PCM-encoded second audio file, the second audio file being the segment captured from the start of the first audio file up to a first time;

a sub-fingerprint generation unit 302, configured to obtain multiple sub-fingerprints from the second audio file;

an audio fingerprint generation unit 303, configured to take, starting at a second time, a set number of the multiple sub-fingerprints as the audio fingerprint of the first audio file.

Optionally, the device further includes a determination unit, configured to determine a source audio file and convert the source audio file into the first audio file.

Optionally, the first time is 45 seconds.

Optionally, the second time is greater than 32 seconds and less than the first time.

Optionally, the set number is 512.

Fig. 4 is a schematic structural diagram of a device for adding audio fingerprints to an audio file database provided in an embodiment of the present invention. The audio file database includes multiple audio files. As shown in Fig. 4, the device includes:

a determination unit 401, configured to determine, among the multiple audio files, at least one audio file that has no audio fingerprint;

a sub-fingerprint generation unit 402, configured to calculate multiple sub-fingerprints for each of the at least one audio file;

an audio fingerprint generation unit 403, configured to generate audio fingerprints for those of the at least one audio file that are longer than a first time, the audio fingerprint of such a file being a set number of sub-fingerprints taken starting at the first time;

an adding unit 404, configured to generate a database statement from the audio fingerprint and add the audio fingerprint to the database.

Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To show clearly that hardware and software are interchangeable, the composition and steps of each example have been described above in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions differently for each specific application, but such implementations should not be considered beyond the scope of the present invention.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The specific embodiments described above explain the objectives, technical solutions, and beneficial effects of the present invention in further detail. The foregoing are merely specific embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its protection scope.

Claims

1. An audio fingerprint generation method, characterized by comprising:

extracting, from a first audio file, a PCM-encoded second audio file, the second audio file being the segment captured from the start of the first audio file up to a first time;

obtaining multiple sub-fingerprints from the second audio file;

taking, starting at a second time, a set number of the multiple sub-fingerprints as the audio fingerprint of the first audio file.

2. The method according to claim 1, characterized by further comprising:

3. The method according to claim 1 or 2, characterized in that the first time is 45 seconds.

4. The method according to claim 1 or 2, characterized in that the second time is greater than 32 seconds and less than the first time.

5. The method according to claim 4, characterized in that the set number is 512.

6. A method for adding audio fingerprints to an audio file database, characterized in that the audio file database includes multiple audio files, and the method comprises:

generating audio fingerprints for those of at least one audio file that are longer than a first time, the audio fingerprint being a set number of sub-fingerprints taken, starting at the first time, from an audio file longer than the first time;

7. An audio fingerprint generation device, characterized by comprising:

an interception unit, configured to extract, from a first audio file, a PCM-encoded second audio file, the second audio file being the segment captured from the start of the first audio file up to a first time;

an audio fingerprint generation unit, configured to take, starting at a second time, a set number of the multiple sub-fingerprints as the audio fingerprint of the first audio file.

8. The device according to claim 7, characterized by further comprising:

9. The device according to claim 7 or 8, characterized in that the second time is greater than 32 seconds and less than the first time.

10. A device for adding audio fingerprints to an audio file database, characterized in that the audio file database includes multiple audio files, and the device comprises:

an audio fingerprint generation unit, configured to generate audio fingerprints for those of at least one audio file that are longer than a first time, the audio fingerprint being a set number of sub-fingerprints taken, starting at the first time, from an audio file longer than the first time;

an adding unit, configured to generate a database statement from the audio fingerprint and add the audio fingerprint to the database.