CN110047515A - A kind of audio identification methods, device, equipment and storage medium - Google Patents
Audio identification method, apparatus, device, and storage medium
- Publication number
- CN110047515A (Application No. CN201910270746.7A)
- Authority
- CN
- China
- Prior art keywords
- fingerprint
- audio
- candidate
- unisonance
- identified
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/54—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/121—Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
- G10H2240/131—Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Collating Specific Patterns (AREA)
Abstract
The embodiments of the present invention disclose an audio identification method, apparatus, device, and storage medium. In the embodiments, the audio fingerprint of the audio to be identified is extracted as a benchmark fingerprint, and the similarity between the benchmark fingerprint and the audio fingerprints in a preset fingerprint library is calculated; according to these similarities, a candidate fingerprint set is screened out of the fingerprint library; a reference fingerprint is selected from the candidate fingerprint set, and the unisonance fingerprints of the reference fingerprint are obtained; and from the audio corresponding to the reference fingerprint and its unisonance fingerprints, the target audio corresponding to the audio to be identified is selected. This scheme refines the granularity of audio identification, so that a more accurate target audio is obtained.
Description
Technical field
The present invention relates to the field of communication technologies, and in particular to an audio identification method, apparatus, device, and storage medium.
Background art

The "listen and recognize" feature provides the music-loving public with a very convenient way to retrieve music: the user only needs to record the music playing in the environment, or hum a fragment of a song, and input it into the application software, which can then identify which song it is. Current song-recognition systems mainly retrieve, from a massive song library, the song most similar to the input according to the characteristic information of the input song.

In the course of research on and practice of the prior art, the inventors found that the audio fragment uploaded by a user may correspond to audio in multiple versions, while the audio identification process of current music platforms is coarse and does not consider the differences between versions. As a result, the song that the music platform selects according to the fragment provided by the user may not be the real source of that fragment, and may not be what the user really wants. It can be seen that the accuracy of current audio identification is poor.
Summary of the invention
The embodiments of the present invention provide an audio identification method, apparatus, device, and storage medium, intended to improve the accuracy of audio identification.

An embodiment of the present invention provides an audio identification method, comprising:

extracting the audio fingerprint of the audio to be identified as a benchmark fingerprint, and calculating the similarity between the benchmark fingerprint and the audio fingerprints in a preset fingerprint library;

screening out a candidate fingerprint set from the fingerprint library according to the similarity between the benchmark fingerprint and the audio fingerprints in the fingerprint library;

selecting a reference fingerprint from the candidate fingerprint set, and obtaining the unisonance fingerprints of the reference fingerprint; and

selecting, from the audio corresponding to the reference fingerprint and its unisonance fingerprints, the target audio corresponding to the audio to be identified.
In some embodiments, obtaining the unisonance fingerprints of the reference fingerprint comprises:

calculating the overlap degree between the reference fingerprint and the other candidate fingerprints in the candidate fingerprint set; and

selecting, according to the overlap degree, the unisonance fingerprints of the reference fingerprint from among the other candidate fingerprints.

In some embodiments, calculating the overlap degree between the reference fingerprint and the other candidate fingerprints in the candidate fingerprint set comprises:

obtaining the longest common subsequence of the reference fingerprint and each of the other candidate fingerprints in the candidate fingerprint set, and counting the length of the longest common subsequence; and

calculating the overlap degree between the reference fingerprint and the other candidate fingerprints according to the length of the longest common subsequence.

In some embodiments, selecting the unisonance fingerprints of the reference fingerprint from among the other candidate fingerprints according to the overlap degree comprises:

screening out, from among the other candidate fingerprints, the candidate fingerprints whose overlap degree with the reference fingerprint is greater than or equal to a preset threshold, as the unisonance fingerprints of the reference fingerprint.

In some embodiments, the method further comprises:

if no candidate fingerprint whose overlap degree with the reference fingerprint is greater than or equal to the preset threshold is found, determining the audio corresponding to the reference fingerprint as the target audio corresponding to the audio to be identified.
In some embodiments, selecting a reference fingerprint from the candidate fingerprint set comprises:

determining, in the candidate fingerprint set, the candidate fingerprint with the highest similarity to the benchmark fingerprint as the reference fingerprint.

In some embodiments, calculating the similarity between the benchmark fingerprint and the audio fingerprints in the preset fingerprint library comprises:

counting, for each audio fingerprint in the preset fingerprint library, the number of identical hash values shared by the benchmark fingerprint and that audio fingerprint; and

calculating the similarity between the benchmark fingerprint and each audio fingerprint in the fingerprint library according to the number of identical hash values.

In some embodiments, selecting, from the audio corresponding to the reference fingerprint and its unisonance fingerprints, the target audio corresponding to the audio to be identified comprises:

obtaining the audio corresponding to the reference fingerprint and its unisonance fingerprints as unisonance audio, and obtaining the version information of the unisonance audio;

determining the version priority of the unisonance audio according to the version information; and

taking the unisonance audio with the highest version priority as the target audio corresponding to the audio to be identified.
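The version-priority selection just described can be sketched as follows. This is a minimal illustration, not the patent's implementation: the version labels and the priority table are assumptions, since the embodiments do not fix a concrete version format.

```python
# Hypothetical priority table: a higher value means a more preferred version.
# The labels and ordering here are illustrative assumptions.
VERSION_PRIORITY = {"studio": 3, "album": 2, "live": 1, "cover": 0}

def pick_target_audio(unisonance_audio):
    """Given unisonance audio items as (audio_id, version) pairs,
    return the id of the item with the highest version priority.
    Unknown version labels rank below all known ones."""
    best = max(unisonance_audio, key=lambda a: VERSION_PRIORITY.get(a[1], -1))
    return best[0]
```

For example, among a live recording, a studio recording, and a cover of the same song, the sketch would return the studio recording as the target audio.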
In addition, an embodiment of the present invention further provides an audio identification apparatus, comprising:

a fingerprint unit, configured to extract the audio fingerprint of the audio to be identified as a benchmark fingerprint, and to calculate the similarity between the benchmark fingerprint and the audio fingerprints in a preset fingerprint library;

a candidate unit, configured to screen out a candidate fingerprint set from the fingerprint library according to the similarity between the benchmark fingerprint and the audio fingerprints in the fingerprint library;

a unisonance unit, configured to select a reference fingerprint from the candidate fingerprint set, and to obtain the unisonance fingerprints of the reference fingerprint; and

an audio unit, configured to select, from the audio corresponding to the reference fingerprint and its unisonance fingerprints, the target audio corresponding to the audio to be identified.

In addition, an embodiment of the present invention further provides an audio identification device, comprising a memory, a processor, and an audio identification program stored in the memory and runnable on the processor, wherein the audio identification program, when executed by the processor, implements the steps of any audio identification method provided by the embodiments of the present invention.

In some embodiments, the audio identification device further comprises an audio capture apparatus, the audio capture apparatus being configured to capture the audio to be identified.

In addition, an embodiment of the present invention further provides a storage medium storing a plurality of instructions, the instructions being suitable for being loaded by a processor to execute the steps of any audio identification method provided by the embodiments of the present invention.
In the embodiments of the present invention, the audio fingerprint of the audio to be identified is extracted as a benchmark fingerprint, and the similarity between the benchmark fingerprint and the audio fingerprints in a preset fingerprint library is calculated; a candidate fingerprint set is screened out of the fingerprint library according to these similarities; a reference fingerprint is selected from the candidate fingerprint set, and the unisonance fingerprints of the reference fingerprint are obtained; and from the audio corresponding to the reference fingerprint and its unisonance fingerprints, the target audio corresponding to the audio to be identified is selected. After the candidate fingerprints approximate to the benchmark fingerprint are retrieved, the candidate fingerprints do match the benchmark fingerprint, but uncertainty may remain because of, for example, version differences in the audio to be identified. Therefore, the scheme further selects a reference fingerprint from the candidate fingerprint set, and then selects unisonance fingerprints from the other candidate fingerprints by calculating their overlap degree, realizing a further screening of the candidate fingerprints. The reference fingerprint and its unisonance fingerprints obtained through this repeated screening are the audio fingerprints closest to the benchmark fingerprint of the audio to be identified, whose corresponding audio is identical, or can be regarded as identical, to it. The target audio selected from the audio corresponding to the reference fingerprint and its unisonance fingerprints is therefore the audio of the optimal version, and can serve as the real source of the audio to be identified; this guarantees the accuracy of both the content and the version of the target audio, and improves the overall efficiency of audio identification and the user experience. By screening the audio fingerprints in the fingerprint library layer by layer, the scheme refines the granularity of audio identification, so that retrieval yields a more accurate target audio.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description are only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.
Fig. 1a is a schematic diagram of a scenario of an information interaction system provided by an embodiment of the present invention;

Fig. 1b is a schematic flowchart of an audio identification method provided by an embodiment of the present invention;

Fig. 2a is a schematic diagram of an audio identification scenario provided by an embodiment of the present invention;

Fig. 2b is a schematic diagram of a candidate fingerprint set provided by an embodiment of the present invention;

Fig. 2c is a schematic diagram of a recognition-result display interface provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an audio identification apparatus provided by an embodiment of the present invention;

Fig. 4a is a schematic structural diagram of an audio identification device provided by an embodiment of the present invention;

Fig. 4b is a schematic structural diagram of another audio identification device provided by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The embodiment of the present invention provides a kind of audio identification methods, device, equipment and storage medium.
As shown in Fig. 1a, an embodiment of the present invention provides an information interaction system, which includes any audio identification apparatus provided by the embodiments of the present invention; the audio identification apparatus may be integrated in a device such as a server. In addition, the system may include other devices, such as a client. The client may be a device such as a terminal or a personal computer (PC), and is configured to capture the audio to be identified and/or upload the audio to be identified to the server.

The client takes a recording or a local audio file as the audio to be identified and sends it to the server to request audio identification. The server receives the audio to be identified sent by the client, extracts its audio fingerprint as a benchmark fingerprint, and calculates the similarity between the benchmark fingerprint and the audio fingerprints in a preset fingerprint library; it then screens out a candidate fingerprint set from the fingerprint library according to these similarities; next, it selects a reference fingerprint from the candidate fingerprint set and obtains the unisonance fingerprints of the reference fingerprint; and from the audio corresponding to the reference fingerprint and its unisonance fingerprints, it selects the target audio corresponding to the audio to be identified.
In this way, after the candidate fingerprints approximate to the benchmark fingerprint are retrieved, the scheme further selects a reference fingerprint from the candidate fingerprint set and, by calculating the overlap degree, selects unisonance fingerprints from among the other candidate fingerprints, realizing a further, layer-by-layer screening of the audio fingerprints in the fingerprint library. The target audio selected from the audio corresponding to the reference fingerprint and its unisonance fingerprints is thus the audio of the optimal version and can serve as the real source of the audio to be identified, which guarantees the accuracy of both the content and the version of the target audio, refines the granularity of audio identification, and improves the overall efficiency of audio identification and the user experience.
Each of the above is described in detail below.

This embodiment will be described from the perspective of the audio identification apparatus, which may specifically be integrated in a network device; the network device may be a device such as a terminal or a server, wherein the terminal may be a mobile phone, a tablet computer, a notebook computer, a personal computer (PC), or the like.
As shown in Fig. 1b, the specific flow of the audio identification method may be as follows:

101. Obtain the audio fingerprint of the audio to be identified as a benchmark fingerprint, and calculate the similarity between the benchmark fingerprint and the audio fingerprints in a preset fingerprint library.

The preset fingerprint library stores the audio fingerprint of each audio item in an audio library, as well as the mapping relations between the audio fingerprints and the audio items in the audio library. For example, the audio identification apparatus may extract the audio fingerprint of each audio item in the audio library in advance, store the extracted fingerprints in the fingerprint library, and record the mapping relation between each audio item and its audio fingerprint.

For example, the audio identification apparatus obtains the audio to be identified, extracts its audio fingerprint, and takes the audio fingerprint of the audio to be identified as the benchmark fingerprint, which is used to query for the audio fingerprints closest or most similar to it.
In some embodiments, the audio identification apparatus may receive an audio identification request and obtain the audio to be identified; it then extracts the audio fingerprint of the audio to be identified to obtain a hash sequence, and takes the hash sequence as the benchmark fingerprint.

For example, the user may input an audio identification request through the client. After receiving the request, the audio identification apparatus notifies the client to start audio capture, so as to record the user's humming or the sound in the environment and obtain the audio to be identified, which is the audio corresponding to this identification request. Of course, the user may also upload audio stored locally on the client, or audio downloaded from the network, to the audio identification apparatus, whereby the apparatus obtains the audio identification request and its corresponding audio to be identified.

The client may be a recording device, or a terminal device with an audio capture function such as a mobile phone, a tablet, or a personal computer.
Then, the audio identification apparatus extracts the audio fingerprint from the audio signal of the audio to be identified; the fingerprint contains the audio feature information of the audio to be identified. Extracting the audio fingerprint of an audio signal may specifically include framing, windowing, an FFT (Fast Fourier Transform) frequency-domain transform, local-peak extraction, and conversion into a hash sequence.

Specifically, after obtaining the audio to be identified, the audio identification apparatus performs framing and windowing on its audio signal. Framing cuts the whole audio signal into multiple segments according to a preset rule, each segment being one frame, so that the audio signal is smooth at the microscopic level, which smooths the input for the subsequent audio signal processing. Then, the audio identification apparatus applies a preset window function, such as a Hamming window, to each frame, so that the framed audio signal is more coherent and exhibits the character of a periodic function.

Next, the audio identification apparatus applies the FFT frequency-domain transform to each frame of the audio signal to obtain a spectrum containing the frequency-domain information. The apparatus then extracts the local peaks in the spectrum and converts them into a hash sequence, which serves as the audio fingerprint of the audio to be identified. It should be noted that the hash sequence may include multiple hash values.
Taking the audio fingerprint of the audio to be identified as the benchmark fingerprint, the audio identification apparatus calculates the similarity between the benchmark fingerprint and the audio fingerprints in the preset fingerprint library, thereby realizing the retrieval or matching of audio fingerprints.

In some embodiments, the benchmark fingerprint and the audio fingerprints in the fingerprint library are characterized by hash sequences, and the step of calculating the similarity between the benchmark fingerprint and the audio fingerprints in the preset fingerprint library may include: counting, for each audio fingerprint in the preset fingerprint library, the number of identical hash values shared by the benchmark fingerprint and that audio fingerprint; and calculating the similarity between the benchmark fingerprint and each audio fingerprint in the fingerprint library according to the number of identical hash values.

Taking any audio fingerprint in the fingerprint library as an example, the audio identification apparatus compares the hash values in the hash sequence of the benchmark fingerprint one by one with the hash values in the hash sequence of that audio fingerprint, counts the number of identical hash values, and takes the resulting number as the similarity between the benchmark fingerprint and that audio fingerprint. In this way, the audio identification apparatus separately calculates the similarity between the benchmark fingerprint and each audio fingerprint in the fingerprint library.
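Assuming fingerprints are plain lists of hash values, the identical-hash-value count just described can be sketched as a multiset intersection:

```python
from collections import Counter

def fingerprint_similarity(benchmark, candidate):
    """Similarity = number of identical hash values shared by the two
    hash sequences, counted with multiplicity."""
    shared = Counter(benchmark) & Counter(candidate)
    return sum(shared.values())
```

For example, the sequences [1, 2, 2, 3] and [2, 2, 4, 3] share the hash value 2 twice and the hash value 3 once, giving a similarity of 3.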
102. Screen out a candidate fingerprint set from the fingerprint library according to the similarity between the benchmark fingerprint and the audio fingerprints in the fingerprint library.

For example, the audio identification apparatus may, according to a preset similarity threshold, screen out the audio fingerprints in the fingerprint library whose similarity to the benchmark fingerprint exceeds the threshold, as the candidate fingerprints matching the benchmark fingerprint.

It should be noted that a candidate fingerprint matching the benchmark fingerprint can be understood as one whose corresponding audio is identical, or can be regarded as identical, to the audio to be identified — for example, the same song, or different renditions of the same song.

The audio identification apparatus then places the screened candidate fingerprints into one set to obtain the candidate fingerprint set. The candidate fingerprint set thus includes one or more candidate fingerprints matching the benchmark fingerprint.
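A minimal sketch of this screening step, under the assumption that similarity is the identical-hash-value count from step 101 (here simplified to a set intersection) and that the library maps audio identifiers to hash sequences:

```python
def build_candidate_set(benchmark, fingerprint_library, threshold):
    """Keep every library fingerprint whose similarity to the benchmark
    fingerprint exceeds the preset threshold."""
    def similarity(a, b):
        # Identical-hash-value count, simplified to distinct shared values.
        return len(set(a) & set(b))
    return {audio_id: fp for audio_id, fp in fingerprint_library.items()
            if similarity(benchmark, fp) > threshold}
```

The returned dictionary plays the role of the candidate fingerprint set; every entry in it matches the benchmark fingerprint above the threshold.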
103. Select a reference fingerprint from the candidate fingerprint set, and obtain the unisonance fingerprints of the reference fingerprint.

Here, the reference fingerprint is the candidate fingerprint most similar to the benchmark fingerprint. For example, the audio identification apparatus may determine, in the candidate fingerprint set, the candidate fingerprint with the highest similarity to the benchmark fingerprint as the reference fingerprint.

Then, the audio identification apparatus selects the unisonance fingerprints of the reference fingerprint. It should be noted that a unisonance fingerprint can be understood as one whose corresponding audio is identical, or can be regarded as identical, to the audio corresponding to the reference fingerprint. For example, in the song library of a music platform, there are multiple audio items that differ in identifier but are in fact the same song — for example, different versions of the same song, covers by different singers, or the same song included in different albums or radio programs. The multiple audio items belonging to the same song are defined as unisonance audio, and their audio fingerprints are unisonance fingerprints.
In some embodiments, the step of obtaining the unisonance fingerprints of the reference fingerprint may include: calculating the overlap degree between the reference fingerprint and the other candidate fingerprints in the candidate fingerprint set; and selecting the unisonance fingerprints of the reference fingerprint from among the other candidate fingerprints according to the overlap degree.

The overlap degree between the reference fingerprint and the other candidate fingerprints may be calculated by means such as correlation or the longest common subsequence. For correlation, the variance between the hash sequences of the reference fingerprint and of another candidate fingerprint may be calculated, and the variance value taken as the overlap degree between the reference fingerprint and that candidate fingerprint. The audio identification apparatus then takes the other candidate fingerprints whose variance values meet a preset requirement as the unisonance fingerprints of the reference fingerprint.
Taking the longest common subsequence (LCS) as an example, the step of calculating the overlap degree between the reference fingerprint and the other candidate fingerprints in the candidate fingerprint set may include: obtaining the longest common subsequence of the reference fingerprint and each of the other candidate fingerprints in the candidate fingerprint set, and counting the length of the longest common subsequence; and calculating the overlap degree between the reference fingerprint and the other candidate fingerprints according to the length of the longest common subsequence.

Here, the reference fingerprint and the other candidate fingerprints in the candidate fingerprint set are characterized by hash sequences.

As a particular kind of sequence, a subsequence of a hash sequence is the sequence obtained by removing zero or more elements from the hash sequence without changing the relative order of the remaining elements. If one sequence is simultaneously a subsequence of multiple hash sequences, it is a common subsequence of those hash sequences. The longest common subsequence of several hash sequences is their longest shared subsequence, and its length is the number of elements it contains.
For example, dynamic programming (DP) can be used to calculate the length of the longest common subsequence of the hash sequences of the reference fingerprint and of another candidate fingerprint. In this embodiment, the length of the longest common subsequence of the reference fingerprint and another candidate fingerprint's hash sequence is calculated as follows:

nlcs = LCS(res[i].hash_seq, res[0].hash_seq)

where nlcs is the length of the longest common subsequence, LCS is the dynamic-programming function for computing that length, res[i].hash_seq is the hash sequence of the i-th candidate fingerprint, and res[0].hash_seq is the hash sequence of the reference fingerprint.

For example, let the reference fingerprint's hash sequence be X = {A, B, C, B, D, A, B} and some other candidate fingerprint's hash sequence be Y = {B, D, C, A, B, A}. Sequences such as {A, B} and {B, C, B, A} are subsequences of both X and Y, and are therefore common subsequences of X and Y; this embodiment does not enumerate all the common subsequences of X and Y one by one. Among the common subsequences of X and Y, the sequence {B, C, B, A} contains 4 elements, so its length is counted as 4, and it is a longest common subsequence of X and Y.
Taking any other candidate fingerprint as an example, after obtaining its longest common subsequence with the reference fingerprint, the audio identification device calculates the overlap degree between the reference fingerprint and that candidate fingerprint. For example, the following formula can be used:
sim = nlcs / hash_seq_cnt × 100%
where sim is the overlap degree between the reference fingerprint and the other candidate fingerprint, nlcs is the longest common subsequence length, and hash_seq_cnt is the length of the reference fingerprint's Hash sequence. In some embodiments, this formula may be implemented in code as int sim = nlcs * 1.0 / hash_seq_cnt * 100.
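The two steps above can be sketched in Python. This is an illustrative sketch, not the patent's implementation; the function names `lcs_length` and `overlap_degree` are invented for the example.

```python
def lcs_length(a, b):
    # Dynamic-programming longest common subsequence length, i.e. the
    # LCS function in nlcs = LCS(res[i].hash_seq, res[0].hash_seq).
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def overlap_degree(reference_hash_seq, candidate_hash_seq):
    # sim = nlcs / hash_seq_cnt * 100, where hash_seq_cnt is the length
    # of the reference fingerprint's Hash sequence.
    nlcs = lcs_length(candidate_hash_seq, reference_hash_seq)
    return int(nlcs * 1.0 / len(reference_hash_seq) * 100)

# The worked example from the text: X (reference) and Y (candidate).
X = ["A", "B", "C", "B", "D", "A", "B"]
Y = ["B", "D", "C", "A", "B", "A"]
print(lcs_length(X, Y))      # 4, e.g. {B, C, B, A}
print(overlap_degree(X, Y))  # 4 / 7 * 100 -> 57
```

On the worked example, the LCS length is 4 and the overlap degree with a reference Hash sequence of length 7 comes out to 57.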
The audio identification device can thereby calculate the overlap degree between the reference fingerprint and each of the other candidate fingerprints. Then, from among the other candidate fingerprints, the audio identification device selects the unisonance fingerprints of the reference fingerprint.
For example, the audio identification device can take the other candidate fingerprint with the largest overlap degree as the unisonance fingerprint of the reference fingerprint; alternatively, it may sort the overlap degrees in descending order and take the other candidate fingerprints ranked within a preset top-N as unisonance fingerprints of the reference fingerprint.
In some embodiments, the step "selecting, according to the overlap degree, the unisonance fingerprint of the reference fingerprint from among the other candidate fingerprints" may include: filtering out, from among the other candidate fingerprints, the candidate fingerprints whose overlap degree with the reference fingerprint is greater than or equal to a preset threshold, as unisonance fingerprints of the reference fingerprint.
The preset threshold can be adjusted flexibly according to actual needs, for example 25%.
The audio identification device thereby screens the unisonance fingerprints of the reference fingerprint from among the other candidate fingerprints of the candidate fingerprint set.
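The threshold screening step can be sketched as follows; the names are illustrative, and the default of 25 reflects the example threshold mentioned above.

```python
def select_unisonance(overlaps, threshold=25):
    # overlaps: list of (candidate_id, overlap_degree) pairs for the
    # other candidate fingerprints. Keep those whose overlap degree
    # with the reference fingerprint is >= the preset threshold.
    return [cid for cid, sim in overlaps if sim >= threshold]

overlaps = [(1, 80), (2, 25), (3, 10), (4, 40)]
print(select_unisonance(overlaps))  # [1, 2, 4]
```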
Through this calculation of overlap degrees, the present embodiment eliminates the manpower and time cost of manually labeling unisonance audio in the audio library, and avoids the problem of manual entry lagging behind. When audio is put into storage, no additional manual labeling or classification of unisonance audio is needed, which also removes the risk of information being mis-recorded or omitted and reduces maintenance cost. The present embodiment therefore improves the accuracy and efficiency of unisonance fingerprint and unisonance audio identification.
In some embodiments, if no candidate fingerprint whose overlap degree with the reference fingerprint is greater than or equal to the preset threshold is found, the audio corresponding to the reference fingerprint is determined as the target audio corresponding to the audio to be identified.
That is, when no unisonance fingerprint of the reference fingerprint can be found, the audio identification device determines that the candidate fingerprint set contains no other candidate fingerprint sufficiently close to the reference fingerprint. The audio identification device therefore determines the audio corresponding to the reference fingerprint according to the mapping relations between the audio fingerprints and the audio in the fingerprint library, and takes that audio as the target audio corresponding to the audio to be identified.
104. Selecting the target audio corresponding to the audio to be identified from among the audio corresponding to the reference fingerprint and its unisonance fingerprints.
After obtaining the reference fingerprint and its unisonance fingerprints, the audio identification device determines the audio corresponding to them according to the mapping relations between the audio fingerprints and the audio in the fingerprint library.
Then, the audio identification device selects the target audio from among the audio corresponding to the reference fingerprint and its unisonance fingerprints. For example, it may take all of the audio corresponding to the reference fingerprint and its unisonance fingerprints as target audio corresponding to the audio to be identified. This avoids missing audio that is essentially identical to the audio to be identified but differs only in version, and improves the accuracy of audio fingerprint matching.
In some embodiments, the audio corresponding to the reference fingerprint and its unisonance fingerprints may also be screened according to actual needs. The step "selecting the target audio corresponding to the audio to be identified from among the audio corresponding to the reference fingerprint and its unisonance fingerprints" may include: taking the audio corresponding to the reference fingerprint and its unisonance fingerprints as unisonance audio, and obtaining the version information of each unisonance audio; determining the version priority of each unisonance audio according to its version information; and taking the unisonance audio with the highest version priority as the target audio corresponding to the audio to be identified.
The version information may include information such as the source, singer, release (shelving) date and/or issuing date of the audio, and can be preset information carried by the audio itself. Unisonance audio may differ in version information, such as differing in source and/or version.
For example, according to the source information of the unisonance audio, the audio identification device may set the version priority of audio sourced from an album to the highest and that of audio sourced from a radio station to the lowest; the audio identification device then determines the album-sourced unisonance audio as the target audio.
As another example, according to the release dates of the unisonance audio, the audio identification device may set, in chronological order, the version priority of the earliest-released version to the highest and that of the latest-released version to the lowest, and then determine the earliest-released unisonance audio as the target audio.
The target audio is thus the audio most similar to the audio to be identified and with the most accurate version.
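The version-priority screening above can be illustrated as follows. This is a sketch under the assumption that each unisonance audio record carries `source` and `release_date` fields; the field names, the id values, and the exact priority table are invented for the example.

```python
SOURCE_PRIORITY = {"album": 0, "single": 1, "radio": 2}  # lower = higher priority

def pick_target_audio(unisonance_audios):
    # Prefer album sources over radio sources, breaking ties by the
    # earliest release (shelving) date, as in the examples above.
    return min(
        unisonance_audios,
        key=lambda a: (SOURCE_PRIORITY.get(a["source"], 99), a["release_date"]),
    )

audios = [
    {"id": 7, "source": "radio", "release_date": "2015-01-01"},
    {"id": 3, "source": "album", "release_date": "2016-05-20"},
    {"id": 5, "source": "album", "release_date": "2014-03-08"},
]
print(pick_target_audio(audios)["id"])  # 5: album source, earliest release
```

ISO-formatted date strings sort chronologically, so a plain tuple comparison suffices here.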
From the above, the embodiment of the present invention can extract the audio fingerprint of the audio to be identified as the benchmark fingerprint, and calculate the similarity between the benchmark fingerprint and each audio fingerprint in a preset fingerprint library; according to these similarities, a candidate fingerprint set is screened out of the fingerprint library, the candidate fingerprint set containing audio fingerprints close to the benchmark fingerprint; then a reference fingerprint is selected from the candidate fingerprint set, and the unisonance fingerprints of the reference fingerprint are obtained; finally, the target audio corresponding to the audio to be identified is selected from among the audio corresponding to the reference fingerprint and its unisonance fingerprints.
In this scheme, after candidate fingerprints close to the benchmark fingerprint have been retrieved, although the candidate fingerprints match the benchmark fingerprint, uncertainty may remain because of, for example, version differences of the audio to be identified. The scheme therefore further selects a reference fingerprint within the candidate fingerprint set, and then selects unisonance fingerprints from among the other candidate fingerprints of the set by calculating overlap degrees, realizing a further screening of the candidate fingerprints. The reference fingerprint and its unisonance fingerprints obtained through this repeated screening are the audio fingerprints closest to the benchmark fingerprint of the audio to be identified, whose corresponding audio is identical, or can be considered identical, to it. The target audio selected from among the audio corresponding to the reference fingerprint and its unisonance fingerprints is thus the audio of the best version, and can serve as the true source of the audio to be identified, ensuring the accuracy of the target audio's content and version while improving the overall efficiency of audio identification and the user experience. By screening the audio fingerprints in the fingerprint library layer by layer, the scheme refines the granularity of audio identification, so that retrieval yields a more accurate target audio.
The method described in the foregoing embodiments is described in further detail below with an example.
Referring to Fig. 2a, in the present embodiment the audio identification device is specifically integrated in a server cluster for illustration. The server cluster includes a feature extraction server, leaf servers and a root server. The system may include one or more feature extraction servers, leaf servers and root servers; the present embodiment is illustrated with a system of one feature extraction server, multiple leaf servers and one root server.
(1) The client uploads the audio to be identified.
The user can upload recorded audio or locally stored audio to the feature extraction server through audio identification software, music software or the like installed on the client.
(2) Audio fingerprint extraction.
The feature extraction server extracts the audio fingerprint of the audio to be identified as the benchmark fingerprint. Then, the feature extraction server sends the benchmark fingerprint to each leaf server for audio fingerprint matching.
(3) Fingerprint matching.
Each leaf server extracts part of the audio fingerprints from the fingerprint library for audio fingerprint matching. For example, each leaf server can extract its corresponding share of audio fingerprints from the fingerprint library according to a preset allocation rule and match against it, thereby realizing distributed and parallel processing of massive data and improving audio identification speed.
Taking any leaf server as an illustration: the leaf server calculates the similarity between the benchmark fingerprint and each audio fingerprint it holds. For example, the leaf server can count the number of identical hash values shared by the benchmark fingerprint and each audio fingerprint, and use these counts as the respective similarities between the benchmark fingerprint and the audio fingerprints.
Then, the leaf server determines the audio fingerprints whose similarity with the benchmark fingerprint exceeds a preset similarity threshold as candidate fingerprints, and sends the candidate fingerprints to the root server.
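The leaf server's shared-hash-value similarity and threshold screening can be sketched as follows. The names and the sample hash values are invented; treating repeated hash values as matching up to their multiplicity (via multiset intersection) is an assumption, since the text does not specify how duplicates are counted.

```python
from collections import Counter

def similarity(benchmark_hash_seq, audio_hash_seq):
    # Count the hash values the two Hash sequences have in common;
    # the count serves directly as the similarity score.
    common = Counter(benchmark_hash_seq) & Counter(audio_hash_seq)
    return sum(common.values())

def screen_candidates(benchmark, fingerprint_library, threshold=9):
    # Keep audio fingerprints whose similarity with the benchmark
    # fingerprint exceeds the preset similarity threshold
    # (9 in the Fig. 2b example).
    return [
        (audio_id, similarity(benchmark, seq))
        for audio_id, seq in fingerprint_library.items()
        if similarity(benchmark, seq) > threshold
    ]

benchmark = [11, 23, 42, 57, 23, 8, 91, 64, 77, 5, 19]
library = {"song_a": [11, 23, 42, 57, 8, 91, 64, 77, 5, 19, 30],
           "song_b": [11, 23, 99, 100]}
print(screen_candidates(benchmark, library))  # [('song_a', 10)]
```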
(4) Unisonance identification.
After receiving the candidate fingerprints sent by each leaf server, the root server collects them into a candidate fingerprint set, and then selects the reference fingerprint and its unisonance fingerprints within the set.
For example, the root server takes the candidate fingerprint in the set with the largest similarity to the benchmark fingerprint as the reference fingerprint.
Then, the root server calculates the overlap degree between the reference fingerprint and each of the other candidate fingerprints in the set. As one implementation, the root server can obtain the longest common subsequence of the reference fingerprint and each of the other candidate fingerprints and count its length; the lengths of these longest common subsequences then respectively serve as the overlap degrees between the reference fingerprint and the other candidate fingerprints.
Next, according to the overlap degrees, the root server selects the unisonance fingerprints of the reference fingerprint from among the other candidate fingerprints. As one implementation, the root server filters out, from among the other candidate fingerprints, those whose overlap degree with the reference fingerprint is greater than or equal to a preset threshold, as the unisonance fingerprints of the reference fingerprint.
The root server thereby realizes the identification of unisonance fingerprints.
For example, in Fig. 2b, idx is the rank of a candidate fingerprint by its similarity to the benchmark fingerprint, where the audio fingerprint with idx 0 has the largest similarity to the benchmark fingerprint; id is the audio number corresponding to the candidate fingerprint, by which its corresponding audio can be found; score is the similarity between the candidate fingerprint and the benchmark fingerprint, a larger value indicating higher similarity; and lcs is the longest common subsequence length between the candidate fingerprint and the reference fingerprint, i.e. the overlap degree.
Taking Fig. 2b as an example, with a similarity threshold of 9, the candidate fingerprint set configured by the root server contains 35 candidate fingerprints in total, that is, the similarity score of each of these 35 candidate fingerprints with the benchmark fingerprint is greater than 9.
The audio fingerprint with idx 0 has the largest similarity to the benchmark fingerprint and serves as the reference fingerprint; its lcs value with itself is therefore 100. The root server calculates the lcs length of each candidate fingerprint in the set (idx 0 to 34) with the reference fingerprint as the overlap degree. With a preset threshold of 25, the root server takes all candidate fingerprints whose overlap degree is 25 or above as unisonance fingerprints of the reference fingerprint.
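The root server's selection in this example can be sketched as follows. The records are made-up stand-ins in the (idx, id, score, lcs) layout described for Fig. 2b, not the actual Fig. 2b values.

```python
# Each record: (idx, audio_id, score, lcs); values invented for illustration.
candidates = [
    (0, 1001, 98, 100),  # largest score -> serves as the reference fingerprint
    (1, 1002, 55, 61),
    (2, 1003, 40, 25),
    (3, 1004, 12, 7),
]

# The reference fingerprint is the candidate ranked idx 0.
reference = min(candidates, key=lambda r: r[0])
# Unisonance fingerprints: other candidates with lcs >= the preset threshold 25.
unisonance = [r for r in candidates if r[0] != reference[0] and r[3] >= 25]
print(reference[1], [r[1] for r in unisonance])  # 1001 [1002, 1003]
```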
(5) Audio screening.
After obtaining the reference fingerprint and its unisonance fingerprints, the root server selects the target audio corresponding to the audio to be identified from among the audio corresponding to the reference fingerprint and its unisonance fingerprints.
For example, the root server takes the audio corresponding to the reference fingerprint and its unisonance fingerprints as unisonance audio and obtains their version information; according to the version information, it determines the version priority of each unisonance audio; and it takes the unisonance audio with the highest version priority as the target audio corresponding to the audio to be identified.
Taking Fig. 2b above as an example, if the root server determines that the audio corresponding to the candidate fingerprint with idx 26 is the target audio, it outputs that audio's id.
(6) Result output.
The root server returns the screened target audio to the client for the client to play to the user.
For example, in Fig. 2c the client obtains the audio id returned by the root server, retrieves the corresponding target audio from the audio library by that number, and shows it to the user on a recognition result display interface. The display interface may also provide the title of the audio, its singer, source information such as the album, and a play button for the user to play it.
From the above, the user can upload the audio to be identified to the server cluster; the server cluster performs parallel fingerprint matching through the leaf servers, improving audio retrieval speed, while the root server further screens the matching results of the leaf servers to select the target audio whose content is closest to the audio to be identified and whose version best matches user demand, improving audio identification efficiency and the user experience.
In order to better implement the above method, an embodiment of the present invention further provides an audio identification device, which can specifically be integrated in a network device; the network device can be a terminal, a server or other equipment.
For example, as shown in Fig. 3, the audio identification device may include a fingerprint unit 301, a candidate unit 302, a unisonance unit 303 and an audio unit 304, as follows:
(1) Fingerprint unit 301;
The fingerprint unit 301 is configured to extract the audio fingerprint of the audio to be identified as the benchmark fingerprint, and to calculate the similarity between the benchmark fingerprint and each audio fingerprint in a preset fingerprint library.
The preset fingerprint library stores the audio fingerprint of each audio in the audio library, together with the mapping relations between the audio fingerprints and the audio in the audio library. For example, the audio identification device can extract the audio fingerprint of each audio in the audio library in advance, store the extracted audio fingerprints in the fingerprint library, and record the mapping relation between each audio and its audio fingerprint.
For example, the fingerprint unit 301 obtains the audio to be identified, extracts its audio fingerprint, and takes the audio fingerprint of the audio to be identified as the benchmark fingerprint, which is used to query for the audio fingerprints closest or most similar to it.
In some embodiments, the fingerprint unit 301 can receive an audio identification request and obtain the audio to be identified; it extracts the audio fingerprint of the audio to be identified, obtains a Hash sequence, and takes the Hash sequence as the benchmark fingerprint.
For example, the user may input an audio identification request through the client. After receiving the audio identification request, the fingerprint unit 301 notifies the client to start audio collection, recording the user's humming or sounds in the environment to obtain the audio to be identified, which is the audio corresponding to this audio identification request. Of course, the user can also upload audio stored locally on the client, or downloaded from the network, to the fingerprint unit 301; the fingerprint unit 301 thereby obtains the audio identification request and its corresponding audio to be identified.
The client can be a recording device, or a terminal device with an audio collection function such as a mobile phone, tablet or personal computer.
Then, the fingerprint unit 301 extracts the audio fingerprint from the audio signal of the audio to be identified; the audio fingerprint contains the audio feature information of the audio to be identified. Extracting the audio fingerprint from the audio signal may specifically include framing, windowing, FFT (Fast Fourier Transform) frequency-domain transformation, local peak extraction, and conversion into a Hash sequence.
Specifically, after obtaining the audio to be identified, the fingerprint unit 301 frames and windows its audio signal. Framing cuts the whole audio signal into segments according to a preset rule, each segment being one frame, so that the audio signal is smooth at the microscopic level, providing a smooth input signal for subsequent audio signal processing. Then, the fingerprint unit 301 applies a preset window function, such as a Hamming window, to each frame of audio, making the framed audio signal more continuous and exhibit the characteristics of a periodic function.
Next, the fingerprint unit 301 performs an FFT frequency-domain transform on each frame of the audio signal to obtain a spectrum containing frequency-domain information. The fingerprint unit 301 then extracts the local peaks in the spectrum and converts them into a Hash sequence as the audio fingerprint of the audio to be identified. It should be noted that the Hash sequence may contain multiple hash values.
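The extraction pipeline above (framing, Hamming windowing, frequency transform, local-peak hashing) can be sketched in miniature. This is an illustrative toy, not the patent's exact algorithm: the frame size, the hash construction from a single dominant peak per frame, and the use of a naive DFT in place of an optimized FFT are all assumptions.

```python
import cmath
import math

def extract_fingerprint(samples, frame_size=64):
    hash_seq = []
    # Framing: cut the signal into non-overlapping frames.
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        # Windowing: apply a Hamming window to smooth the frame edges.
        windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_size - 1)))
                    for n, s in enumerate(frame)]
        # Frequency-domain transform (naive DFT standing in for the FFT).
        spectrum = [abs(sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                            for n in range(frame_size)))
                    for k in range(frame_size // 2)]
        # Local peak: the dominant non-DC frequency bin of this frame.
        peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
        # Convert the peak into one hash value of the Hash sequence
        # (a deterministic toy hash, not the patent's scheme).
        hash_seq.append((peak_bin * 2654435761) & 0xFFFF)
    return hash_seq

# A pure tone: every frame shares the same dominant bin, so the
# resulting Hash sequence repeats a single value.
tone = [math.sin(2 * math.pi * 5 * n / 64) for n in range(256)]
fp = extract_fingerprint(tone)
print(len(fp), len(set(fp)))  # 4 frames, one distinct hash value
```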
Taking the audio fingerprint of the audio to be identified as the benchmark fingerprint, the fingerprint unit 301 calculates the similarity between the benchmark fingerprint and each audio fingerprint in the preset fingerprint library, realizing the retrieval or matching of audio fingerprints.
In some embodiments, both the benchmark fingerprint and the audio fingerprints in the fingerprint library are characterized by Hash sequences, and the fingerprint unit 301 can be configured to: count, for each audio fingerprint in the preset fingerprint library, the number of identical hash values it shares with the benchmark fingerprint; and calculate the similarity between the benchmark fingerprint and each audio fingerprint in the fingerprint library according to the number of identical hash values.
Taking any audio fingerprint in the fingerprint library as an example, the fingerprint unit 301 compares the hash values in the benchmark fingerprint's Hash sequence with those in the audio fingerprint's Hash sequence one by one, counts the number of identical hash values, and takes that number as the similarity between the benchmark fingerprint and the audio fingerprint. The fingerprint unit 301 thereby calculates the similarity between the benchmark fingerprint and each audio fingerprint in the fingerprint library.
(2) Candidate unit 302;
The candidate unit 302 is configured to screen a candidate fingerprint set out of the fingerprint library according to the similarities between the benchmark fingerprint and the audio fingerprints in the library.
For example, according to a preset similarity threshold, the candidate unit 302 can screen out the audio fingerprints in the fingerprint library whose similarity with the benchmark fingerprint is greater than the similarity threshold, as candidate fingerprints matching the benchmark fingerprint.
It should be noted that a candidate fingerprint matching the benchmark fingerprint can be understood as one whose corresponding audio is identical, or can be considered identical, to the audio to be identified, for example the same song, or the same song in a different arrangement.
The candidate unit 302 then collects the screened candidate fingerprints into one set, obtaining the candidate fingerprint set. The candidate fingerprint set thereby contains one or more candidate fingerprints matching the benchmark fingerprint.
(3) Unisonance unit 303;
The unisonance unit 303 is configured to select the reference fingerprint in the candidate fingerprint set, and to obtain the unisonance fingerprints of the reference fingerprint.
The reference fingerprint is the candidate fingerprint most similar to the benchmark fingerprint. For example, the unisonance unit 303 can determine the candidate fingerprint in the candidate fingerprint set with the largest similarity to the benchmark fingerprint as the reference fingerprint.
Then, the unisonance unit 303 selects the unisonance fingerprints of the reference fingerprint. It should be noted that a unisonance fingerprint can be understood as one whose corresponding audio is identical, or can be considered identical, to the audio corresponding to the reference fingerprint. For example, in the song library of a music platform there may be multiple audio entries with different numbers that are in fact the same song, such as different versions of the same song, covers sung by different singers, or the same song included in different albums or radio programs. Multiple audio entries belonging to the same song are defined as unisonance audio, and their audio fingerprints are unisonance fingerprints.
In some embodiments, the unisonance unit 303 can specifically be configured to: calculate the overlap degree between the reference fingerprint and each of the other candidate fingerprints in the candidate fingerprint set; and select the unisonance fingerprints of the reference fingerprint from among the other candidate fingerprints according to the overlap degrees.
The overlap degree between the reference fingerprint and another candidate fingerprint can be calculated by means such as correlation or the longest common subsequence. With correlation, the variance between the Hash sequences of the reference fingerprint and the other candidate fingerprint can be calculated and taken as the overlap degree; the unisonance unit 303 then takes the other candidate fingerprints whose variance meets a preset requirement as unisonance fingerprints of the reference fingerprint.
Illustrated with the longest common subsequence (LCS), the unisonance unit 303 can be configured to: obtain the longest common subsequence of the reference fingerprint and each of the other candidate fingerprints in the candidate fingerprint set, and count the length of the longest common subsequence; and calculate the overlap degree between the reference fingerprint and each of the other candidate fingerprints according to that length.
Here, both the reference fingerprint and the other candidate fingerprints in the candidate fingerprint set are characterized by Hash sequences. A subsequence of a Hash sequence is the sequence obtained by removing zero or more elements from it without changing the relative order of the remaining elements. If one sequence is simultaneously a subsequence of multiple Hash sequences, it is a common subsequence of those Hash sequences, and the longest common subsequence of multiple Hash sequences is their longest shared subsequence. The length of the longest common subsequence is the number of elements it contains.
For example, dynamic programming (DP) can be used to compute the longest common subsequence length of the Hash sequences of the reference fingerprint and another candidate fingerprint. In the present embodiment, it is calculated as follows:
nlcs = LCS(res[i].hash_seq, res[0].hash_seq)
where nlcs is the longest common subsequence length, LCS is the dynamic-programming LCS length function, res[i].hash_seq is the Hash sequence of the i-th candidate fingerprint, and res[0].hash_seq is the Hash sequence of the reference fingerprint.
Taking any other candidate fingerprint as an example, after obtaining its longest common subsequence with the reference fingerprint, the unisonance unit 303 calculates the overlap degree between the reference fingerprint and that candidate fingerprint. For example, the following formula can be used:
sim = nlcs / hash_seq_cnt × 100%
where sim is the overlap degree between the reference fingerprint and the other candidate fingerprint, nlcs is the longest common subsequence length, and hash_seq_cnt is the length of the reference fingerprint's Hash sequence. In some embodiments, this formula may be implemented in code as int sim = nlcs * 1.0 / hash_seq_cnt * 100.
The unisonance unit 303 can thereby calculate the overlap degree between the reference fingerprint and each of the other candidate fingerprints. Then, from among the other candidate fingerprints, the unisonance unit 303 can select the unisonance fingerprints of the reference fingerprint.
For example, the unisonance unit 303 can take the other candidate fingerprint with the largest overlap degree as the unisonance fingerprint of the reference fingerprint; alternatively, it may sort the overlap degrees in descending order and take the other candidate fingerprints ranked within a preset top-N as unisonance fingerprints of the reference fingerprint.
In some embodiments, the unisonance unit 303 can be configured to: filter out, from among the other candidate fingerprints, the candidate fingerprints whose overlap degree with the reference fingerprint is greater than or equal to a preset threshold, as unisonance fingerprints of the reference fingerprint.
The preset threshold can be adjusted flexibly according to actual needs, for example 25%.
The unisonance unit 303 thereby screens the unisonance fingerprints of the reference fingerprint from among the other candidate fingerprints of the candidate fingerprint set.
Through this calculation of overlap degrees, the unisonance unit 303 eliminates the manpower and time cost of manually labeling unisonance audio in the audio library, and avoids the problem of manual entry lagging behind. When audio is put into storage, no additional manual labeling or classification of unisonance audio is needed, which also removes the risk of information being mis-recorded or omitted and reduces maintenance cost, thereby improving the accuracy and efficiency of unisonance fingerprint and unisonance audio identification.
In some embodiments, if no candidate fingerprint whose overlap degree with the reference fingerprint is greater than or equal to the preset threshold is found, the audio unit 304 determines the audio corresponding to the reference fingerprint as the target audio corresponding to the audio to be identified.
That is, when no unisonance fingerprint of the reference fingerprint can be found, the unisonance unit 303 determines that the candidate fingerprint set contains no other candidate fingerprint sufficiently close to the reference fingerprint. The audio unit 304 therefore determines the audio corresponding to the reference fingerprint according to the mapping relations between the audio fingerprints and the audio in the fingerprint library, and takes that audio as the target audio corresponding to the audio to be identified.
(4) Audio unit 304;
The audio unit 304 is configured to select the target audio corresponding to the audio to be identified from among the audio corresponding to the reference fingerprint and its unisonance fingerprints.
After the reference fingerprint and its unisonance fingerprints are obtained, the audio unit 304 determines the audio corresponding to them according to the mapping relations between the audio fingerprints and the audio in the fingerprint library.
Then, the audio unit 304 selects the target audio from among the audio corresponding to the reference fingerprint and its unisonance fingerprints. For example, the audio unit 304 may take all of the audio corresponding to the reference fingerprint and its unisonance fingerprints as target audio corresponding to the audio to be identified.
In some embodiments, reference fingerprint and its corresponding audio of unisonance fingerprint can also be carried out according to actual needs
Screening, audio unit 304 specifically can be used for: obtaining the reference fingerprint and its corresponding audio of unisonance fingerprint is unisonance sound
Frequently, the version information of unisonance audio is obtained;According to the version information, the version priority of the unisonance audio is determined;By version
The unisonance audio of this highest priority is as the corresponding target audio of the audio to be identified.
Wherein, version information includes the information such as source, singer, restocking and/or the issuing date of audio, can be audio certainly
The presupposed information of band.Unisonance audio can be the different audios of version informations such as source difference and/or version.
For example, according to the source information of the homophone audios, the audio unit 304 sets the version priority of an audio sourced from an album to the highest, and the version priority of an audio sourced from a radio station to the lowest. The audio unit 304 thus determines the album-sourced homophone audio as the target audio.
As another example, according to the release dates of the homophone audios, the audio unit 304 sets, in chronological order, the version priority of the earliest release to the highest and the version priority of the latest release to the lowest. The audio unit 304 thus determines the homophone audio with the earliest release date as the target audio.
The target audio is thus the audio that is most similar to the audio to be identified and whose version is most accurate.
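The version-priority screening above can be sketched as follows. This is an illustrative sketch only: the field names (`source`, `release_date`), the priority table, and the tie-breaking by earliest release date are assumptions combining the two examples, not a prescribed implementation.

```python
from datetime import date

# Illustrative priority table: album sources outrank radio sources
# (lower number = higher priority), per the first example above.
SOURCE_PRIORITY = {"album": 0, "radio": 1}

def pick_target_audio(homophone_audios):
    """Return the homophone audio with the highest version priority:
    first by source, then (per the second example) by earliest release date."""
    return min(
        homophone_audios,
        key=lambda a: (SOURCE_PRIORITY.get(a["source"], 99), a["release_date"]),
    )

audios = [
    {"title": "Song (radio)", "source": "radio", "release_date": date(2018, 5, 1)},
    {"title": "Song (album)", "source": "album", "release_date": date(2019, 1, 1)},
]
pick_target_audio(audios)["title"]  # the album version wins despite its later date
```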
As can be seen from the above, in the embodiment of the present invention the fingerprint unit 301 extracts the audio fingerprint of the audio to be identified as the benchmark fingerprint, and calculates the similarity between the benchmark fingerprint and the audio fingerprints in a preset fingerprint library; the candidate unit 302 screens out a candidate fingerprint set from the fingerprint library according to these similarities, the candidate fingerprint set containing audio fingerprints approximating the benchmark fingerprint; then, the homophone unit 303 selects a reference fingerprint from the candidate fingerprint set and obtains the homophone fingerprints of the reference fingerprint; finally, the audio unit 304 selects, from the audios corresponding to the reference fingerprint and its homophone fingerprints, the target audio corresponding to the audio to be identified. After the candidate fingerprints approximating the benchmark fingerprint have been retrieved, although each candidate fingerprint matches the benchmark fingerprint, uncertainty may remain, for example because of version issues of the audio to be identified. The scheme therefore further selects a reference fingerprint from the candidate fingerprint set and then, by computing overlap degrees, selects homophone fingerprints from the other candidate fingerprints in the set, thereby further screening the candidate fingerprints. The reference fingerprint and its homophone fingerprints obtained through this multi-stage screening comprise the audio fingerprints closest to the benchmark fingerprint of the audio to be identified, whose corresponding audios are identical or can be regarded as identical. Accordingly, the target audio selected from the audios corresponding to the reference fingerprint and its homophone fingerprints is the audio of the optimal version and can serve as the true source of the audio to be identified; this ensures the accuracy of both the content and the version of the target audio, and improves the overall efficiency of audio identification and the user experience. By screening the audio fingerprints in the fingerprint library layer by layer, the scheme refines the granularity of audio identification, and thereby retrieves a more accurate target audio.
An embodiment of the present invention further provides an audio identification device. Fig. 4a illustrates the structural schematic diagram of the audio identification device involved in the embodiment of the present invention. Specifically:
The audio identification device may include a processor 401 with one or more processing cores, a memory 402 comprising one or more computer-readable storage media, a power supply 403, an input unit 404, and other components. Those skilled in the art will understand that the structure shown in Fig. 4a does not constitute a limitation of the audio identification device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently. Among them:
The processor 401 is the control center of the audio identification device. It connects the various parts of the entire device using various interfaces and lines, and performs the various functions of the device and processes data by running or executing the software programs and/or modules stored in the memory 402 and invoking the data stored in the memory 402, thereby monitoring the device as a whole. Optionally, the processor 401 may include one or more processing cores. Preferably, the processor 401 may integrate an application processor, which mainly handles the operating system, user interface, application programs, and the like, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, the application program required for at least one function (such as the audio identification function), and the like, and the data storage area may store data created according to the use of the audio identification device, and the like. In addition, the memory 402 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The audio identification device further includes a power supply 403 that supplies power to the various components. Preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so that functions such as charging, discharging, and power consumption management are realized through the power management system. The power supply 403 may also include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and any other such components.
The audio identification device may also include an input unit 404, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
In addition, referring to Fig. 4b, the audio identification device may also include an audio collection apparatus 405 for collecting the audio to be identified. For example, the audio collection apparatus 405 may collect the audio to be identified by means such as recording.
Although not shown, the audio identification device may also include a display unit and the like, which will not be described herein. Specifically, in this embodiment, the processor 401 in the audio identification device loads the executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and runs the application programs stored in the memory 402, thereby realizing various functions, as follows:
extracting the audio fingerprint of the audio to be identified as a benchmark fingerprint, and calculating the similarity between the benchmark fingerprint and the audio fingerprints in a preset fingerprint library; screening out a candidate fingerprint set from the fingerprint library according to the similarities between the benchmark fingerprint and the audio fingerprints in the fingerprint library; selecting a reference fingerprint from the candidate fingerprint set, and obtaining the homophone fingerprints of the reference fingerprint; and selecting, from the audios corresponding to the reference fingerprint and its homophone fingerprints, the target audio corresponding to the audio to be identified.
The processor 401 may also run the application programs stored in the memory 402 to implement the following functions:
calculating the overlap degree between the reference fingerprint and each of the other candidate fingerprints in the candidate fingerprint set; and selecting the homophone fingerprints of the reference fingerprint from the other candidate fingerprints according to the overlap degrees.
The processor 401 may also run the application programs stored in the memory 402 to implement the following functions:
obtaining the longest common subsequence of the reference fingerprint and each of the other candidate fingerprints in the candidate fingerprint set, and counting the length of the longest common subsequence; and calculating the overlap degree between the reference fingerprint and the other candidate fingerprint according to the length of the longest common subsequence.
The processor 401 may also run the application programs stored in the memory 402 to implement the following functions:
screening out, from the other candidate fingerprints, the candidate fingerprints whose overlap degree with the reference fingerprint is greater than or equal to a preset threshold, as the homophone fingerprints of the reference fingerprint.
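The two functions above — computing overlap degrees via the longest common subsequence, then screening by a preset threshold — can be sketched as follows, under the assumption that each fingerprint is a sequence of hash values. The 0.8 threshold and the normalization by the reference fingerprint's length are illustrative choices, not prescribed by the embodiment.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two hash sequences
    (standard dynamic-programming formulation)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def homophone_fingerprints(reference, other_candidates, threshold=0.8):
    """Keep the candidate fingerprints whose LCS-based overlap degree with
    the reference fingerprint is greater than or equal to the threshold."""
    result = []
    for fp in other_candidates:
        overlap = lcs_len(reference, fp) / max(len(reference), 1)
        if overlap >= threshold:
            result.append(fp)
    return result

reference = [1, 2, 3, 4, 5]
others = [[1, 2, 3, 9, 5], [7, 8, 1, 0, 6]]
homophone_fingerprints(reference, others)  # only the first candidate survives
```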
The processor 401 may also run the application programs stored in the memory 402 to implement the following functions:
if no candidate fingerprint whose overlap degree with the reference fingerprint is greater than or equal to the preset threshold is found, determining the audio corresponding to the reference fingerprint as the target audio corresponding to the audio to be identified.
The processor 401 may also run the application programs stored in the memory 402 to implement the following functions:
determining, in the candidate fingerprint set, the candidate fingerprint with the highest similarity to the benchmark fingerprint as the reference fingerprint.
The processor 401 may also run the application programs stored in the memory 402 to implement the following functions:
counting, for each audio fingerprint in the preset fingerprint library, the number of identical hash values contained in both the benchmark fingerprint and that audio fingerprint; and calculating the similarity between the benchmark fingerprint and each audio fingerprint in the fingerprint library according to the number of identical hash values.
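A minimal sketch of this similarity computation, again assuming each fingerprint is a sequence of hash values; the set-intersection counting and the example library below are illustrative only (a real system would likely also weigh the time positions of the hashes).

```python
def similarity(benchmark, fingerprint):
    """Similarity = number of hash values the two fingerprints share."""
    return len(set(benchmark) & set(fingerprint))

# Hypothetical fingerprint library: track name -> sequence of hash values.
library = {
    "track_a": [11, 22, 33, 44],
    "track_b": [11, 99, 33, 77],
    "track_c": [55, 66, 77, 88],
}
benchmark = [11, 22, 33, 40]
scores = {name: similarity(benchmark, fp) for name, fp in library.items()}
# track_a shares {11, 22, 33}; track_b shares {11, 33}; track_c shares nothing
```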
The processor 401 may also run the application programs stored in the memory 402 to implement the following functions:
obtaining, as homophone audios, the audios corresponding to the reference fingerprint and its homophone fingerprints, and obtaining the version information of the homophone audios; determining the version priority of each homophone audio according to the version information; and taking the homophone audio with the highest version priority as the target audio corresponding to the audio to be identified.
The specific implementation of each of the above operations can be found in the foregoing embodiments and will not be described herein.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be completed by instructions, or by instructions controlling the related hardware, and that the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a storage medium storing a plurality of instructions, which can be loaded by a processor to execute the steps in any of the audio identification methods provided by the embodiments of the present invention. For example, the instructions may execute the following steps:
extracting the audio fingerprint of the audio to be identified as a benchmark fingerprint, and calculating the similarity between the benchmark fingerprint and the audio fingerprints in a preset fingerprint library; screening out a candidate fingerprint set from the fingerprint library according to the similarities between the benchmark fingerprint and the audio fingerprints in the fingerprint library; selecting a reference fingerprint from the candidate fingerprint set, and obtaining the homophone fingerprints of the reference fingerprint; and selecting, from the audios corresponding to the reference fingerprint and its homophone fingerprints, the target audio corresponding to the audio to be identified.
The instructions may also execute the following steps:
calculating the overlap degree between the reference fingerprint and each of the other candidate fingerprints in the candidate fingerprint set; and selecting the homophone fingerprints of the reference fingerprint from the other candidate fingerprints according to the overlap degrees.
The instructions may also execute the following steps:
obtaining the longest common subsequence of the reference fingerprint and each of the other candidate fingerprints in the candidate fingerprint set, and counting the length of the longest common subsequence; and calculating the overlap degree between the reference fingerprint and the other candidate fingerprint according to the length of the longest common subsequence.
The instructions may also execute the following steps:
screening out, from the other candidate fingerprints, the candidate fingerprints whose overlap degree with the reference fingerprint is greater than or equal to a preset threshold, as the homophone fingerprints of the reference fingerprint.
The instructions may also execute the following steps:
if no candidate fingerprint whose overlap degree with the reference fingerprint is greater than or equal to the preset threshold is found, determining the audio corresponding to the reference fingerprint as the target audio corresponding to the audio to be identified.
The instructions may also execute the following steps:
determining, in the candidate fingerprint set, the candidate fingerprint with the highest similarity to the benchmark fingerprint as the reference fingerprint.
The instructions may also execute the following steps:
counting, for each audio fingerprint in the preset fingerprint library, the number of identical hash values contained in both the benchmark fingerprint and that audio fingerprint; and calculating the similarity between the benchmark fingerprint and each audio fingerprint in the fingerprint library according to the number of identical hash values.
The instructions may also execute the following steps:
obtaining, as homophone audios, the audios corresponding to the reference fingerprint and its homophone fingerprints, and obtaining the version information of the homophone audios; determining the version priority of each homophone audio according to the version information; and taking the homophone audio with the highest version priority as the target audio corresponding to the audio to be identified.
The specific implementation of each of the above operations can be found in the foregoing embodiments and will not be described herein.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Since the instructions stored in the storage medium can execute the steps in any of the audio identification methods provided by the embodiments of the present invention, they can achieve the beneficial effects achievable by any of those methods, which are detailed in the foregoing embodiments and not repeated herein.
The audio identification method, apparatus, device, and storage medium provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, those skilled in the art may, according to the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.
Claims (12)
1. An audio identification method, comprising:
extracting the audio fingerprint of an audio to be identified as a benchmark fingerprint, and calculating the similarity between the benchmark fingerprint and each audio fingerprint in a preset fingerprint library;
screening out a candidate fingerprint set from the fingerprint library according to the similarities between the benchmark fingerprint and the audio fingerprints in the fingerprint library;
selecting a reference fingerprint from the candidate fingerprint set, and obtaining the homophone fingerprints of the reference fingerprint; and
selecting, from the audios corresponding to the reference fingerprint and its homophone fingerprints, the target audio corresponding to the audio to be identified.
2. The method according to claim 1, wherein obtaining the homophone fingerprints of the reference fingerprint comprises:
calculating the overlap degree between the reference fingerprint and each of the other candidate fingerprints in the candidate fingerprint set; and
selecting the homophone fingerprints of the reference fingerprint from the other candidate fingerprints according to the overlap degrees.
3. The method according to claim 2, wherein calculating the overlap degree between the reference fingerprint and each of the other candidate fingerprints in the candidate fingerprint set comprises:
obtaining the longest common subsequence of the reference fingerprint and the other candidate fingerprint, and counting the length of the longest common subsequence; and
calculating the overlap degree between the reference fingerprint and the other candidate fingerprint according to the length of the longest common subsequence.
4. The method according to claim 2, wherein selecting the homophone fingerprints of the reference fingerprint from the other candidate fingerprints according to the overlap degrees comprises:
screening out, from the other candidate fingerprints, the candidate fingerprints whose overlap degree with the reference fingerprint is greater than or equal to a preset threshold, as the homophone fingerprints of the reference fingerprint.
5. The method according to claim 4, further comprising:
if no candidate fingerprint whose overlap degree with the reference fingerprint is greater than or equal to the preset threshold is found, determining the audio corresponding to the reference fingerprint as the target audio corresponding to the audio to be identified.
6. The method according to claim 1, wherein selecting a reference fingerprint from the candidate fingerprint set comprises:
determining, in the candidate fingerprint set, the candidate fingerprint with the highest similarity to the benchmark fingerprint as the reference fingerprint.
7. The method according to claim 1, wherein calculating the similarity between the benchmark fingerprint and each audio fingerprint in the preset fingerprint library comprises:
counting, for each audio fingerprint in the preset fingerprint library, the number of identical hash values contained in both the benchmark fingerprint and that audio fingerprint; and
calculating the similarity between the benchmark fingerprint and each audio fingerprint in the fingerprint library according to the number of identical hash values.
8. The method according to any one of claims 1 to 7, wherein selecting, from the audios corresponding to the reference fingerprint and its homophone fingerprints, the target audio corresponding to the audio to be identified comprises:
obtaining, as homophone audios, the audios corresponding to the reference fingerprint and its homophone fingerprints, and obtaining the version information of the homophone audios;
determining the version priority of each homophone audio according to the version information; and
taking the homophone audio with the highest version priority as the target audio corresponding to the audio to be identified.
9. An audio identification apparatus, comprising:
a fingerprint unit, configured to extract the audio fingerprint of an audio to be identified as a benchmark fingerprint, and to calculate the similarity between the benchmark fingerprint and each audio fingerprint in a preset fingerprint library;
a candidate unit, configured to screen out a candidate fingerprint set from the fingerprint library according to the similarities between the benchmark fingerprint and the audio fingerprints in the fingerprint library;
a homophone unit, configured to select a reference fingerprint from the candidate fingerprint set, and to obtain the homophone fingerprints of the reference fingerprint; and
an audio unit, configured to select, from the audios corresponding to the reference fingerprint and its homophone fingerprints, the target audio corresponding to the audio to be identified.
10. An audio identification device, comprising a memory, a processor, and an audio identification program stored on the memory and executable on the processor, wherein the audio identification program, when executed by the processor, implements the steps of the method according to claim 1.
11. The audio identification device according to claim 10, further comprising an audio collection apparatus for collecting the audio to be identified.
12. A storage medium storing a plurality of instructions, wherein the instructions are adapted to be loaded by a processor to execute the steps in the audio identification method according to any one of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910270746.7A CN110047515B (en) | 2019-04-04 | 2019-04-04 | Audio identification method, device, equipment and storage medium |
PCT/CN2019/093398 WO2020199384A1 (en) | 2019-04-04 | 2019-06-27 | Audio recognition method, apparatus and device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910270746.7A CN110047515B (en) | 2019-04-04 | 2019-04-04 | Audio identification method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110047515A true CN110047515A (en) | 2019-07-23 |
CN110047515B CN110047515B (en) | 2021-04-20 |
Family
ID=67276270
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910270746.7A Active CN110047515B (en) | 2019-04-04 | 2019-04-04 | Audio identification method, device, equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110047515B (en) |
WO (1) | WO2020199384A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110517671A (en) * | 2019-08-30 | 2019-11-29 | 腾讯音乐娱乐科技(深圳)有限公司 | A kind of appraisal procedure of audio-frequency information, device and storage medium |
CN111274449A (en) * | 2020-02-18 | 2020-06-12 | 腾讯科技(深圳)有限公司 | Video playing method and device, electronic equipment and storage medium |
CN111382562A (en) * | 2020-03-05 | 2020-07-07 | 百度在线网络技术(北京)有限公司 | Text similarity determination method and device, electronic equipment and storage medium |
CN111489757A (en) * | 2020-03-26 | 2020-08-04 | 北京达佳互联信息技术有限公司 | Audio processing method and device, electronic equipment and readable storage medium |
CN111597379A (en) * | 2020-07-22 | 2020-08-28 | 深圳市声扬科技有限公司 | Audio searching method and device, computer equipment and computer-readable storage medium |
CN111681671A (en) * | 2020-05-20 | 2020-09-18 | 浙江大华技术股份有限公司 | Abnormal sound identification method and device and computer storage medium |
CN112435688A (en) * | 2020-11-20 | 2021-03-02 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio recognition method, server and storage medium |
WO2022194277A1 (en) * | 2021-03-18 | 2022-09-22 | 百果园技术(新加坡)有限公司 | Audio fingerprint processing method and apparatus, and computer device and storage medium |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103440313A (en) * | 2013-08-27 | 2013-12-11 | 复旦大学 | Music retrieval system based on audio fingerprint features |
CN104317967A (en) * | 2014-11-17 | 2015-01-28 | 北京航空航天大学 | Two-layer advertisement audio retrieval method based on audio fingerprints |
CN104464726A (en) * | 2014-12-30 | 2015-03-25 | 北京奇艺世纪科技有限公司 | Method and device for determining similar audios |
CN104715033A (en) * | 2015-03-16 | 2015-06-17 | 太原理工大学 | Step type voice frequency retrieval method |
CN107293307A (en) * | 2016-03-31 | 2017-10-24 | 阿里巴巴集团控股有限公司 | Audio-frequency detection and device |
US20170337939A1 (en) * | 2013-03-15 | 2017-11-23 | Facebook, Inc. | Generating audio fingerprints based on audio signal complexity |
CN107533850A (en) * | 2015-04-27 | 2018-01-02 | 三星电子株式会社 | Audio content recognition methods and device |
CN107577773A (en) * | 2017-09-08 | 2018-01-12 | 科大讯飞股份有限公司 | Audio matching method and device and electronic equipment |
US20180047416A1 (en) * | 2014-11-25 | 2018-02-15 | Facebook, Inc. | Indexing based on time-variant transforms of an audio signal's spectrogram |
WO2018033696A1 (en) * | 2016-08-15 | 2018-02-22 | Intrasonics S.A.R.L | Audio matching |
CN107967922A (en) * | 2017-12-19 | 2018-04-27 | 成都嗨翻屋文化传播有限公司 | A kind of music copyright recognition methods of feature based |
US20180144040A1 (en) * | 2012-03-28 | 2018-05-24 | Interactive Intelligence Group, Inc. | System and method for fingerprinting datasets |
CN108280074A (en) * | 2017-01-05 | 2018-07-13 | 北京酷我科技有限公司 | The recognition methods of audio and system |
CN108509558A (en) * | 2018-03-23 | 2018-09-07 | 太原理工大学 | A kind of sample count audio search method that resistance rapid-curing cutback is disturbed |
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180144040A1 (en) * | 2012-03-28 | 2018-05-24 | Interactive Intelligence Group, Inc. | System and method for fingerprinting datasets |
US20170337939A1 (en) * | 2013-03-15 | 2017-11-23 | Facebook, Inc. | Generating audio fingerprints based on audio signal complexity |
CN103440313A (en) * | 2013-08-27 | 2013-12-11 | 复旦大学 | Music retrieval system based on audio fingerprint features |
CN104317967A (en) * | 2014-11-17 | 2015-01-28 | 北京航空航天大学 | Two-layer advertisement audio retrieval method based on audio fingerprints |
US20180047416A1 (en) * | 2014-11-25 | 2018-02-15 | Facebook, Inc. | Indexing based on time-variant transforms of an audio signal's spectrogram |
CN104464726A (en) * | 2014-12-30 | 2015-03-25 | 北京奇艺世纪科技有限公司 | Method and device for determining similar audios |
CN104715033A (en) * | 2015-03-16 | 2015-06-17 | 太原理工大学 | Step type voice frequency retrieval method |
CN107533850A (en) * | 2015-04-27 | 2018-01-02 | 三星电子株式会社 | Audio content recognition methods and device |
CN107293307A (en) * | 2016-03-31 | 2017-10-24 | 阿里巴巴集团控股有限公司 | Audio-frequency detection and device |
WO2018033696A1 (en) * | 2016-08-15 | 2018-02-22 | Intrasonics S.A.R.L | Audio matching |
CN108280074A (en) * | 2017-01-05 | 2018-07-13 | 北京酷我科技有限公司 | The recognition methods of audio and system |
CN107577773A (en) * | 2017-09-08 | 2018-01-12 | 科大讯飞股份有限公司 | Audio matching method and device and electronic equipment |
CN107967922A (en) * | 2017-12-19 | 2018-04-27 | 成都嗨翻屋文化传播有限公司 | A kind of music copyright recognition methods of feature based |
CN108509558A (en) * | 2018-03-23 | 2018-09-07 | 太原理工大学 | A kind of sample count audio search method that resistance rapid-curing cutback is disturbed |
Non-Patent Citations (3)
Title |
---|
H. NAGANO et al., "A fast audio search method based on skipping irrelevant signals by similarity upper-bound calculation", 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
KUANG Xinnan, "Research and Implementation of an Audio Fingerprint Retrieval System Based on a Big Data Platform", China Master's Theses Full-text Database (Information Science and Technology) *
WANG Yunsheng, "Efficient Content-Based Retrieval of Massive Audio", China Master's Theses Full-text Database (Information Science and Technology) *
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110517671A (en) * | 2019-08-30 | 2019-11-29 | 腾讯音乐娱乐科技(深圳)有限公司 | A kind of appraisal procedure of audio-frequency information, device and storage medium |
CN110517671B (en) * | 2019-08-30 | 2022-04-05 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio information evaluation method and device and storage medium |
CN111274449A (en) * | 2020-02-18 | 2020-06-12 | 腾讯科技(深圳)有限公司 | Video playing method and device, electronic equipment and storage medium |
CN111274449B (en) * | 2020-02-18 | 2023-08-29 | 腾讯科技(深圳)有限公司 | Video playing method, device, electronic equipment and storage medium |
CN111382562A (en) * | 2020-03-05 | 2020-07-07 | 百度在线网络技术(北京)有限公司 | Text similarity determination method and device, electronic equipment and storage medium |
CN111382562B (en) * | 2020-03-05 | 2024-03-01 | 百度在线网络技术(北京)有限公司 | Text similarity determination method and device, electronic equipment and storage medium |
CN111489757A (en) * | 2020-03-26 | 2020-08-04 | 北京达佳互联信息技术有限公司 | Audio processing method and device, electronic equipment and readable storage medium |
CN111489757B (en) * | 2020-03-26 | 2023-08-18 | 北京达佳互联信息技术有限公司 | Audio processing method, device, electronic equipment and readable storage medium |
CN111681671B (en) * | 2020-05-20 | 2023-03-10 | 浙江大华技术股份有限公司 | Abnormal sound identification method and device and computer storage medium |
CN111681671A (en) * | 2020-05-20 | 2020-09-18 | 浙江大华技术股份有限公司 | Abnormal sound identification method and device and computer storage medium |
CN111597379B (en) * | 2020-07-22 | 2020-11-03 | 深圳市声扬科技有限公司 | Audio searching method and device, computer equipment and computer-readable storage medium |
CN111597379A (en) * | 2020-07-22 | 2020-08-28 | 深圳市声扬科技有限公司 | Audio searching method and device, computer equipment and computer-readable storage medium |
CN112435688A (en) * | 2020-11-20 | 2021-03-02 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio recognition method, server and storage medium |
CN112435688B (en) * | 2020-11-20 | 2024-06-18 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio identification method, server and storage medium |
WO2022194277A1 (en) * | 2021-03-18 | 2022-09-22 | 百果园技术(新加坡)有限公司 | Audio fingerprint processing method and apparatus, and computer device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2020199384A1 (en) | 2020-10-08 |
CN110047515B (en) | 2021-04-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110047515A (en) | A kind of audio identification methods, device, equipment and storage medium | |
CN105074697B (en) | For inferring the accumulation of the real-time crowdsourcing data of the metadata about entity | |
CN105120304B (en) | Information display method, apparatus and system | |
KR101578279B1 (en) | Methods and systems for identifying content in a data stream | |
CN102654859B (en) | Method and system for recommending songs | |
US20170140260A1 (en) | Content filtering with convolutional neural networks | |
US10467287B2 (en) | Systems and methods for automatically suggesting media accompaniments based on identified media content | |
CN106469557B (en) | Method and device for providing accompaniment music | |
EP3241124A1 (en) | Systems and methods for creation of a listening log and music library | |
TW200300925A (en) | System and method for music identification | |
WO2012170451A1 (en) | Methods and systems for performing comparisons of received data and providing a follow-on service based on the comparisons | |
US9524715B2 (en) | System and method for content recognition in portable devices | |
CN105335414A (en) | Music recommendation method, device and terminal | |
CN104091596A (en) | Music identifying method, system and device | |
EP3985669A1 (en) | Methods and systems for automatically matching audio content with visual input | |
CN109237740A (en) | Control method and device of electric appliance, storage medium and electric appliance | |
AU2020269924A1 (en) | Methods and systems for determining compact semantic representations of digital audio signals | |
CN110010159A (en) | Sound similarity determines method and device | |
US11785276B2 (en) | Event source content and remote content synchronization | |
CN109271501A (en) | A kind of management method and system of audio database | |
CN109756628A (en) | Method and device for playing function key sound effect and electronic equipment | |
CN106775567B (en) | Sound effect matching method and system | |
US11410706B2 (en) | Content pushing method for display device, pushing device and display device | |
CN113032616A (en) | Audio recommendation method and device, computer equipment and storage medium | |
CN111489757A (en) | Audio processing method and device, electronic equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |