CN111597379A - Audio searching method and device, computer equipment and computer-readable storage medium - Google Patents

Publication number
CN111597379A
Authority
CN
China
Prior art keywords
audio
hash
pair
primary
fingerprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010707678.9A
Other languages
Chinese (zh)
Other versions
CN111597379B (en
Inventor
黄润乾
张伟彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Voiceai Technologies Co ltd
Original Assignee
Voiceai Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Voiceai Technologies Co ltd
Priority to CN202010707678.9A
Publication of CN111597379A
Application granted
Publication of CN111597379B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to an audio search method, an audio search apparatus, a computer device, and a storage medium. The method comprises: acquiring a primary audio fingerprint of an audio clip, the primary audio fingerprint comprising hash pairs and corresponding primary hash keys; grouping the hash pairs according to a preset window length to obtain hash pair groups, each hash pair group comprising at least two hash pairs; calculating a secondary hash key for each hash pair group according to the primary hash keys of the hash pairs in that group, to obtain a secondary audio fingerprint of the audio clip, the secondary audio fingerprint comprising each hash pair group and its corresponding secondary hash key; and querying a secondary audio fingerprint library for a secondary audio fingerprint that matches the secondary audio fingerprint of the audio clip, and outputting the audio file corresponding to the matched secondary audio fingerprint. The method can improve the efficiency of audio search.

Description

Audio searching method and device, computer equipment and computer-readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an audio search method, an audio search apparatus, a computer device, and a computer-readable storage medium.
Background
At present, audio search generally relies on audio fingerprinting: an audio fingerprint is extracted from an audio clip and compared with the audio fingerprints in an audio library to determine the audio information of the clip.
However, conventional audio fingerprinting first screens candidate audio fingerprints and then precisely matches the audio fingerprint of the audio clip against those candidates. Because the screened candidate set is large, audio search using this technology is inefficient.
Disclosure of Invention
Based on this, it is necessary to provide an audio search method, apparatus, computer device and computer-readable storage medium for solving the technical problem of low efficiency of audio search by using audio fingerprinting technology.
An audio search method, the method comprising:
acquiring a primary audio fingerprint of an audio clip; the primary audio fingerprint comprises hash pairs and corresponding primary hash keys, and each hash pair is a combination of two spectral peak points of the audio clip;
grouping the hash pairs according to a preset window length to obtain hash pair groups; each hash pair group comprises at least two hash pairs;
calculating a secondary hash key corresponding to each hash pair group according to the primary hash keys of the hash pairs in the hash pair group, to obtain a secondary audio fingerprint of the audio clip; the secondary audio fingerprint comprises each hash pair group and its corresponding secondary hash key;
and searching a secondary audio fingerprint library for a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip, and outputting an audio file corresponding to the matched secondary audio fingerprint.
In one embodiment, the obtaining a primary audio fingerprint of an audio piece includes:
extracting the spectral feature of the audio clip, and determining the spectral peak point of the spectral feature;
constructing a hash pair according to the frequency spectrum peak point;
and calculating a primary hash key corresponding to each hash pair according to the first frequency and the first time of the first spectrum peak point corresponding to the hash pair and the second frequency and the second time of the second spectrum peak point corresponding to the hash pair, and obtaining a primary audio fingerprint of the audio segment comprising each hash pair and the corresponding primary hash key.
In one embodiment, calculating a secondary hash key corresponding to each hash pair group according to the primary hash keys of the hash pairs in the hash pair group includes:
acquiring the primary hash key of each hash pair in the hash pair group;
substituting the acquired primary hash keys into a hash formula to calculate the secondary hash key corresponding to each hash pair group; the hash formula is:
[Formula image not reproduced in the source text]
where the terms of the formula are, in order: the secondary hash key corresponding to the i-th hash pair group; the number of hash pairs contained in the i-th hash pair group; the primary hash key of the j-th hash pair in the i-th hash pair group; and the distinguishing factor corresponding to the j-th hash pair.
In one embodiment, after the outputting of the audio file corresponding to the matched secondary audio fingerprint, the method comprises:
inquiring the primary audio fingerprint of the audio file in a primary audio fingerprint library;
and inquiring the primary audio fingerprints matched with the primary audio fingerprints of the audio clips from the primary audio fingerprints of the audio files, and outputting the audio files corresponding to the matched primary audio fingerprints.
In an embodiment, after the calculating a secondary hash key corresponding to each hash pair packet according to a primary hash key of a hash pair in the hash pair packet, the method further includes:
regrouping the hash pair groups to obtain multi-level hash pair groups; the level of the multi-level hash pair groups is at least three;
calculating a multi-level hash key corresponding to each multi-level hash pair group according to the secondary hash keys of the hash pair groups contained in the multi-level hash pair group, to obtain a multi-level audio fingerprint of the audio clip; the multi-level audio fingerprint comprises each multi-level hash pair group and its corresponding multi-level hash key;
and querying a multi-level audio fingerprint library for a multi-level audio fingerprint matching the multi-level audio fingerprint of the audio clip, and outputting an audio file corresponding to the matched multi-level audio fingerprint.
In one embodiment, before searching for a secondary audio fingerprint in the secondary audio fingerprint repository that matches the secondary audio fingerprint of the audio piece, the method further comprises:
acquiring an audio signal of an audio file;
establishing a primary audio fingerprint database containing the audio file according to the frequency spectrum peak point of the audio signal; the primary audio fingerprint database comprises hash pairs of the audio files and corresponding primary hash keys;
and establishing a secondary audio fingerprint library containing the audio files according to the hash pairs of the audio files and the corresponding primary hash keys.
In one embodiment, the creating a secondary audio fingerprint library containing the audio files according to the hash pair of each audio file and the corresponding primary hash key includes:
grouping the hash pairs of the audio files to obtain hash pair groups of the audio files;
and calculating a secondary hash key corresponding to the hash pair grouping of each audio file according to the primary hash key of the hash pair in the hash pair grouping of each audio file to obtain a secondary audio fingerprint library containing the hash pair grouping of each audio file and the corresponding secondary hash key.
An audio search apparatus, the apparatus comprising:
the primary audio fingerprint acquisition module is used for acquiring a primary audio fingerprint of an audio clip; the primary audio fingerprint comprises hash pairs and corresponding primary hash keys, and each hash pair is a combination of two spectral peak points of the audio clip;
the grouping module is used for grouping the hash pairs according to a preset window length to obtain hash pair groups; each hash pair group comprises at least two hash pairs;
the secondary audio fingerprint acquisition module is used for calculating a secondary hash key corresponding to each hash pair group according to the primary hash keys of the hash pairs in the hash pair group, to obtain a secondary audio fingerprint of the audio clip; the secondary audio fingerprint comprises each hash pair group and its corresponding secondary hash key;
and the audio query module is used for querying a secondary audio fingerprint library for a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip, and outputting an audio file corresponding to the matched secondary audio fingerprint.
In one embodiment, the primary audio fingerprint acquisition module is further configured to:
extracting the spectral feature of the audio clip, and determining the spectral peak point of the spectral feature;
constructing a hash pair according to the frequency spectrum peak point;
and calculating a primary hash key corresponding to each hash pair according to the first frequency and the first time of the first spectrum peak point corresponding to the hash pair and the second frequency and the second time of the second spectrum peak point corresponding to the hash pair, and obtaining a primary audio fingerprint of the audio segment comprising each hash pair and the corresponding primary hash key.
In one embodiment, the secondary audio fingerprint acquisition module is further configured to:
acquiring the primary hash key of each hash pair in the hash pair group;
substituting the acquired primary hash keys into a hash formula to calculate the secondary hash key corresponding to each hash pair group; the hash formula is:
[Formula image not reproduced in the source text]
where the terms of the formula are, in order: the secondary hash key corresponding to the i-th hash pair group; the number of hash pairs contained in the i-th hash pair group; the primary hash key of the j-th hash pair in the i-th hash pair group; and the distinguishing factor corresponding to the j-th hash pair.
In one embodiment, the audio query module is further configured to:
inquiring the primary audio fingerprint of the audio file in a primary audio fingerprint library;
and inquiring the primary audio fingerprints matched with the primary audio fingerprints of the audio clips from the primary audio fingerprints of the audio files, and outputting the audio files corresponding to the matched primary audio fingerprints.
In one embodiment, the grouping module is further configured to:
regrouping the hash pair groups to obtain multi-level hash pair groups; the level of the multi-level hash pair groups is at least three;
the secondary audio fingerprint acquisition module is further configured to calculate a multi-level hash key corresponding to each multi-level hash pair group according to the secondary hash keys of the hash pair groups contained in the multi-level hash pair group, to obtain a multi-level audio fingerprint of the audio clip; the multi-level audio fingerprint comprises each multi-level hash pair group and its corresponding multi-level hash key;
the audio query module is further configured to query a multi-level audio fingerprint library for a multi-level audio fingerprint matching the multi-level audio fingerprint of the audio clip, and to output an audio file corresponding to the matched multi-level audio fingerprint.
In one embodiment, the apparatus further comprises:
the audio file acquisition module is used for acquiring an audio signal of an audio file;
the primary audio fingerprint database establishing module is used for establishing a primary audio fingerprint database containing the audio file according to the frequency spectrum peak point of the audio signal; the primary audio fingerprint database comprises hash pairs of the audio files and corresponding primary hash keys;
and the secondary audio fingerprint library establishing module is used for establishing a secondary audio fingerprint library containing the audio files according to the hash pairs of the audio files and the corresponding primary hash keys.
In one embodiment, the secondary audio fingerprint repository establishing module is further configured to:
grouping the hash pairs of the audio files to obtain hash pair groups of the audio files;
and calculating a secondary hash key corresponding to the hash pair grouping of each audio file according to the primary hash key of the hash pair in the hash pair grouping of each audio file to obtain a secondary audio fingerprint library containing the hash pair grouping of each audio file and the corresponding secondary hash key.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
With the above audio search method, apparatus, computer device, and computer-readable storage medium, a primary audio fingerprint of an audio clip is acquired, the primary audio fingerprint comprising hash pairs and corresponding primary hash keys. The hash pairs are grouped according to a preset window length to obtain hash pair groups, and a secondary hash key is calculated for each hash pair group to obtain a secondary audio fingerprint of the audio clip, which comprises the hash pair groups and the corresponding secondary hash keys. A secondary audio fingerprint matching that of the audio clip is then queried in the secondary audio fingerprint library, and the corresponding audio file is output. This avoids precisely matching the audio fingerprint of the audio clip against a large number of candidate audio fingerprints, thereby improving the efficiency of audio search.
Drawings
FIG. 1 is a diagram of an exemplary audio search application environment;
FIG. 2 is a flow diagram illustrating an audio search method according to one embodiment;
FIG. 2A is a graph of spectral features extracted in one embodiment;
FIG. 2B is a spectral star map extracted in one embodiment;
FIG. 2C is a diagram of a combined hash pair, according to an embodiment;
FIG. 2D is a schematic diagram of a hash pair in one embodiment;
FIG. 3 is a diagram of hash pair grouping in one embodiment;
FIG. 4 is a flowchart illustrating the steps of constructing an audio fingerprint library according to one embodiment;
FIG. 5 is a flowchart illustrating an audio search method according to another embodiment;
FIG. 6 is a block diagram showing the structure of an audio search apparatus according to an embodiment;
FIG. 7 is a block diagram showing the construction of an audio search apparatus according to another embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device in one embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The audio search method provided by the application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. In one embodiment, the terminal 102 acquires a primary audio fingerprint of an audio clip; the primary audio fingerprint comprises hash pairs and corresponding primary hash keys, and each hash pair is a combination of two spectral peak points of the audio clip. The terminal 102 groups the hash pairs according to a preset window length to obtain hash pair groups, each comprising at least two hash pairs, and calculates a secondary hash key for each hash pair group according to the primary hash keys of the hash pairs in that group, obtaining a secondary audio fingerprint of the audio clip that comprises each hash pair group and its corresponding secondary hash key. The terminal 102 then sends the secondary audio fingerprint to the server 104; the server 104 queries a secondary audio fingerprint library for a secondary audio fingerprint matching that of the audio clip, outputs the audio file corresponding to the matched secondary audio fingerprint, and sends the audio file to the terminal 102 for display.
In another example, after the terminal 102 acquires the audio clip, it sends the audio clip to the server 104, and the server 104 acquires the primary audio fingerprint of the audio clip; the primary audio fingerprint comprises hash pairs and corresponding primary hash keys, and each hash pair is a combination of two spectral peak points of the audio clip. The server 104 groups the hash pairs according to a preset window length to obtain hash pair groups, each comprising at least two hash pairs, and calculates a secondary hash key for each hash pair group according to the primary hash keys of the hash pairs in that group, obtaining a secondary audio fingerprint of the audio clip that comprises each hash pair group and its corresponding secondary hash key. The server 104 then queries the secondary audio fingerprint library for a secondary audio fingerprint matching that of the audio clip, outputs the audio file corresponding to the matched secondary audio fingerprint, and sends the audio file to the terminal 102 for display.
In addition, the audio search method may be applied to the terminal 102 or the server 104 alone. Taking the terminal 102 as an example, the terminal 102 acquires a primary audio fingerprint of an audio clip; the primary audio fingerprint comprises hash pairs and corresponding primary hash keys, and each hash pair is a combination of two spectral peak points of the audio clip. The terminal 102 groups the hash pairs according to a preset window length to obtain hash pair groups, each comprising at least two hash pairs; calculates a secondary hash key for each hash pair group according to the primary hash keys of the hash pairs in that group, obtaining a secondary audio fingerprint of the audio clip that comprises each hash pair group and its corresponding secondary hash key; queries a secondary audio fingerprint library for a secondary audio fingerprint matching that of the audio clip; and outputs the audio file corresponding to the matched secondary audio fingerprint.
The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, an audio searching method is provided, which is described by taking the method as an example applied to the terminal in fig. 1, and includes the following steps:
S202, acquiring a primary audio fingerprint of the audio clip.
The audio clip is an unknown audio signal on which an audio search is to be performed. It may be a complete audio signal or a portion intercepted from a complete audio signal, where the complete audio signal may be acquired by a sound acquisition device or downloaded from a network. An audio fingerprint is a digital digest extracted from an audio signal by a specific algorithm; it is a compact digital signature representing the important acoustic features of a piece of audio. In the present application, audio fingerprints containing different amounts of information can be extracted from the audio clip. The amount of information in an audio fingerprint affects the speed of audio search: if the fingerprint contains too little information, the search involves redundant and cumbersome matching, which wastes a large amount of computing resources and slows the search. The amount of information is indicated by the level of the fingerprint, such as primary, secondary, and tertiary audio fingerprints: a primary audio fingerprint contains less information than a secondary audio fingerprint, which in turn contains less than a tertiary audio fingerprint. The primary audio fingerprint comprises hash pairs and corresponding primary hash keys, where each hash pair is a combination of two spectral peak points of the audio clip.
In one embodiment, after acquiring an audio clip to be searched, the terminal extracts the spectral features of the audio clip, determines the spectral peak points in the spectral features, and then constructs the primary audio fingerprint of the audio clip from the determined spectral peak points. Specifically, the spectral peak points are combined by a combinatorial hashing method to obtain hash pairs, where each hash pair is a combination of two spectral peak points. Each hash pair corresponds to one hash key, called a primary hash key, which is determined from the frequencies of the two spectral peak points of the hash pair and their time offsets on the audio clip; a time offset is the time difference relative to the start of the audio clip. For example, hash pair A corresponds to spectral peak point 1 and spectral peak point 2, where spectral peak point 1 has frequency $f_1$ and time offset $t_1$ on the audio clip, and spectral peak point 2 has frequency $f_2$ and time offset $t_2$ on the audio clip. The primary hash key of hash pair A can then be determined from $f_1$, $f_2$, $t_1$, and $t_2$.
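As an illustration of how a primary hash key might be derived from the two peaks of a hash pair, the following is a minimal sketch. The text only states that the key is determined from the two frequencies and the two time offsets; the bit-packing layout, the use of the time difference between the peaks, and the names `Peak` and `primary_hash_key` are assumptions made for this example.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    freq_bin: int    # frequency index of the spectral peak point
    time_frame: int  # time offset from the start of the clip, in frames


def primary_hash_key(p1: Peak, p2: Peak,
                     freq_bits: int = 10, dt_bits: int = 12) -> int:
    """Pack f1, f2 and (t2 - t1) into one integer (assumed layout)."""
    dt = p2.time_frame - p1.time_frame
    if not (0 <= p1.freq_bin < 1 << freq_bits
            and 0 <= p2.freq_bin < 1 << freq_bits):
        raise ValueError("frequency bin out of range")
    if not 0 <= dt < 1 << dt_bits:
        raise ValueError("time difference out of range")
    # Layout: [ f1 | f2 | dt ], most significant field first.
    return (p1.freq_bin << (freq_bits + dt_bits)) | (p2.freq_bin << dt_bits) | dt


# Hash pair A: peak 1 is the anchor, peak 2 is its partner.
key_a = primary_hash_key(Peak(100, 5), Peak(200, 9))
# The same pair shifted in time yields the same key under this layout.
key_shifted = primary_hash_key(Peak(100, 15), Peak(200, 19))
```

Using the time difference rather than the absolute offsets keeps the key independent of where the clip was cut from the original recording; the anchor's own offset can be stored alongside the key, as the later sorting step relies on it.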
S204, grouping the hash pairs according to a preset window length to obtain hash pair groups; each hash pair group contains at least two hash pairs.
The preset window length may be a first preset window length, which is the difference between the sequence numbers of corresponding hash pairs in two adjacent hash pair groups obtained by the grouping. For example, hash pairs 1 to 10 are grouped into hash pair group 1 and hash pair group 2, where hash pair group 1 contains hash pairs 1 to 5 and hash pair group 2 contains hash pairs 6 to 10. Hash pairs 1 to 5 in group 1 then correspond respectively to hash pairs 6 to 10 in group 2, and the difference between the sequence numbers of hash pair 6 and hash pair 1, namely 5, is the preset window length used in the grouping.
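The count-based grouping in this example can be sketched as follows. The function name `group_by_count` is hypothetical, and the handling of a short trailing group is left open here because the text does not specify it.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_by_count(items: Sequence[T], window: int) -> List[List[T]]:
    """Split an ordered sequence into consecutive groups of `window` items."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # Note: the last group may be shorter than `window`; a caller that needs
    # every group to hold at least two hash pairs would have to handle that.
    return [list(items[i:i + window]) for i in range(0, len(items), window)]


# Hash pairs 1..10 with a window length of 5 give two groups.
groups = group_by_count(list(range(1, 11)), 5)
```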
When the audio clip to be searched is a complete audio signal, for example when a complete piece of music is searched, the preset window length may instead be a second preset window length, which is the difference between the time intervals corresponding to two adjacent hash pair groups obtained by the grouping. For example, suppose the total duration of an audio clip is 20 seconds and the clip corresponds to 20 hash pairs, which are grouped into hash pair group A containing 8 hash pairs and hash pair group B containing 12 hash pairs. The time offsets of the 8 hash pairs in group A lie between 0 and 10 seconds, so the time interval corresponding to group A is 0 to 10 seconds; the time offsets of the 12 hash pairs in group B lie between 10 and 20 seconds, so the time interval corresponding to group B is 10 to 20 seconds. The difference between the time intervals of group B and group A, namely 10 seconds, is the preset window length used in the grouping.
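A time-based grouping of this kind could be sketched as below, assuming each hash pair is represented as a `(time_offset_seconds, primary_key)` tuple. That representation, the function name `group_by_time`, and the synthetic offsets are illustrative assumptions only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

HashPair = Tuple[float, int]  # (time offset in seconds, primary hash key)


def group_by_time(pairs: List[HashPair], window_seconds: float) -> List[List[HashPair]]:
    """Bucket hash pairs by the time interval their offset falls into."""
    buckets: Dict[int, List[HashPair]] = defaultdict(list)
    for offset, key in pairs:
        buckets[int(offset // window_seconds)].append((offset, key))
    # Return the groups in time order; empty intervals produce no group.
    return [buckets[index] for index in sorted(buckets)]


# 8 hash pairs in the first 10 seconds and 12 in the next 10 seconds.
first = [(i * 1.25, i) for i in range(8)]              # offsets 0.0 .. 8.75
second = [(10 + i * 0.8, 100 + i) for i in range(12)]  # offsets 10.0 .. 18.8
time_groups = group_by_time(first + second, 10.0)
```

Unlike count-based grouping, the groups here can differ in size, as with the 8 and 12 hash pairs in the example above.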
In one embodiment, after obtaining the primary audio fingerprint of the audio clip, the terminal sorts the hash pairs in the primary audio fingerprint in time order and groups the sorted hash pairs according to the preset window length to obtain at least two hash pair groups. Sorting in time order means sorting the hash pairs by their time offsets from smallest to largest, where the time offset of a hash pair is the time offset, relative to the start of the audio clip, of the anchor point (one of the two spectral peak points) selected when the hash pair was constructed. For example, hash pair A corresponds to spectral peak point 1 and spectral peak point 2, and spectral peak point 1 is the anchor point selected when constructing hash pair A. If the time offset of spectral peak point 1 on the audio clip is $t_1$ and that of spectral peak point 2 is $t_2$, the time offset of hash pair A is $T_1 = t_1$. If the time offsets of hash pairs A, B, C, and D are $T_1$, $T_2$, $T_3$, and $T_4$ respectively, with $T_3 < T_1 < T_4 < T_2$, then sorting them in time order gives "hash pair C, hash pair A, hash pair D, hash pair B".
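The ordering in this example can be checked with a few lines of code. The numeric offsets below are made-up values chosen only to satisfy T3 < T1 < T4 < T2.

```python
# Anchor time offsets (in seconds) for hash pairs A to D; illustrative values.
anchor_offsets = {"A": 2.0, "B": 4.0, "C": 1.0, "D": 3.0}

# Sort hash pair labels by their anchor time offset, smallest first.
ordered = sorted(anchor_offsets, key=anchor_offsets.get)
```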
S206, calculating a secondary hash key corresponding to each hash pair group according to the primary hash keys of the hash pairs in the hash pair group, to obtain a secondary audio fingerprint of the audio clip.
The secondary audio fingerprint comprises the hash pair groups and the corresponding secondary hash keys. Each hash pair group comprises at least two hash pairs, so a secondary audio fingerprint built from hash pair groups contains more information than the primary audio fingerprint.
In one embodiment, after grouping the hash pairs, the terminal calculates the secondary hash key corresponding to each hash pair group to obtain the secondary audio fingerprint of the audio clip. The secondary hash key of a hash pair group is calculated as follows: for any hash pair group, the primary hash key of each hash pair in the group is acquired, and the acquired primary hash keys are substituted into a hash formula to calculate the secondary hash key corresponding to that group. The hash formula is:
[Formula image not reproduced in this text.]
The formula relates the following quantities: the secondary hash key corresponding to a given hash pair group; the number of hash pairs that the group contains; the primary hash key of each hash pair in the group, indexed by its position in the group; and the distinguishing factor corresponding to each position. The distinguishing factors are used to ensure that the secondary hash keys of different hash pair groups differ. Specifically, the distinguishing factors may be prime numbers and, accordingly, may be generated from a prime number list.
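The hash formula itself is not legible in this text, so the following Python sketch only assumes one possible form consistent with the description: a sum of the primary hash keys in a group, each multiplied by a prime distinguishing factor taken from a prime number list. The weighted-sum form and the function names `first_primes` and `secondary_hash_key` are assumptions made for illustration and should not be read as the formula of the embodiment.

```python
def first_primes(n):
    """Return the first n primes by trial division (a simple prime number list)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no smaller prime divides it
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def secondary_hash_key(primary_keys):
    """Assumed form: sum of primary keys weighted by per-position primes."""
    factors = first_primes(len(primary_keys))
    return sum(key * factor for key, factor in zip(primary_keys, factors))


key_forward = secondary_hash_key([10, 20, 30])   # 10*2 + 20*3 + 30*5
key_reversed = secondary_hash_key([30, 20, 10])  # 30*2 + 20*3 + 10*5
print(key_forward, key_reversed)
```

Under this assumed form, the position-dependent factors make the result depend on the order of the hash pairs within a group. A weighted sum reduces collisions between different groups but does not by itself guarantee that they never occur.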
S208, searching the secondary audio fingerprint library for a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip, and outputting the audio file corresponding to the matched secondary audio fingerprint.
The secondary audio fingerprint library may be a secondary audio fingerprint hash table, which comprises hash pair groups and the corresponding secondary hash keys. The output may be the audio file itself and/or attribute information of the audio file. For example, the audio file itself may be a song, and the attribute information may be at least one of a song number, a song name and a singer.
In one embodiment, after obtaining the secondary audio fingerprint of the audio clip, the terminal searches the secondary audio fingerprint library according to the hash pair groups and the corresponding secondary hash keys contained in the secondary audio fingerprint of the audio clip, obtains a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip, and outputs the audio file corresponding to the matched secondary audio fingerprint.
In one embodiment, searching the secondary audio fingerprint library according to the hash pair groups and the corresponding secondary hash keys contained in the secondary audio fingerprint of the audio clip includes: searching the secondary audio fingerprint hash table for candidate secondary audio fingerprints that share secondary hash keys with the secondary audio fingerprint of the audio clip, and selecting, from the candidate secondary audio fingerprints, the secondary audio fingerprint matching the secondary audio fingerprint of the audio clip.
For example, suppose the secondary audio fingerprint of the audio clip contains the secondary hash keys 00001, 00002, 00003, 00004, 00005 and 00006, and the candidate secondary audio fingerprints found with it are candidate secondary audio fingerprint A, candidate secondary audio fingerprint B and candidate secondary audio fingerprint C, where candidate A contains the secondary hash key 00002, candidate B contains the secondary hash keys 00002, 00003, 00004 and 00005, and candidate C contains the secondary hash keys 00001, 00002, 00003, 00005 and 00006. Candidate A then shares 1 key with the secondary audio fingerprint of the audio clip, a repetition rate of about 17%; candidate B shares 4 keys, a repetition rate of about 67%; and candidate C shares 5 keys, a repetition rate of about 83%. The candidates whose repetition rate is higher than a repetition rate threshold may be determined as the secondary audio fingerprints matching the secondary audio fingerprint of the audio clip. If the repetition rate threshold is 80%, candidate C is determined as the match; if the repetition rate threshold is 60%, candidates B and C are determined as matches.
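The repetition-rate selection in this example can be sketched in Python as follows. The function names and the representation of a fingerprint as a list of key strings are illustrative assumptions; the repetition rate is taken to be the number of shared keys divided by the number of keys in the fingerprint of the audio clip.

```python
def repetition_rate(query_keys, candidate_keys):
    """Fraction of the query's secondary hash keys also found in the candidate."""
    query = set(query_keys)
    return len(query & set(candidate_keys)) / len(query)


def select_matches(query_keys, candidates, threshold):
    """Return the names of candidates whose repetition rate exceeds the threshold."""
    return [
        name
        for name, keys in candidates.items()
        if repetition_rate(query_keys, keys) > threshold
    ]


query = ["00001", "00002", "00003", "00004", "00005", "00006"]
candidates = {
    "A": ["00002"],
    "B": ["00002", "00003", "00004", "00005"],
    "C": ["00001", "00002", "00003", "00005", "00006"],
}
matches_80 = select_matches(query, candidates, 0.80)
matches_60 = select_matches(query, candidates, 0.60)
print(matches_80, matches_60)
```

With a threshold of 0.80 only candidate C is selected, and with a threshold of 0.60 candidates B and C are selected.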
In the above embodiment, the terminal acquires the primary audio fingerprint of the audio clip, which comprises hash pairs and the corresponding primary hash keys; groups the hash pairs according to the preset window length to obtain hash pair groups; and calculates the secondary hash key corresponding to each hash pair group, thereby obtaining the secondary audio fingerprint of the audio clip, which comprises the hash pair groups and the corresponding secondary hash keys. The terminal then queries the secondary audio fingerprint library for a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip and outputs the audio file corresponding to the matched secondary audio fingerprint. This avoids exactly matching the audio fingerprint of the audio clip against a large number of candidate audio fingerprints and improves the efficiency of audio search.
In one embodiment, S202 specifically includes the following steps: extracting the spectral features of the audio segment, determining the spectral peak points of the spectral features, constructing hash pairs from the spectral peak points, calculating the primary hash key corresponding to each hash pair according to the first frequency and the first time of the first spectral peak point of the hash pair and the second frequency and the second time of the second spectral peak point of the hash pair, and obtaining the primary audio fingerprint of the audio segment, which contains each hash pair and the corresponding primary hash key. Specifically, the time difference between the first time and the second time may be calculated first, the time difference may be spliced with the first frequency and the second frequency, and the splicing result may be used as the primary hash key corresponding to the hash pair.
The above embodiment is explained by way of example. Fig. 2A shows the spectral feature map obtained after extracting the spectral features of an audio segment. Spectral peak points are determined from the spectral feature map; each point in the constellation map shown in fig. 2B is a spectral peak point determined from fig. 2A. Hash pairs are constructed from the spectral peak points in the constellation map: as shown in fig. 2C, anchor points are first selected, and each anchor point corresponds to a target zone. Each anchor point is combined in turn with the points in its target zone, and each combination forms a hash pair (fig. 2D). The hash key corresponding to each hash pair is determined according to the frequencies of the two spectral peak points of the hash pair and the time difference between them. Specifically, for each hash pair, the time offset of the selected anchor point is taken as the time offset of the hash pair in the audio segment, the frequencies of the two spectral peak points and the time difference between them are spliced, and the splicing result is used as the hash key corresponding to the hash pair.
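One way to realize the splicing of the two frequencies and the time difference is bit packing, sketched below in Python. The bit widths `FREQ_BITS` and `DT_BITS`, the field order, and the representation of a spectral peak point as a `(time, frequency_bin)` tuple of integers are all assumptions for illustration; the embodiment does not fix these details here.

```python
FREQ_BITS = 10  # assumed width of each quantized frequency bin
DT_BITS = 12    # assumed width of the time difference in frames


def primary_hash_key(f1, f2, t1, t2):
    """Splice f1, f2 and (t2 - t1) into one integer key: [f1 | f2 | dt]."""
    dt = t2 - t1
    if not (0 <= f1 < (1 << FREQ_BITS) and 0 <= f2 < (1 << FREQ_BITS)):
        raise ValueError("frequency bin does not fit in FREQ_BITS")
    if not (0 <= dt < (1 << DT_BITS)):
        raise ValueError("time difference does not fit in DT_BITS")
    return (f1 << (FREQ_BITS + DT_BITS)) | (f2 << DT_BITS) | dt


def make_hash_pair(anchor, target):
    """Return (primary key, time offset); the offset is the anchor's time."""
    (t1, f1), (t2, f2) = anchor, target
    return primary_hash_key(f1, f2, t1, t2), t1


key, offset = make_hash_pair((100, 3), (107, 5))
print(key, offset)
```

Because each field occupies a fixed bit range, the two frequencies and the time difference can be recovered from the key by shifting and masking, and two hash pairs receive the same key only when all three values agree.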
In the above embodiment, the terminal extracts the spectral features of the audio segment and determines the spectral peak points of the spectral features; constructs hash pairs from the spectral peak points; and calculates the primary hash key corresponding to each hash pair according to the frequencies and times of the spectral peak points of the hash pair, obtaining the primary audio fingerprint of the audio segment, which contains each hash pair and the corresponding primary hash key. The primary audio fingerprint therefore carries more information than a single peak point, and the secondary audio fingerprint constructed from the primary audio fingerprint carries more information still. As a result, when the secondary audio fingerprint library is searched with the secondary audio fingerprint constructed from the primary audio fingerprint, redundant matching is reduced, computing resources are saved, and the efficiency of audio search is improved.
In an embodiment, when the audio segment to be searched is a complete audio signal, for example, when a complete music audio is searched, the terminal may further group the hash pairs according to a preset window length and a preset step length to obtain hash pair groups. The preset window length may be a window length of a preset data window or a window length of a preset time window, the preset data window corresponds to a first preset step length, and the preset time window corresponds to a second preset step length.
For example, hash pair 1 to hash pair 10 are to be grouped, the preset window length is the window length of a preset data window, the window length is 4, and the first preset step length is 2. The grouping result is then: hash pair group 1 contains hash pair 1 to hash pair 4, hash pair group 2 contains hash pair 3 to hash pair 6, hash pair group 3 contains hash pair 5 to hash pair 8, and hash pair group 4 contains hash pair 7 to hash pair 10.
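Grouping with a data window can be sketched in Python as a sliding window over the time-sorted list of hash pairs. The function name `group_by_count` is illustrative, and dropping a trailing window that is shorter than the window length is an assumption of this sketch.

```python
def group_by_count(pairs, window, step):
    """Slide a window of `window` items over `pairs`, advancing by `step`."""
    groups = []
    # Only full windows are kept (assumption of this sketch).
    for start in range(0, len(pairs) - window + 1, step):
        groups.append(pairs[start:start + window])
    return groups


hash_pairs = list(range(1, 11))  # stand-ins for hash pair 1 to hash pair 10
groups = group_by_count(hash_pairs, window=4, step=2)
for index, group in enumerate(groups, start=1):
    print(index, group[0], "to", group[-1])
```

For hash pairs 1 to 10 with a window length of 4 and a step length of 2, the sketch yields four groups that start at hash pairs 1, 3, 5 and 7, so adjacent groups share two hash pairs.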
Fig. 3 shows an embodiment in which hash pairs are grouped according to the window length of a preset time window and a preset step length. The shaded rectangle in the figure indicates a time window of a certain length, and the hash pairs falling within the time window are the hash pairs contained in one of the resulting hash pair groups. The preset step length may be selected according to actual needs. For example, if the second preset step length is equal to the window length of the preset time window, adjacent hash pair groups do not contain any hash pair in common; if the second preset step length is smaller than the window length of the preset time window, adjacent hash pair groups may contain hash pairs in common.
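Grouping with a time window can be sketched in Python as below. The sketch assumes that each hash pair is a `(time_offset, key)` tuple with an integer time offset (for example a frame index), that the list is already sorted by time offset, and that windows containing fewer than two hash pairs are discarded because a hash pair group contains at least two hash pairs. The function name is illustrative.

```python
def group_by_time(pairs, window, step):
    """Group (time_offset, key) tuples by a sliding time window."""
    if not pairs:
        return []
    groups = []
    start = pairs[0][0]
    last = pairs[-1][0]
    while start <= last:
        # Collect the hash pairs whose offset falls in [start, start + window).
        group = [p for p in pairs if start <= p[0] < start + window]
        if len(group) >= 2:  # a hash pair group needs at least two hash pairs
            groups.append(group)
        start += step
    return groups


pairs = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e"), (5, "f")]
disjoint = group_by_time(pairs, window=4, step=4)
overlapping = group_by_time(pairs, window=4, step=2)
print([[t for t, _ in g] for g in disjoint])
print([[t for t, _ in g] for g in overlapping])
```

When the step equals the window length the groups are disjoint, and when the step is smaller than the window length adjacent groups share hash pairs.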
In the above embodiment, when the audio segment to be searched is a complete audio signal, the terminal may further group the hash pairs according to the preset window length and the preset step length to obtain hash pair groups, so that adjacent hash pair groups may contain hash pairs in common. The secondary audio fingerprint constructed from these hash pair groups therefore carries more information, so that when the secondary audio fingerprint library is searched with the constructed secondary audio fingerprint, redundant matching is reduced, computing resources are saved, and the efficiency of audio search is improved.
In one embodiment, after the terminal searches the secondary audio fingerprint library for a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip and outputs the audio file corresponding to the matched secondary audio fingerprint, the audio searching method further includes the following steps: querying the primary audio fingerprints of the output audio files in a primary audio fingerprint library, querying, among the primary audio fingerprints of those audio files, a primary audio fingerprint matching the primary audio fingerprint of the audio clip, and outputting the audio file corresponding to the matched primary audio fingerprint. The primary audio fingerprint library may be a primary audio fingerprint hash table, which comprises hash pairs and the corresponding primary hash keys.
In one embodiment, the terminal queries a primary audio fingerprint matching the primary audio fingerprint of the audio clip from the primary audio fingerprints of the audio files as follows. The primary audio fingerprints of the audio files are searched according to the hash pairs and the corresponding primary hash keys contained in the primary audio fingerprint of the audio clip; specifically, candidate primary audio fingerprints that share primary hash keys with the primary audio fingerprint of the audio clip are found among the primary audio fingerprints of the audio files. For each candidate primary fingerprint, the difference in time offset is determined between each pair of corresponding hash pairs (hash pairs with the same primary hash key) in the candidate primary fingerprint and in the primary audio fingerprint of the audio clip, and a time-difference histogram is used to count, for each candidate primary fingerprint, how many of these differences are identical. The candidate primary fingerprint with the largest number of identical time-offset differences is determined as the primary audio fingerprint matching the primary audio fingerprint of the audio clip, and the audio file corresponding to the matched primary audio fingerprint is output.
For example, suppose the primary hash key a of hash pair a in candidate primary fingerprint A is the same as the primary hash key b of hash pair b in the primary audio fingerprint of the audio segment, so that hash pair a corresponds to hash pair b. If the time offset of hash pair a is $T_1$ and the time offset of hash pair b is $T_2$, the difference in time offset between hash pair a and hash pair b is $T_2 - T_1 = Q$. If the differences in time offset between a plurality of corresponding hash pairs of candidate primary fingerprint A and the primary audio fingerprint of the audio segment are all $Q$, the time-difference histogram is used to count the number of occurrences of $Q$ for candidate primary fingerprint A. When this number is greater than the largest number of identical time-offset differences of any other candidate primary fingerprint, candidate primary fingerprint A is determined as the primary audio fingerprint matching the primary audio fingerprint of the audio clip, and audio file A corresponding to candidate primary fingerprint A is output.
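The time-difference histogram can be sketched in Python with `collections.Counter`. The sketch assumes that each primary audio fingerprint is a list of `(primary_key, time_offset)` tuples with integer offsets; the function names and sample values are illustrative.

```python
from collections import Counter


def best_alignment_count(query_pairs, candidate_pairs):
    """Largest number of corresponding hash pairs sharing one offset difference."""
    offsets_by_key = {}
    for key, offset in candidate_pairs:
        offsets_by_key.setdefault(key, []).append(offset)
    histogram = Counter()
    for key, query_offset in query_pairs:
        for candidate_offset in offsets_by_key.get(key, []):
            # Q = (offset in the audio clip) - (offset in the candidate)
            histogram[query_offset - candidate_offset] += 1
    return max(histogram.values(), default=0)


def best_candidate(query_pairs, candidates):
    """Return the candidate name with the tallest histogram bin."""
    return max(
        candidates,
        key=lambda name: best_alignment_count(query_pairs, candidates[name]),
    )


query = [(11, 0), (22, 3), (33, 5)]
candidates = {
    "A": [(11, 40), (22, 43), (33, 45), (99, 50)],  # all differences equal -40
    "B": [(11, 7), (22, 20), (44, 1)],              # differences -7 and -17
}
winner = best_candidate(query, candidates)
print(winner)
```

A genuine match produces many corresponding hash pairs with the same offset difference, whereas accidental key collisions scatter across different differences, which is why the tallest histogram bin is used as the score.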
In the above embodiment, after outputting the audio files corresponding to the matched secondary audio fingerprints, the terminal queries the primary audio fingerprints of those audio files in the primary audio fingerprint library, queries, among them, a primary audio fingerprint matching the primary audio fingerprint of the audio clip, and outputs the audio file corresponding to the matched primary audio fingerprint. The audio clip can thus be searched on the basis of both its secondary audio fingerprint and its primary audio fingerprint: the secondary audio fingerprint search quickly narrows the search to a small number of audio files, and the primary audio fingerprint search then matches exactly against that small number of audio files, achieving an efficient and accurate search for the audio clip.
In one embodiment, after calculating the secondary hash key corresponding to each hash pair group according to the primary hash keys of the hash pairs in the hash pair group, the terminal may further regroup the hash pair groups to obtain multi-level hash pair groups, the level of which is at least three; calculate the multi-level hash key corresponding to each multi-level hash pair group according to the secondary hash keys of the hash pair groups contained in the multi-level hash pair group, to obtain a multi-level audio fingerprint of the audio segment, which comprises each multi-level hash pair group and the corresponding multi-level hash key; and query a multi-level audio fingerprint library for a multi-level audio fingerprint matching the multi-level audio fingerprint of the audio clip, and output the audio file corresponding to the matched multi-level audio fingerprint. The levels of the multi-level hash pair groups, the multi-level hash keys, the multi-level audio fingerprint and the multi-level audio fingerprint library correspond to one another, for example, four-level hash pair groups, four-level hash keys, a four-level audio fingerprint and a four-level audio fingerprint library.
In one embodiment, after obtaining the N-1 level hash pair groups and the corresponding N-1 level hash keys, the terminal may further regroup the N-1 level hash pair groups to obtain N-level hash pair groups; calculate the N-level hash key corresponding to each N-level hash pair group according to the N-1 level hash keys of the N-1 level hash pair groups it contains, to obtain an N-level audio fingerprint of the audio segment, which comprises the N-level hash pair groups and the corresponding N-level hash keys; and query an N-level audio fingerprint library for an N-level audio fingerprint matching the N-level audio fingerprint of the audio clip, and output the audio file corresponding to the matched N-level audio fingerprint, where N is greater than or equal to 3.
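The level-by-level regrouping can be sketched in Python as repeated application of one grouping-and-combining step. The sketch works on hash keys only, reuses a count-based sliding window at every level, and takes the key-combining function as a parameter; the weighted sum used in the example is a placeholder assumption, not the hash formula of the embodiment.

```python
def next_level(keys, window, step, combine):
    """Regroup level N-1 keys and derive one level-N key per group."""
    groups = [
        keys[start:start + window]
        for start in range(0, len(keys) - window + 1, step)
    ]
    return [combine(group) for group in groups]


def build_levels(primary_keys, levels, window, step, combine):
    """Return a list whose element i holds the keys of level i + 1."""
    result = [list(primary_keys)]
    for _ in range(levels - 1):
        result.append(next_level(result[-1], window, step, combine))
    return result


def weighted_sum(group):
    """Placeholder combiner (assumption): weights 2 and 3 for a window of two."""
    return sum(key * weight for key, weight in zip(group, (2, 3)))


levels = build_levels([1, 2, 3, 4, 5], levels=3, window=2, step=1,
                      combine=weighted_sum)
print(levels)
```

Each additional level shortens the key list and folds more primary hash keys into each key, which reflects the trade-off between search speed and precision described for the multi-level fingerprints.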
In the above embodiment, the terminal regroups the hash pair groups to obtain the multi-level audio fingerprint of the audio segment, queries the multi-level audio fingerprint library for a multi-level audio fingerprint matching the multi-level audio fingerprint of the audio segment, and outputs the audio file corresponding to the matched multi-level audio fingerprint. An audio fingerprint of a level suited to the speed and precision requirements of the audio search can thus be extracted from the audio segment and searched in the audio fingerprint library of the corresponding level, which avoids exactly matching the audio fingerprint of the audio segment against a large number of candidate audio fingerprints and further improves the efficiency of audio search.
In one embodiment, before the terminal searches for a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip in the secondary audio fingerprint library, it needs to establish an audio fingerprint library in advance, as shown in fig. 4, the process of establishing the audio fingerprint library includes the following steps:
S402, acquiring an audio signal of the audio file.
S404, establishing a primary audio fingerprint database containing audio files according to the frequency spectrum peak point of the audio signal; the primary audio fingerprint library comprises hash pairs of the audio files and corresponding primary hash keys.
Wherein the audio files are a plurality of audio files used to generate an audio fingerprint library. The first-level audio fingerprint database contains the first-level audio fingerprints of the acquired audio files, and the first-level audio fingerprint database can be a first-level audio fingerprint hash table.
In one embodiment, after the terminal acquires the audio signals of the audio files, the terminal extracts the spectral features of the audio signals, determines spectral peak points of the spectral features, constructs hash pairs according to the spectral peak points, calculates primary hash keys corresponding to the hash pairs according to the first frequency and the first time of the first spectral peak point corresponding to the hash pair and the second frequency and the second time of the second spectral peak point corresponding to the hash pair, obtains primary audio fingerprints of the audio files containing the hash pairs and the corresponding primary hash keys, and constructs a primary audio fingerprint library containing the audio files according to the primary audio fingerprints of the audio files.
Specifically, for each hash pair, the time offset of the selected anchor point is taken as the time offset of the hash pair in the audio segment, the frequencies of the two spectral peak points and the time difference between the two spectral peak points are spliced, and the splicing result is used as the hash key corresponding to the hash pair. Each sub-fingerprint may be stored in a data structure in which the number of bits of the primary hash key is given by an expression (formula image not reproduced in this text) in terms of two quantities: the number of bits occupied by the frequency of each spectral peak point in the hash pair, and the number of bits occupied by the time difference between the two spectral peak points in the hash pair.
S406, establishing a secondary audio fingerprint database containing the audio files according to the hash pairs of the audio files and the corresponding primary hash keys.
The second-level audio fingerprint library comprises second-level audio fingerprints of the acquired audio files, and the second-level audio fingerprint library can be specifically a second-level audio fingerprint hash table.
In one embodiment, after obtaining the primary audio fingerprint of each audio file, the terminal groups the hash pairs of the primary audio fingerprint of each audio file according to a preset window length to obtain the hash pair groups corresponding to that audio file, then calculates the secondary hash key corresponding to each hash pair group according to the primary hash keys of the hash pairs in the hash pair group to obtain the secondary audio fingerprint of each audio file, and constructs the secondary audio fingerprint library from the secondary audio fingerprints of the audio files, where the secondary audio fingerprint of each audio file comprises the hash pair groups and the corresponding secondary hash keys of that audio file.
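A fingerprint library organized as a hash table can be sketched in Python as a mapping from hash key to the audio files (and time offsets) in which that key occurs. The same structure can hold primary or secondary keys. The function names, the `(hash_key, time_offset)` representation, and the file identifiers are illustrative assumptions.

```python
from collections import defaultdict


def build_library(fingerprints_by_file):
    """Map each hash key to the (file_id, time_offset) entries that contain it."""
    table = defaultdict(list)
    for file_id, fingerprint in fingerprints_by_file.items():
        for key, offset in fingerprint:
            table[key].append((file_id, offset))
    return table


def lookup_files(table, query_keys):
    """Return the ids of all files that contain at least one of the query keys."""
    return {
        file_id
        for key in query_keys
        for file_id, _ in table.get(key, [])
    }


library = build_library({
    "song1": [(10, 0), (20, 1)],
    "song2": [(20, 5), (30, 6)],
})
print(sorted(lookup_files(library, [20])))
```

Looking up a key is a single hash-table access, so the cost of finding candidate audio files depends on the number of keys in the query rather than on the number of audio files in the library.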
In the above embodiment, the terminal acquires the audio signals of the audio files and establishes, according to the spectral peak points of the audio signals, a primary audio fingerprint library containing the audio files, which comprises the hash pairs of the audio files and the corresponding primary hash keys; the terminal then establishes a secondary audio fingerprint library containing the audio files according to the hash pairs of the audio files and the corresponding primary hash keys. The fingerprint libraries are thus established in advance according to the audio search requirements, so that an audio search is performed in the audio fingerprint library corresponding to the level of the audio fingerprint, which avoids exactly matching the audio fingerprint of the audio segment against a large number of candidate audio fingerprints and improves the efficiency of audio search.
In one embodiment, after establishing the secondary audio fingerprint library containing the audio files according to the hash pairs of the audio files and the corresponding primary hash keys, the terminal may further regroup the hash pair groups of each audio file to obtain multi-level hash pair groups of each audio file, the level of which is at least three; then calculate the multi-level hash key corresponding to each multi-level hash pair group of each audio file according to the secondary hash keys of the hash pair groups it contains; and establish a multi-level audio fingerprint library containing the audio files according to the multi-level hash pair groups of the audio files and the corresponding multi-level hash keys. The levels of the multi-level hash pair groups, the multi-level hash keys, the multi-level audio fingerprints and the multi-level audio fingerprint library correspond to one another, for example, four-level hash pair groups, four-level hash keys, four-level audio fingerprints and a four-level audio fingerprint library.
In one embodiment, after establishing the N-1 level audio fingerprint library containing the audio files, the terminal may regroup the N-1 level hash pair groups of each audio file to obtain N-level hash pair groups of each audio file; then calculate the N-level hash key corresponding to each N-level hash pair group of each audio file according to the N-1 level hash keys of the N-1 level hash pair groups it contains; and establish an N-level audio fingerprint library containing the audio files according to the N-level hash pair groups of the audio files and the corresponding N-level hash keys.
In the above embodiment, the terminal regroups the hash pair groups of each audio file to construct the multi-level audio fingerprint library, so that an audio search is performed in the multi-level audio fingerprint library corresponding to the level of the audio fingerprint, which avoids exactly matching the audio fingerprint of the audio clip against a large number of candidate audio fingerprints and improves the efficiency of audio search.
In one embodiment, as shown in fig. 5, an audio searching method is provided, which is described by taking the method as an example applied to the terminal in fig. 1, and includes the following steps:
S502, acquiring an audio signal of the audio file.
S504, a primary audio fingerprint database containing audio files is established according to the frequency spectrum peak point of the audio signal. The primary audio fingerprint library comprises hash pairs of the audio files and corresponding primary hash keys.
S506, establishing a secondary audio fingerprint database containing the audio files according to the hash pairs of the audio files and the corresponding primary hash keys.
S508, acquiring a primary audio fingerprint of the audio clip. The primary audio fingerprint comprises hash pairs and the corresponding primary hash keys, where each hash pair is a combination of two spectral peak points of the audio clip.
S510, grouping the hash pairs according to a preset window length to obtain hash pair groups. Each hash pair group contains at least two hash pairs.
S512, calculating the secondary hash key corresponding to each hash pair group according to the primary hash keys of the hash pairs in the hash pair group, to obtain the secondary audio fingerprint of the audio clip. The secondary audio fingerprint comprises each hash pair group and the corresponding secondary hash key.
S514, searching the secondary audio fingerprint library for a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip, and outputting the audio file corresponding to the matched secondary audio fingerprint.
S516, inquiring the primary audio fingerprint of the audio file in the primary audio fingerprint library.
S518, inquiring the primary audio fingerprints matched with the primary audio fingerprints of the audio clips from the primary audio fingerprints of the audio files, and outputting the audio files corresponding to the matched primary audio fingerprints.
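The two-stage search of S514 to S518 can be sketched in Python as a coarse filter on secondary hash keys followed by a fine match on primary hash keys. The sketch assumes the fingerprints have already been computed and are supplied as plain dictionaries keyed by a hypothetical file identifier; the coarse stage uses a repetition-rate threshold and the fine stage uses a time-difference histogram. All names and sample values are illustrative.

```python
from collections import Counter


def coarse_candidates(query_secondary, secondary_by_file, threshold):
    """Stage 1: keep files whose secondary keys overlap the query enough."""
    query = set(query_secondary)
    return [
        file_id
        for file_id, keys in secondary_by_file.items()
        if len(query & set(keys)) / len(query) > threshold
    ]


def alignment_score(query_primary, file_primary):
    """Height of the tallest bin in the time-offset-difference histogram."""
    offsets = {}
    for key, offset in file_primary:
        offsets.setdefault(key, []).append(offset)
    histogram = Counter(
        q_offset - c_offset
        for key, q_offset in query_primary
        for c_offset in offsets.get(key, [])
    )
    return max(histogram.values(), default=0)


def search(query_secondary, query_primary, secondary_by_file,
           primary_by_file, threshold):
    """Stage 2: among the coarse candidates, pick the best-aligned file."""
    candidates = coarse_candidates(query_secondary, secondary_by_file, threshold)
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda f: alignment_score(query_primary, primary_by_file[f]),
    )


secondary_by_file = {"x": [1, 2, 3], "y": [1, 2, 9], "z": [7, 8, 9]}
primary_by_file = {
    "x": [(11, 10), (22, 12), (33, 15)],
    "y": [(11, 4), (22, 30), (55, 2)],
    "z": [(66, 0), (77, 1)],
}
result = search([1, 2, 3], [(11, 0), (22, 2), (33, 5)],
                secondary_by_file, primary_by_file, threshold=0.5)
print(result)
```

Only the files that pass the coarse stage are scored in the fine stage, so the more expensive alignment computation runs on a small candidate set.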
It should be understood that although the steps in the flowcharts of fig. 2, 4 and 5 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2, 4 and 5 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided an audio search apparatus including: a primary audio fingerprint acquisition module 602, a grouping module 604, a secondary audio fingerprint acquisition module 606, and an audio query module 608, wherein:
a primary audio fingerprint obtaining module 602, configured to obtain a primary audio fingerprint of an audio clip; the primary audio fingerprint comprises a hash pair and a corresponding primary hash key, and the hash pair is the combination of two spectrum peak points of the audio segment;
a grouping module 604, configured to group hash pairs according to a preset window length to obtain hash pair groups; the hash pair packet comprises at least two hash pairs;
a secondary audio fingerprint obtaining module 606, configured to calculate, according to the primary hash keys of the hash pairs in each hash pair group, the secondary hash key corresponding to that hash pair group, so as to obtain a secondary audio fingerprint of the audio segment; the secondary audio fingerprint comprises each hash pair group and the corresponding secondary hash key;
an audio query module 608, configured to query the secondary audio fingerprint library for a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip, and to output the audio file corresponding to the matched secondary audio fingerprint.
In one embodiment, the secondary audio fingerprint acquisition module 606 is further configured to:
acquiring a first-level hash key of each hash pair in the hash pair grouping;
substituting the acquired primary hash keys into a hash formula to obtain the secondary hash key corresponding to each hash pair group; the hash formula is:
[Formula image not reproduced in this text.]
The formula relates the following quantities: the secondary hash key corresponding to a given hash pair group; the number of hash pairs that the group contains; the primary hash key of each hash pair in the group, indexed by its position in the group; and the distinguishing factor corresponding to each hash pair.
In the above embodiment, the terminal acquires the primary audio fingerprint of the audio clip, which comprises hash pairs and the corresponding primary hash keys; groups the hash pairs according to the preset window length to obtain hash pair groups; and calculates the secondary hash key corresponding to each hash pair group, thereby obtaining the secondary audio fingerprint of the audio clip, which comprises the hash pair groups and the corresponding secondary hash keys. The terminal then queries the secondary audio fingerprint library for a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip and outputs the audio file corresponding to the matched secondary audio fingerprint. This avoids exactly matching the audio fingerprint of the audio clip against a large number of candidate audio fingerprints and improves the efficiency of audio search.
In one embodiment, the primary audio fingerprint acquisition module 602 is further configured to:
extract the spectral features of the audio clip and determine the spectral peak points of the spectral features;
construct hash pairs from the spectral peak points; and
calculate the primary hash key corresponding to each hash pair according to the first frequency and first time of the first spectral peak point of the hash pair and the second frequency and second time of the second spectral peak point of the hash pair, obtaining a primary audio fingerprint of the audio clip that comprises each hash pair and its corresponding primary hash key.
In the above embodiment, the terminal extracts the spectral features of the audio clip and determines the spectral peak points of the spectral features, constructs hash pairs from the spectral peak points, and calculates the primary hash key corresponding to each hash pair according to the frequencies and times of the spectral peak points of that hash pair, obtaining a primary audio fingerprint of the audio clip that comprises each hash pair and its corresponding primary hash key. The primary audio fingerprint therefore carries more information than a single peak point, and the secondary audio fingerprint constructed from the primary audio fingerprint carries more information still. When the secondary audio fingerprint is searched for in the secondary audio fingerprint library, redundant matching is reduced, which saves computing resources and improves the efficiency of audio search.
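A minimal Python sketch of the hash pair construction is given below. It assumes the spectral peak points have already been extracted as (time frame, frequency bin) tuples. The source states only that the primary hash key is calculated from the two frequencies and the two times; the bit layout used here (both frequency bins plus the time difference, as is common in landmark-style fingerprinting) and the names `primary_key`, `build_primary_fingerprint` and `fan_out` are assumptions for illustration.

```python
from typing import List, NamedTuple, Tuple

Peak = Tuple[int, int]  # (time frame index, frequency bin)


class HashPair(NamedTuple):
    first: Peak
    second: Peak
    key: int  # primary hash key


def primary_key(first: Peak, second: Peak) -> int:
    """Pack f1, f2 and (t2 - t1) into one integer.

    Assumed layout: 10 bits per frequency bin, 12 bits for the time difference.
    """
    (t1, f1), (t2, f2) = first, second
    dt = t2 - t1
    return ((f1 & 0x3FF) << 22) | ((f2 & 0x3FF) << 12) | (dt & 0xFFF)


def build_primary_fingerprint(peaks: List[Peak], fan_out: int = 2) -> List[HashPair]:
    """Pair every peak with up to `fan_out` later peaks, in time order."""
    ordered = sorted(peaks)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:i + 1 + fan_out]:
            pairs.append(HashPair(first, second, primary_key(first, second)))
    return pairs


peaks = [(0, 100), (3, 250), (5, 80)]   # hypothetical peak points
fingerprint = build_primary_fingerprint(peaks)
```

Using the time difference rather than the absolute times makes the keys in this sketch independent of where the clip starts within the full recording.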
In one embodiment, the audio query module 608 is further configured to:
query the primary audio fingerprints of the audio files in the primary audio fingerprint library; and
query, from the primary audio fingerprints of the audio files, a primary audio fingerprint matching the primary audio fingerprint of the audio clip, and output the audio file corresponding to the matched primary audio fingerprint.
In the above embodiment, after outputting the audio files corresponding to the matched secondary audio fingerprints, the terminal queries the primary audio fingerprints of those audio files in the primary audio fingerprint library, queries among them a primary audio fingerprint matching the primary audio fingerprint of the audio clip, and outputs the audio file corresponding to the matched primary audio fingerprint. The audio clip is thus searched using both its secondary and its primary audio fingerprint: the secondary audio fingerprint search quickly narrows the candidates to a small number of audio files, and the primary audio fingerprint search then matches exactly within that small number, achieving an efficient and accurate search for the audio clip.
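The coarse-then-fine search described above can be sketched in Python as follows. The library layouts (an inverted index from secondary hash key to file identifiers, and a map from file identifier to its set of primary hash keys), the vote-counting score, and all names and data are assumptions for illustration; the source does not specify how a match is scored.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def coarse_candidates(query_secondary: List[int],
                      secondary_library: Dict[int, Set[str]],
                      top_n: int = 2) -> List[str]:
    """Stage 1: rank files by the number of secondary keys shared with the query."""
    votes: Counter = Counter()
    for key in query_secondary:
        for file_id in secondary_library.get(key, ()):
            votes[file_id] += 1
    return [file_id for file_id, _ in votes.most_common(top_n)]


def refine(query_primary: List[int],
           candidates: List[str],
           primary_library: Dict[str, Set[int]]) -> Optional[str]:
    """Stage 2: among the candidates, pick the file sharing the most primary keys."""
    if not candidates:
        return None
    query = set(query_primary)
    return max(candidates, key=lambda f: len(primary_library[f] & query))


# Toy libraries (hypothetical data).
secondary_library = {55: {"a.wav", "b.wav"}, 121: {"a.wav"}, 9: {"c.wav"}}
primary_library = {"a.wav": {11, 22, 33, 44},
                   "b.wav": {11, 22, 70, 80},
                   "c.wav": {1, 2}}

candidates = coarse_candidates([55, 121], secondary_library)
best = refine([11, 22, 33, 44], candidates, primary_library)
```

Only the files returned by the first stage are touched in the second stage, so the more expensive primary comparison runs against a handful of files rather than the whole library.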
In one embodiment, the grouping module 604 is further configured to:
regroup the hash pair groups to obtain multi-level hash pair groups, the level of a multi-level hash pair group being at least three;
the secondary audio fingerprint acquisition module 606 is further configured to calculate the multi-level hash key corresponding to each multi-level hash pair group according to the secondary hash keys of the hash pair groups contained in that multi-level hash pair group, obtaining a multi-level audio fingerprint of the audio clip; the multi-level audio fingerprint comprises each multi-level hash pair group and its corresponding multi-level hash key;
the audio query module 608 is further configured to query, in a multi-level audio fingerprint library, a multi-level audio fingerprint matching the multi-level audio fingerprint of the audio clip, and output the audio file corresponding to the matched multi-level audio fingerprint.
In the above embodiment, the terminal regroups the hash pair groups to obtain the multi-level audio fingerprint of the audio clip, queries the multi-level audio fingerprint library for a multi-level audio fingerprint matching that of the audio clip, and outputs the audio file corresponding to the matched multi-level audio fingerprint. An audio fingerprint of the appropriate level can thus be extracted from the audio clip according to the speed and precision requirements of the audio search and searched for in the audio fingerprint library of the corresponding level. This avoids exact matching of the audio fingerprint of the audio clip against a large number of candidate audio fingerprints and further improves the efficiency of audio search.
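The regrouping into higher levels can be sketched by applying the same grouping step repeatedly. As before, the combining function is an assumed stand-in (position-weighted sum modulo 2^32), the window is read as a count of consecutive keys, and the names `combine`, `next_level` and `fingerprint_levels` are hypothetical.

```python
from typing import List, Sequence

MODULUS = 2 ** 32  # assumed key width; not specified in the source


def combine(keys: Sequence[int]) -> int:
    """Assumed stand-in combiner: position-weighted sum modulo MODULUS."""
    return sum((j + 1) * k for j, k in enumerate(keys)) % MODULUS


def next_level(keys: Sequence[int], window: int) -> List[int]:
    """Group the keys of one level and combine each group into one key of
    the next level; groups with fewer than two keys are dropped."""
    groups = [keys[i:i + window] for i in range(0, len(keys), window)]
    return [combine(g) for g in groups if len(g) >= 2]


def fingerprint_levels(primary_keys: Sequence[int],
                       window: int, levels: int) -> List[List[int]]:
    """Return the key lists for level 1 (primary) up to level `levels`."""
    result = [list(primary_keys)]
    while len(result) < levels:
        result.append(next_level(result[-1], window))
    return result


all_levels = fingerprint_levels([1, 2, 3, 4, 5, 6, 7, 8], window=2, levels=3)
```

Each level has fewer keys than the one below it, so a higher-level search compares fewer keys at the cost of coarser discrimination, which matches the speed and precision trade-off described above.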
In one embodiment, as shown in fig. 7, the apparatus further comprises an audio file acquisition module 610, a primary audio fingerprint library establishing module 612, and a secondary audio fingerprint library establishing module 614, wherein:
the audio file acquisition module 610 is configured to acquire audio signals of audio files;
the primary audio fingerprint library establishing module 612 is configured to establish a primary audio fingerprint library containing the audio files according to the spectral peak points of the audio signals; the primary audio fingerprint library comprises the hash pairs of the audio files and their corresponding primary hash keys;
and the secondary audio fingerprint library establishing module 614 is configured to establish a secondary audio fingerprint library containing the audio files according to the hash pairs of the audio files and the corresponding primary hash keys.
In the above embodiment, the terminal acquires the audio signals of the audio files and establishes, according to the spectral peak points of the audio signals, a primary audio fingerprint library containing the audio files; the primary audio fingerprint library comprises the hash pairs of the audio files and their corresponding primary hash keys. The terminal then establishes a secondary audio fingerprint library containing the audio files according to the hash pairs of the audio files and the corresponding primary hash keys. The fingerprint libraries are thus built in advance according to the audio search requirements, so that at search time the audio fingerprint library matching the level of the audio fingerprint is searched. This avoids exact matching of the audio fingerprint of the audio clip against a large number of candidate audio fingerprints and improves the efficiency of audio search.
In one embodiment, the secondary audio fingerprint library establishing module 614 is further configured to:
group the hash pairs of each audio file to obtain the hash pair groups of each audio file; and
calculate the secondary hash key corresponding to each hash pair group of each audio file according to the primary hash keys of the hash pairs in that hash pair group, obtaining a secondary audio fingerprint library that contains the hash pair groups of the audio files and their corresponding secondary hash keys.
In the above embodiment, the terminal regroups the hash pair groups of each audio file to construct a multi-level audio fingerprint library, so that at search time the audio files are searched in the audio fingerprint library matching the level of the audio fingerprint. This avoids exact matching of the audio fingerprint of the audio clip against a large number of candidate audio fingerprints and improves the efficiency of audio search.
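Building the two libraries ahead of time can be sketched as below. The sketch assumes the primary hash keys of each audio file are already available, stores the secondary library as an inverted index from secondary hash key to file identifiers, and reuses an assumed stand-in combiner (position-weighted sum modulo 2^32); none of these choices, nor the names `secondary_keys` and `build_libraries`, comes from the source.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def secondary_keys(primary_keys: List[int], window: int) -> List[int]:
    """Group primary keys into windows and combine each group into one key.

    Assumed stand-in combiner: position-weighted sum modulo 2**32.
    Groups with fewer than two keys are dropped.
    """
    groups = [primary_keys[i:i + window]
              for i in range(0, len(primary_keys), window)]
    return [sum((j + 1) * k for j, k in enumerate(g)) % (2 ** 32)
            for g in groups if len(g) >= 2]


def build_libraries(files: Dict[str, List[int]], window: int
                    ) -> Tuple[Dict[str, List[int]], Dict[int, Set[str]]]:
    """Return (primary library, secondary library) for the given files."""
    primary_library: Dict[str, List[int]] = {}
    secondary_library: Dict[int, Set[str]] = defaultdict(set)
    for file_id, keys in files.items():
        primary_library[file_id] = list(keys)
        for key in secondary_keys(keys, window):
            secondary_library[key].add(file_id)
    return primary_library, dict(secondary_library)


# Hypothetical primary hash keys for two audio files.
files = {"a.wav": [11, 22, 33, 44], "b.wav": [11, 22, 70, 80]}
primary_lib, secondary_lib = build_libraries(files, window=2)
```

The same window length would have to be used for the library and for the query clip, otherwise the secondary keys of identical audio would not coincide in this sketch.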
For the specific limitations of the audio search device, reference may be made to the above limitations of the audio search method, which are not repeated here. The modules in the audio search device can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores audio fingerprint data. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements an audio search method.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication can be realized through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements an audio search method. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device can be a touch layer covering the display screen, a key, trackball, or touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
Those skilled in the art will appreciate that the configurations shown in fig. 8 or 9 are merely block diagrams of some configurations relevant to the present disclosure and do not limit the computing devices to which the present disclosure may be applied; a particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. An audio search method, the method comprising:
acquiring a primary audio fingerprint of an audio clip; the primary audio fingerprint comprises hash pairs and corresponding primary hash keys, each hash pair being a combination of two spectral peak points of the audio clip;
grouping the hash pairs according to a preset window length to obtain hash pair groups; each hash pair group comprises at least two hash pairs;
calculating a secondary hash key corresponding to each hash pair group according to the primary hash keys of the hash pairs in the hash pair group, to obtain a secondary audio fingerprint of the audio clip; the secondary audio fingerprint comprises each of the hash pair groups and each of the corresponding secondary hash keys;
and querying a secondary audio fingerprint library for a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip, and outputting an audio file corresponding to the matched secondary audio fingerprint.
2. The method of claim 1, wherein the acquiring a primary audio fingerprint of an audio clip comprises:
extracting spectral features of the audio clip, and determining spectral peak points of the spectral features;
constructing hash pairs from the spectral peak points;
and calculating a primary hash key corresponding to each hash pair according to the first frequency and first time of the first spectral peak point of the hash pair and the second frequency and second time of the second spectral peak point of the hash pair, to obtain a primary audio fingerprint of the audio clip comprising each hash pair and its corresponding primary hash key.
3. The method according to claim 1, wherein the calculating a secondary hash key corresponding to each hash pair group according to the primary hash keys of the hash pairs in the hash pair group comprises:
acquiring the primary hash key of each hash pair in the hash pair group;
substituting the acquired primary hash keys into a hash formula to calculate the secondary hash key corresponding to each hash pair group; the hash formula is:
[Formula image not reproduced in this text]
wherein the formula is expressed in terms of the following quantities, with i indexing the hash pair groups and j indexing the hash pairs within a group: the secondary hash key corresponding to the i-th hash pair group; the number of hash pairs contained in the i-th hash pair group; the primary hash key of the j-th hash pair in the i-th hash pair group; and the distinguishing factor corresponding to the j-th hash pair.
4. The method of claim 1, wherein after the outputting an audio file corresponding to the matched secondary audio fingerprint, the method further comprises:
querying the primary audio fingerprint of the audio file in a primary audio fingerprint library;
and querying, from the primary audio fingerprints of the audio files, a primary audio fingerprint matching the primary audio fingerprint of the audio clip, and outputting the audio file corresponding to the matched primary audio fingerprint.
5. The method according to claim 1, wherein after the calculating a secondary hash key corresponding to each hash pair group according to the primary hash keys of the hash pairs in the hash pair group, the method further comprises:
regrouping the hash pair groups to obtain multi-level hash pair groups; the level of a multi-level hash pair group is at least three;
calculating a multi-level hash key corresponding to each multi-level hash pair group according to the secondary hash keys of the hash pair groups contained in the multi-level hash pair group, to obtain a multi-level audio fingerprint of the audio clip; the multi-level audio fingerprint comprises each multi-level hash pair group and each corresponding multi-level hash key;
and querying a multi-level audio fingerprint library for a multi-level audio fingerprint matching the multi-level audio fingerprint of the audio clip, and outputting an audio file corresponding to the matched multi-level audio fingerprint.
6. The method of claim 1, wherein prior to searching the secondary audio fingerprint library for a secondary audio fingerprint that matches the secondary audio fingerprint of the audio clip, the method further comprises:
acquiring an audio signal of an audio file;
establishing a primary audio fingerprint database containing the audio file according to the frequency spectrum peak point of the audio signal; the primary audio fingerprint database comprises hash pairs of the audio files and corresponding primary hash keys;
and establishing a secondary audio fingerprint library containing the audio files according to the hash pairs of the audio files and the corresponding primary hash keys.
7. The method of claim 6, wherein the establishing a secondary audio fingerprint library containing the audio files according to the hash pairs of the audio files and the corresponding primary hash keys comprises:
grouping the hash pairs of each audio file to obtain hash pair groups of each audio file;
and calculating a secondary hash key corresponding to each hash pair group of each audio file according to the primary hash keys of the hash pairs in that hash pair group, to obtain a secondary audio fingerprint library containing the hash pair groups of the audio files and the corresponding secondary hash keys.
8. An audio search apparatus, characterized in that the apparatus comprises:
a primary audio fingerprint acquisition module, configured to acquire a primary audio fingerprint of an audio clip; the primary audio fingerprint comprises hash pairs and corresponding primary hash keys, each hash pair being a combination of two spectral peak points of the audio clip;
a grouping module, configured to group the hash pairs according to a preset window length to obtain hash pair groups; each hash pair group comprises at least two hash pairs;
a secondary audio fingerprint acquisition module, configured to calculate a secondary hash key corresponding to each hash pair group according to the primary hash keys of the hash pairs in the hash pair group, to obtain a secondary audio fingerprint of the audio clip; the secondary audio fingerprint comprises each of the hash pair groups and each of the corresponding secondary hash keys;
and an audio query module, configured to query a secondary audio fingerprint library for a secondary audio fingerprint matching the secondary audio fingerprint of the audio clip, and output an audio file corresponding to the matched secondary audio fingerprint.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010707678.9A 2020-07-22 2020-07-22 Audio searching method and device, computer equipment and computer-readable storage medium Active CN111597379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010707678.9A CN111597379B (en) 2020-07-22 2020-07-22 Audio searching method and device, computer equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010707678.9A CN111597379B (en) 2020-07-22 2020-07-22 Audio searching method and device, computer equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN111597379A true CN111597379A (en) 2020-08-28
CN111597379B CN111597379B (en) 2020-11-03

Family

ID=72192411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010707678.9A Active CN111597379B (en) 2020-07-22 2020-07-22 Audio searching method and device, computer equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111597379B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784098A (en) * 2021-01-28 2021-05-11 百果园技术(新加坡)有限公司 Audio searching method and device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833986A (en) * 2010-05-20 2010-09-15 哈尔滨工业大学 Method for creating three-stage audio index and audio retrieval method
US20140149120A1 (en) * 2012-03-28 2014-05-29 Interactive Intelligence, Inc. System and method for fingerprinting datasets
CN107577773A (en) * 2017-09-08 2018-01-12 科大讯飞股份有限公司 A kind of audio matching method and device, electronic equipment
CN110047515A (en) * 2019-04-04 2019-07-23 腾讯音乐娱乐科技(深圳)有限公司 A kind of audio identification methods, device, equipment and storage medium
CN110602303A (en) * 2019-08-30 2019-12-20 厦门快商通科技股份有限公司 Method and system for preventing telecommunication fraud based on audio fingerprint technology
CN111161758A (en) * 2019-12-04 2020-05-15 厦门快商通科技股份有限公司 Song listening and song recognition method and system based on audio fingerprint and audio equipment
CN111400542A (en) * 2020-03-20 2020-07-10 腾讯科技(深圳)有限公司 Audio fingerprint generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant