WO2022194277A1

WO2022194277A1 - Audio fingerprint processing method and apparatus, and computer device and storage medium

Info

Publication number: WO2022194277A1
Application number: PCT/CN2022/081680
Authority: WO
Inventors: 李敬; 何莹男
Original assignee: 百果园技术(新加坡)有限公司; 李敬
Priority date: 2021-03-18
Filing date: 2022-03-18
Publication date: 2022-09-22
Also published as: CN112784100A

Abstract

Provided are an audio fingerprint processing method and apparatus, and a computer device and a storage medium. The audio fingerprint processing method comprises: generating target fingerprint data for target audio data (101); respectively matching the target fingerprint data with reference fingerprint data in a first audio fingerprint database and reference fingerprint data in a second audio fingerprint database (102); if matching fails, calling a music query service interface to query copyright information of the target audio data; if the copyright information is found, storing the target fingerprint data in the first audio fingerprint database, taking same as new reference fingerprint data in the first audio fingerprint database, and recording the copyright information of the target audio data; and if no copyright information is found, storing the target fingerprint data in the second audio fingerprint database, and taking same as new reference fingerprint data in the second audio fingerprint database.

Description

Audio fingerprint processing method, device, computer equipment and storage medium

This application claims the priority of the Chinese patent application with application number 202110292844.8 filed with the China Patent Office on March 18, 2021, the entire contents of which are incorporated herein by reference.

Technical field

The embodiments of the present application relate to the technical field of audio processing, for example, to an audio fingerprint processing method, apparatus, computer device, and storage medium.

Background

With the rapid development of the Internet, especially the widespread popularity of mobile terminals, users can easily create multimedia data, such as making short videos, humming songs, recordings, etc., which makes the amount of multimedia data in the Internet grow rapidly. The data volume of audio data also increases rapidly.

In business scenarios such as song search and voice content review, the audio data is compared to determine whether the audio data is the same or similar.

Due to the large amount of audio data, some music copyright owners record different audio data, record the copyright information of the recorded audio, and provide a Music Query Service Interface (MQSI) to provide an independent music query service.

In scenarios such as short videos, the amount of audio data uploaded by clients to the platform every day can reach tens of millions or even hundreds of millions of items. Multimedia data such as short videos is updated quickly, so new audio data that has not yet been recorded by the music copyright owners is easily generated. If the music query service interface is called to query such audio data, the relevant information may not be found, resulting in low query efficiency. Moreover, the music query service is usually a paid service, so a large query volume leads to high operating costs.

Summary of the invention

The embodiments of the present application propose an audio fingerprint processing method, apparatus, computer device, and storage medium, which address the problem that, because a large amount of multimedia data is updated quickly, invoking a music query service interface to query audio data is inefficient and has a high operating cost.

Embodiments of the present application provide a method for processing audio fingerprints, including:

generating target fingerprint data for target audio data;

matching the target fingerprint data with reference fingerprint data in a first audio fingerprint database and with reference fingerprint data in a second audio fingerprint database;

if the target fingerprint data fails to match both the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database, calling a music query service interface to query copyright information of the target audio data;

if the copyright information of the target audio data is found, storing the target fingerprint data in the first audio fingerprint database, so that the target fingerprint data serves as new reference fingerprint data in the first audio fingerprint database, and recording the copyright information of the target audio data;

if the copyright information of the target audio data is not found, storing the target fingerprint data in the second audio fingerprint database, so that the target fingerprint data serves as new reference fingerprint data in the second audio fingerprint database.

The embodiment of the present application also provides an audio fingerprint processing device, including:

A fingerprint data generation module, configured to generate target fingerprint data for the target audio data;

A fingerprint data matching module, configured to match the target fingerprint data with the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database;

An interface query module, configured to call the music query service interface to query the copyright information of the target audio data when the target fingerprint data fails to match both the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database;

A first update module, configured to, when the copyright information of the target audio data is found, store the target fingerprint data in the first audio fingerprint database, so as to use the target fingerprint data as new reference fingerprint data in the first audio fingerprint database, and record the copyright information of the target audio data;

A second update module, configured to, when the copyright information of the target audio data is not found, store the target fingerprint data in the second audio fingerprint database, so as to use the target fingerprint data as new reference fingerprint data in the second audio fingerprint database.

Embodiments of the present application also provide a computer device, the computer device comprising:

one or more processors;

A memory, configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the audio fingerprint processing method described in any embodiment of the present application.

Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the audio fingerprint processing method described in any embodiment of the present application is implemented.

Description of drawings

FIG. 1 is a flowchart of an audio fingerprint processing method provided in Embodiment 1 of the present application;

FIG. 2 is a flowchart of an audio fingerprint processing method provided in Embodiment 2 of the present application;

FIG. 3 is a schematic structural diagram of an apparatus for processing audio fingerprints according to Embodiment 3 of the present application;

FIG. 4 is a schematic structural diagram of a computer device according to Embodiment 4 of the present application.

Detailed description

The present application will be described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application. In addition, it should be noted that, for the convenience of description, the drawings only show some but not all the structures related to the present application.

Embodiment 1

FIG. 1 is a flowchart of an audio fingerprint processing method provided in Embodiment 1 of the present application. This embodiment can be applied to hierarchically clustering fingerprint databases, thereby reducing how often the music query service interface is called. The method can be performed by an audio fingerprint processing apparatus, which can be implemented in software and/or hardware and configured in a computer device such as a server, workstation, or personal computer. The audio fingerprint processing method includes the following steps:

Step 101: Generate target fingerprint data for target audio data.

In this embodiment, the computer device can acquire audio data in different ways, for example, receiving audio data uploaded by users, purchasing audio data from copyright owners, having technicians record audio data, using crawler clients to crawl audio data from the network, and so on.

The audio data can take the form of songs released by singers, audio separated from video data such as short videos, movies, and TV dramas, voice signals recorded by users on mobile terminals, and so on. The format of the audio data may include MPEG Audio Layer III (Moving Picture Experts Group Audio Layer III, MP3), Windows Media Audio (WMA), Advanced Audio Coding (AAC), etc., which is not limited in this embodiment.

As a multimedia platform, the computer device can, on the one hand, provide users with audio-based services, such as live programs, short videos, voice conversations, and video conversations; on the other hand, it can receive audio-carrying files uploaded by users, such as live data, short videos, and session information.

Different multimedia platforms can formulate video content review standards based on business, legal, and other factors. Before an audio-carrying file is published, its content is reviewed against these standards. Files that do not meet the standards, such as those containing pornographic, vulgar, or violent content, are filtered out, and files that meet the standards are published.

If the real-time requirements for content review are high, a streaming real-time system can be set up in the multimedia platform. The user uploads the audio-carrying file to the streaming real-time system through the client in real time, and the streaming real-time system forwards the audio-carrying file to the computer device used for content review.

If the real-time requirements for content review are low, a database, such as a distributed database, can be set up in the multimedia platform. The user uploads audio-carrying files to the database through the client, and the computer device used for content review reads the audio-carrying files from the database.

In the multimedia platform, fingerprint data is calculated not only for the files that carry audio uploaded by the user, but also for its own audio data. The fingerprint data uses information such as peaks and relative positions in the frequency spectrum of the audio data to represent audio data. The fingerprint data is unique to each audio data, so that audio data search, content audit and other services can be implemented based on audio fingerprints.

For ease of distinction, in this embodiment, the audio-carrying files and the audio data may both be referred to as target audio data, and the fingerprint data generated from the target audio data is referred to as target fingerprint data.

In an embodiment of the present application, step 101 may include the following steps:

Step 1011: Divide the target audio data into multi-frame audio signals.

In this embodiment, the target audio data may be segmented at intervals into pieces of a preset length, thereby obtaining multiple frames of audio signals.
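The framing in Step 1011 can be sketched as follows. This is a minimal illustration, not the method of the application: the function name `split_into_frames` and the parameters `frame_len` and `hop` are hypothetical, and dropping the trailing samples that do not fill a whole frame is an assumption.

```python
def split_into_frames(samples, frame_len, hop):
    """Split a sample sequence into frames of frame_len samples, one frame every hop samples."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    # Stop once a full frame no longer fits; trailing samples are dropped (assumption).
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(list(samples[start:start + frame_len]))
    return frames


frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
print(len(frames))
```

With `hop` smaller than `frame_len` the frames overlap; with `hop` equal to `frame_len` they are back to back.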

Step 1012: Convert the multi-frame audio signal into a spectrogram.

An audio signal contains a large number of frequency components, which are independent of each other and change continuously along the time axis. Different audio signals contain different frequency components. In this embodiment, the characteristics of the audio signal are obtained by analyzing its frequency characteristics. To analyze frequency more intuitively, the audio signal in the time domain is usually converted to the frequency domain to obtain a spectrogram, where the horizontal axis (X coordinate) of the spectrogram is time and the vertical axis (Y coordinate) is frequency.

In this embodiment, the audio signal may be converted into a spectrogram by means of the discrete Fourier transform (DFT), the short-time Fourier transform (also called the short-term Fourier transform, STFT), etc. The Fourier transform reflects the average frequency content of the audio signal, but not how that content changes over time. The short-time Fourier transform overcomes this weakness by applying a window to the audio signal, so that it reflects both the frequency intensity of the audio signal and the change of that intensity over time.

Converting a time-domain signal into a frequency-domain signal loses time information. The short-time Fourier transform therefore uses a window to divide a long segment of the time-domain audio signal into multiple data blocks and converts each data block separately to obtain multiple frequency-domain signals, which preserves time information to a certain extent.
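The block-wise transform described above can be illustrated with a small pure-Python sketch. It is deliberately naive (a direct DFT per block rather than an FFT), and the Hann window, the function names, and the `block_len` and `hop` parameters are assumptions made for the illustration, not details taken from the application.

```python
import cmath
import math


def hann(n):
    # Periodic Hann window; the choice of window is an assumption.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(block):
    """Magnitudes of the non-negative frequency bins of one data block (direct DFT)."""
    n = len(block)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def stft(samples, block_len, hop):
    """Return a spectrogram as a list of columns, one magnitude list per data block."""
    window = hann(block_len)
    spectrogram = []
    for start in range(0, len(samples) - block_len + 1, hop):
        block = [samples[start + i] * window[i] for i in range(block_len)]
        spectrogram.append(dft_magnitudes(block))
    return spectrogram


# A sine wave that completes 4 cycles per 32-sample block.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
spec = stft(signal, block_len=32, hop=16)
print(len(spec), len(spec[0]))
```

Each column of `spec` corresponds to one position of the window on the time axis, which is how the time information is retained.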

For example, suppose the audio signal is two-channel, with 16-bit precision and a 44100 Hz sampling rate. One second of data is then 44100 × 2 bytes × 2 channels ≈ 176 kB. If 4 kB is selected as the size of the data block, 44 data blocks need to undergo the short-time Fourier transform every second, and such a segmentation density can meet the requirements.
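The arithmetic in this example can be checked with a few lines. The constant names are hypothetical, and taking 1 kB as 1000 bytes is an assumption chosen to match the approximate figure of 176 kB.

```python
SAMPLE_RATE_HZ = 44100
BYTES_PER_SAMPLE = 2      # 16-bit precision
CHANNELS = 2
BLOCK_BYTES = 4 * 1000    # 4 kB data block, with 1 kB = 1000 bytes (assumption)

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
blocks_per_second = bytes_per_second / BLOCK_BYTES

print(bytes_per_second, blocks_per_second)
```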

Step 1013: Traverse multiple data points representing peak values on the spectrogram, and use each data point as a peak point.

Audio signals with large amplitudes may span a wide frequency range, from low C (32.70 Hz) to high C (4186.01 Hz). In order to avoid analyzing the entire spectrogram and reduce the amount of computation, the spectrogram can be divided into multiple spectral bands (also called sub-bands).

From each sub-band, select the data point whose frequency belongs to a peak, and use that data point as the peak point. A peak here means a point that is preceded by a sufficient rise and followed by a sufficient fall. For example, the following sub-bands may be selected: the low sub-bands are 30Hz-40Hz, 40Hz-80Hz, and 80Hz-120Hz (the fundamental frequencies of bass guitars and similar instruments appear in the low sub-bands), and the midrange and treble sub-bands are 120Hz-180Hz and 180Hz-300Hz respectively (the fundamental frequencies of vocals and most other instruments appear in these two sub-bands).

Since points with higher energy (i.e., higher amplitude on the spectrogram) are more resistant to noise, the peak point of each sub-band can be selected according to energy. Usually, the point with the maximum energy in each sub-band is selected as the peak point.
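Selecting the maximum-energy point in each sub-band can be sketched as below for a single spectrogram column. The sub-band edges are the example values from the text; the function name `pick_peaks`, the half-open band intervals, and the `bin_hz` parameter (frequency spacing between adjacent bins) are assumptions made for the illustration.

```python
SUBBANDS_HZ = [(30, 40), (40, 80), (80, 120), (120, 180), (180, 300)]


def pick_peaks(column, bin_hz, subbands=SUBBANDS_HZ):
    """Return one (frequency_hz, magnitude) pair per sub-band: the strongest bin in that band."""
    peaks = []
    for low, high in subbands:
        best = None
        for k, mag in enumerate(column):
            freq = k * bin_hz
            # Bands are treated as half-open intervals [low, high) (assumption).
            if low <= freq < high and (best is None or mag > best[1]):
                best = (freq, mag)
        if best is not None:
            peaks.append(best)
    return peaks


column = [0.0] * 31          # bins at 0, 10, ..., 300 Hz
column[3], column[5], column[6] = 1.0, 2.0, 5.0
column[10], column[15] = 3.0, 4.0
column[20], column[25] = 7.0, 6.0
print(pick_peaks(column, bin_hz=10))
```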

Step 1014: Extract characteristic information of each peak point.

In this embodiment, the obtained characteristics can be used as characteristic information by analyzing the characteristics of each peak point itself and the characteristics between the peak points.

In one example, the frequency value of each peak point may be queried, and the frequency value may be used as characteristic information of each peak point.

In another example, by traversing each peak point, a first distance in time between each peak point and each of the other peak points may be measured, and the first distance may be used as the characteristic information of each peak point.

In an example, since the abscissa of a peak point in the spectrogram is time, the time interval between each peak point and each of the other peak points can be computed, and this time interval is taken as the first distance of that peak point.

For a given peak point, the other peak points are the peak points on the spectrogram other than that peak point.

The closer a peak point is to other peak points in time, the higher the correlation between them. Therefore, for each peak point, the other peak points located within its neighborhood along the time dimension of the spectrogram are found, the first distance in time between the peak point and each of the other peak points found is calculated, and the first distance is used as the characteristic information of the peak point.

In addition, other peak points outside the neighborhood of each current peak point can be ignored, and the amount of calculation is reduced while maintaining the accuracy of the feature information.
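A minimal sketch of the neighborhood-limited first distance follows. Peaks are represented as hypothetical `(time, frequency)` tuples, the neighborhood is taken to be a symmetric window of `max_dt` time units, and the distance is taken as an absolute difference; all three choices are assumptions for illustration.

```python
def time_distances(peaks, max_dt):
    """Map each peak index to its time distances to other peaks within max_dt."""
    features = {}
    for i, (t_i, _) in enumerate(peaks):
        gaps = []
        for j, (t_j, _) in enumerate(peaks):
            # Peaks outside the time neighborhood are ignored to save computation.
            if i != j and abs(t_j - t_i) <= max_dt:
                gaps.append(abs(t_j - t_i))
        features[i] = gaps
    return features


peaks = [(0, 100), (2, 200), (3, 150), (10, 120)]
print(time_distances(peaks, max_dt=3))
```

The same loop, comparing the second tuple element instead of the first, would give the distance in frequency.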

In yet another example, a second distance in frequency between each peak point and each of the other peak points may be measured, and the second distance may be used as characteristic information of each peak point.

The closer a peak point is to other peak points in frequency, the higher the correlation between them. Therefore, for each peak point, the other peak points located within its neighborhood along the frequency dimension of the spectrogram are found, the second distance in frequency between the peak point and each of the other peak points found is calculated, and the second distance is used as the characteristic information of the peak point.

In this embodiment, the frequency value, the first distance, and the second distance can be used alone as the characteristic information of the peak point, or can be arbitrarily combined as the characteristic information of the peak point, which is not limited in this embodiment. When the frequency value, the first distance, and the second distance are simultaneously used as the characteristic information of the peak point, the characteristics of the peak point can be reflected from multiple modes, thereby improving the accuracy of the characteristic information of the peak point.

The above characteristic information of the peak point is only an example. When implementing the embodiments of the present application, other characteristic information of the peak point may be set according to the actual situation, which is not limited in the embodiments of the present application. In addition to the characteristic information described above, those skilled in the art can also use other characteristic information of the peak point according to actual needs, which is likewise not limited.

Step 1015: Calculate a hash value for the characteristic information of each peak point, and use the hash value corresponding to each peak point as one piece of target fingerprint data of the target audio data.

For the characteristic information of each peak point, a hash value can be calculated according to a preset hash algorithm, and the hash value corresponding to each peak point can be used as one piece of target fingerprint data that identifies the target audio data.

In an example, the characteristic information of a peak point consists of the frequency value of the peak point itself, the first distance in time between the peak point and another peak point, and the second distance in frequency between the peak point and another peak point. In this example, the frequency value, the first distance, and the second distance of each peak point can be converted into binary format. Once the conversion is complete, the binary frequency value, first distance, and second distance of each peak point are spliced according to a preset arrangement rule (for example, frequency value first, first distance in the middle, and second distance last; or second distance first, first distance in the middle, and frequency value last), and the splicing result is used as one piece of target fingerprint data of the target audio data. Fingerprint data in binary format is more intuitive and is easy to convert back into the original frequency value, first distance, and second distance, which facilitates debugging and reduces development cost.
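The splicing of binary fields can be sketched with bit shifts. The field widths (12, 10, and 10 bits), the field order (frequency value first, first distance in the middle, second distance last), and the function names are assumptions chosen for illustration; the text does not specify them.

```python
FREQ_BITS, DT_BITS, DF_BITS = 12, 10, 10   # hypothetical field widths


def pack_fingerprint(freq, d_time, d_freq):
    """Splice frequency, first distance and second distance into one integer."""
    for value, bits in ((freq, FREQ_BITS), (d_time, DT_BITS), (d_freq, DF_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("value does not fit in its bit field")
    return (freq << (DT_BITS + DF_BITS)) | (d_time << DF_BITS) | d_freq


def unpack_fingerprint(fp):
    """Recover (frequency, first distance, second distance) from a spliced fingerprint."""
    d_freq = fp & ((1 << DF_BITS) - 1)
    d_time = (fp >> DF_BITS) & ((1 << DT_BITS) - 1)
    freq = fp >> (DT_BITS + DF_BITS)
    return freq, d_time, d_freq


fp = pack_fingerprint(440, 3, 25)
print(format(fp, "032b"), unpack_fingerprint(fp))
```

Because each field occupies a fixed bit range, the original values can be read back directly, which is the debugging convenience the paragraph above refers to.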

The above method for calculating the hash value is only an example. When implementing the embodiments of the present application, other methods for calculating the hash value may be set according to the actual situation, for example, using an algorithm such as the Secure Hash Algorithm (SHA) to calculate a hash value for the frequency value, the first distance, and the second distance, which is not limited in the embodiments of the present application. In addition to the characteristic information described above, those skilled in the art can also use other characteristic information of the peak point according to actual needs, which is likewise not limited.
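As one possible reading of the SHA alternative, the three values can be serialized and hashed with Python's standard `hashlib`. The choice of SHA-1 among the SHA family, the `|`-separated text encoding, and the function name are assumptions for illustration.

```python
import hashlib


def sha1_fingerprint(freq, d_time, d_freq):
    """Hash the three characteristic values into a 40-character hex digest."""
    # The serialization format is an assumption; any unambiguous encoding would do.
    payload = f"{freq}|{d_time}|{d_freq}".encode("ascii")
    return hashlib.sha1(payload).hexdigest()


print(sha1_fingerprint(440, 3, 25))
```

Unlike the spliced binary form, a cryptographic digest cannot be converted back into the original values, and even near-identical inputs produce unrelated digests, so it suits exact lookup rather than similarity comparison.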

Step 102: Match the target fingerprint data with the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database.

In this embodiment, two independent databases may be constructed as the first audio fingerprint database and the second audio fingerprint database. The first audio fingerprint database stores the reference fingerprint data of audio data for which copyright information was found through the music query service interface, and the second audio fingerprint database stores the reference fingerprint data of audio data for which no copyright information was found through the music query service interface.

Initially, the first audio fingerprint database and the second audio fingerprint database can be empty. Alternatively, a batch of audio data can be verified by manual local verification, verification by other institutions, etc.; the reference fingerprint data of the audio data verified to have copyright information is stored in the first audio fingerprint database as seeds, and the reference fingerprint data of the audio data verified to have no copyright information is stored in the second audio fingerprint database as seeds. This is not limited in this embodiment.

The reference fingerprint data also belongs to the fingerprint data of the audio data, and the method of generating the reference fingerprint data is the same as that of generating the target fingerprint data.

Once the target fingerprint data of the target audio data has been generated, it can be matched with the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database, so as to determine whether the target fingerprint data is the same as or similar to reference fingerprint data in the first audio fingerprint database or the second audio fingerprint database.

Exemplarily, the first audio fingerprint database includes multiple pieces of reference fingerprint data, and the second audio fingerprint database includes multiple pieces of reference fingerprint data.

Considering that more audio data has copyright information and less audio data is original and has no copyright information, matching against the reference fingerprint data in the first audio fingerprint database can be given a higher priority than matching against the reference fingerprint data in the second audio fingerprint database. That is, the target fingerprint data is first matched with the reference fingerprint data in the first audio fingerprint database. If the target fingerprint data fails to match all the reference fingerprint data in the first audio fingerprint database, the target fingerprint data is then matched with the reference fingerprint data in the second audio fingerprint database. If the target fingerprint data successfully matches any reference fingerprint data in the first audio fingerprint database, matching against the second audio fingerprint database is skipped. When more audio data has copyright information and less audio data is original, the probability of a successful match in the first audio fingerprint database is high and the probability of a successful match in the second audio fingerprint database is low. Matching against the first audio fingerprint database first therefore reduces the amount of computation spent on subsequently matching against the second audio fingerprint database, thereby improving matching efficiency.

Alternatively, matching against the reference fingerprint data in the first audio fingerprint database can be given a lower priority than matching against the reference fingerprint data in the second audio fingerprint database. That is, the target fingerprint data is first matched with the reference fingerprint data in the second audio fingerprint database. If the target fingerprint data fails to match all the reference fingerprint data in the second audio fingerprint database, the target fingerprint data is then matched with the reference fingerprint data in the first audio fingerprint database. If the target fingerprint data successfully matches any reference fingerprint data in the second audio fingerprint database, matching against the first audio fingerprint database is skipped. This is not limited in this embodiment.
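The two matching orders can be sketched with a single helper that stops at the first database producing a match. For brevity the databases are modeled as Python sets and a match as exact membership; both are simplifying assumptions, as are the function and parameter names.

```python
def match_with_priority(target_fp, first_db, second_db, first_has_priority=True):
    """Return 'first' or 'second' for the database that matched, or None if neither did."""
    ordered = [("first", first_db), ("second", second_db)]
    if not first_has_priority:
        ordered.reverse()
    for name, db in ordered:
        # Exact membership stands in for the similarity comparison (assumption).
        if target_fp in db:
            return name   # stop early: the lower-priority database is not consulted
    return None


first_db = {0xA1, 0xA2}
second_db = {0xB1}
print(match_with_priority(0xA2, first_db, second_db))
print(match_with_priority(0xB1, first_db, second_db, first_has_priority=False))
print(match_with_priority(0xC3, first_db, second_db))
```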

In a specific implementation, the target audio data may be long audio, so when the target audio data is divided into multiple frames of audio signals to calculate the target fingerprint data, the target audio data may have multiple pieces of target fingerprint data. In multimedia data such as short videos, the reused part is often audio data with copyright information, such as the climax of a song. Therefore, the similarity between each piece of target fingerprint data and each piece of reference fingerprint data in the first audio fingerprint database can be calculated, as can the similarity between each piece of target fingerprint data and each piece of reference fingerprint data in the second audio fingerprint database. If the similarity between n consecutive pieces of target fingerprint data (n is a positive integer) and n consecutive pieces of reference fingerprint data in one audio fingerprint database is greater than a preset threshold, it can be determined that the n consecutive pieces of target fingerprint data of the target audio data successfully match the n consecutive pieces of reference fingerprint data in that audio fingerprint database, and hence that the target fingerprint data of the target audio data successfully matches the reference fingerprint data in that audio fingerprint database. Comparing both similarity and relative position keeps the match between the target fingerprint data and the reference fingerprint data stable, thereby ensuring matching accuracy.
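The consecutive-n criterion can be sketched as a brute-force scan over both sequences. The similarity measure used here (fraction of equal bits between two 32-bit fingerprints) is an assumption, since the text does not define one; the function names and the strict "greater than" comparison per pair are likewise illustrative.

```python
def similarity(a, b, bits=32):
    """Fraction of equal bits between two integer fingerprints (assumed measure)."""
    differing = bin((a ^ b) & ((1 << bits) - 1)).count("1")
    return 1.0 - differing / bits


def has_consecutive_match(target, reference, n, threshold):
    """True if some n consecutive target fingerprints each exceed the threshold
    against n consecutive reference fingerprints at the same relative positions."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    for i in range(len(target) - n + 1):
        for j in range(len(reference) - n + 1):
            if all(similarity(target[i + k], reference[j + k]) > threshold
                   for k in range(n)):
                return True
    return False


reference = [10, 20, 30, 40, 50]
print(has_consecutive_match([99999, 20, 30, 40], reference, n=3, threshold=0.9))
print(has_consecutive_match([20, 0xFFFFFFFF, 40], reference, n=3, threshold=0.9))
```

The scan is quadratic in the sequence lengths; a practical system would more likely use an index keyed by fingerprint to find candidate alignments first.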

Step 103: If the target fingerprint data fails to match with the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database, call the music query service interface to query the copyright information of the target audio data.

If the target fingerprint data fails to match all the reference fingerprint data in the first audio fingerprint database and all the reference fingerprint data in the second audio fingerprint database, the computer device has not found locally any audio data that is the same as or similar to the target audio data, and the target audio data is more likely to be new audio data. In this case, the music query service interface can be called: the target audio data is sent to the server of the music copyright owner according to the specifications of the music query service interface, and that server queries whether the target audio data has copyright information.

Step 104: If the copyright information of the target audio data is found, store the target fingerprint data in the first audio fingerprint database to use the target fingerprint data as new reference fingerprint data in the first audio fingerprint database, and record the copyright information of the target audio data.

If the server of the music copyright owner returns the copyright information of the target audio data through the music query service interface, the target fingerprint data can be stored in the first audio fingerprint database, where it becomes new reference fingerprint data. In addition, the copyright information of the target audio data is recorded in a separate table or database and indexed by the identifier (ID) of the target audio data, so that it is associated with the new reference fingerprint data in the first audio fingerprint database.

In one storage method, each piece of target fingerprint data is used as the key, and the identifier (e.g., ID) of the target audio data together with the sequence number of the audio signal to which that target fingerprint data belongs is used as the value, generating a key-value pair (key, value). The audio signal to which each piece of target fingerprint data belongs is one frame of the target audio data.

A key-value pair (key, value) is stored in the first audio fingerprint database, and the key-value pair is used as new reference fingerprint data in the first audio fingerprint database.

For each index value, b storage locations (b is a positive integer, such as 2^N) can be provided to store target fingerprint data with the same key but different values, so that a data table with a rows (a is the length of the key, that is, the length of the target fingerprint data, and is a positive integer) and b columns is formed in the first audio fingerprint database, improving storage efficiency and simplifying search.
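The key-value layout with a fixed number of storage locations per key can be sketched with an in-memory table. The class name `FingerprintStore`, the `slots_per_key` parameter (playing the role of b), and the policy of rejecting entries once a key's locations are full are assumptions for illustration; the text does not say what happens on overflow.

```python
from collections import defaultdict


class FingerprintStore:
    """Maps a fingerprint (key) to up to slots_per_key (audio_id, frame_no) values."""

    def __init__(self, slots_per_key=4):
        self.slots_per_key = slots_per_key
        self._table = defaultdict(list)

    def add(self, fingerprint, audio_id, frame_no):
        """Store one value under the key; return False if all locations are taken."""
        bucket = self._table[fingerprint]
        entry = (audio_id, frame_no)
        if entry in bucket:
            return True
        if len(bucket) >= self.slots_per_key:
            return False   # overflow policy is an assumption
        bucket.append(entry)
        return True

    def lookup(self, fingerprint):
        """Return the values stored under the key, or an empty list."""
        return list(self._table.get(fingerprint, []))


store = FingerprintStore(slots_per_key=2)
store.add(0xAB, "audio-1", 0)
store.add(0xAB, "audio-2", 5)
print(store.lookup(0xAB))
```

Storing the frame sequence number alongside the audio identifier is what allows a later match to be checked for relative position as well as similarity.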

The above method of storing the target fingerprint data in the first audio fingerprint database is only an example. When implementing the embodiments of the present application, other storage methods may be set according to the actual situation, for example, using the identifier of the target audio data as the key and all the target fingerprint data of the target audio data as the value to generate a key-value pair (key, value), and storing the key-value pair in the first audio fingerprint database, which is not limited in the embodiments of the present application. In addition to the above method, those skilled in the art can also adopt other methods of storing the target fingerprint data in the first audio fingerprint database according to actual needs, which is likewise not limited.

Step 105: If the copyright information of the target audio data is not queried, store the target fingerprint data in the second audio fingerprint database to use the target fingerprint data as new reference fingerprint data in the second audio fingerprint database.

If the server of the music copyright owner returns, through the music query service interface, the result that the target audio data has no copyright information, the target fingerprint data can be stored in the second audio fingerprint database, where it serves as new reference fingerprint data.

In one storage method, a key-value pair (key, value) is generated with each piece of target fingerprint data as the key and the identifier of the target audio data together with the serial number of the audio signal to which that target fingerprint data belongs as the value. The key-value pair is stored in the second audio fingerprint database and is used as new reference fingerprint data in the second audio fingerprint database.

For each index value (index), b storage locations (b is a positive integer, such as 2^N) can be provided to store target fingerprint data that has the same key but different values. This forms, in the second audio fingerprint database, a data table with a rows (a is the length of the key, that is, the length of the target fingerprint data, and is a positive integer) and b columns, which improves storage efficiency and simplifies lookup.

The above method of storing the target fingerprint data in the second audio fingerprint database is only an example. When implementing the embodiments of the present application, other storage methods may be set according to the actual situation. For example, the identifier of the target audio data may be used as the key and all the target fingerprint data of the target audio data as the value to generate a key-value pair (key, value), which is then stored in the second audio fingerprint database. The embodiments of the present application do not limit this. In addition, those skilled in the art may adopt other methods of storing the target fingerprint data in the second audio fingerprint database according to actual needs, which the embodiments of the present application likewise do not limit.

It should be noted that the method of storing the target fingerprint data in the first audio fingerprint database and the method of storing the target fingerprint data in the second audio fingerprint database may be the same or different, which is not limited in this embodiment.

In this embodiment, target fingerprint data is generated for the target audio data; the target fingerprint data is matched with the reference fingerprint data in the first audio fingerprint database and with the reference fingerprint data in the second audio fingerprint database; if the target fingerprint data fails to match the reference fingerprint data in both databases, the music query service interface is called to query the copyright information of the target audio data; if the copyright information is found, the target fingerprint data is stored in the first audio fingerprint database as new reference fingerprint data and the copyright information of the target audio data is recorded; if the copyright information is not found, the target fingerprint data is stored in the second audio fingerprint database as new reference fingerprint data. Using the result of the music query service interface as the basis for grading, the fingerprint data is divided between the first audio fingerprint database and the second audio fingerprint database, which distinguishes audio data that has copyright information from audio data that does not, records new audio data, and improves the search success rate. The first audio fingerprint database, the second audio fingerprint database and the music query service interface together form a joint hierarchical query mechanism: the first and second audio fingerprint databases are searched first, and the music query service interface is called only afterwards. This makes effective use of the fingerprint data in the two databases and reduces the number of calls to the music query service interface, thereby reducing operating costs.
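The joint hierarchical query mechanism can be sketched in Python as follows. The function name `process_audio`, the plain dictionaries standing in for the two databases, and the `query_copyright` callable standing in for the music query service interface are all hypothetical. Matching is reduced here to exact key lookup for brevity, whereas the method described in this application matches by similarity.

```python
from typing import Callable, Optional


def process_audio(
    fingerprints: list,
    audio_id: str,
    first_db: dict,
    second_db: dict,
    query_copyright: Callable[[str], Optional[dict]],
    copyright_table: dict,
) -> str:
    """Return how the audio was resolved:
    'first' / 'second' for a local hit, 'licensed' / 'unlicensed' otherwise."""
    # Tiers 1 and 2: local lookups, no external call is made.
    if any(fp in first_db for fp in fingerprints):
        return "first"
    if any(fp in second_db for fp in fingerprints):
        return "second"
    # Tier 3: only after both local lookups fail is the service called.
    info = query_copyright(audio_id)
    target_db = first_db if info is not None else second_db
    for fp in fingerprints:
        target_db[fp] = audio_id          # becomes new reference data
    if info is not None:
        copyright_table[audio_id] = info  # recorded separately, keyed by ID
        return "licensed"
    return "unlicensed"
```

Because every miss is written back to one of the two databases, a later upload of the same audio is resolved locally and does not trigger another service call.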

Embodiment 2

FIG. 2 is a flowchart of an audio fingerprint processing method provided in Embodiment 2 of the present application. Based on the foregoing embodiment, this embodiment adds the operations of clustering the target audio data, managing reference fingerprint data with a time-to-live, and transferring reference fingerprint data between databases. The method includes the following steps:

Step 201: Generate target fingerprint data for target audio data.

Step 202: Match the target fingerprint data with the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database.

Step 203: If the target fingerprint data fails to match with the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database, call the music query service interface to query the copyright information of the target audio data.

Step 204: If the copyright information of the target audio data has been queried, store the target fingerprint data in the first audio fingerprint database to use the target fingerprint data as new reference fingerprint data in the first audio fingerprint database, and record the copyright information of the target audio data.

Step 205: Use the target audio data as new reference audio data, and generate a new cluster for the new reference audio data.

In this embodiment, if the copyright information of the target audio data is found through the music query service interface, it means that the computer device has not locally stored audio data that is the same as or similar to the target audio data. In addition to recording the copyright information, the target audio data may be set as new reference audio data, and a new cluster, used to gather the same or similar audio data, may be generated for the new reference audio data.

Step 206: If the copyright information of the target audio data is not queried, store the target fingerprint data in the second audio fingerprint database to use the target fingerprint data as new reference fingerprint data in the second audio fingerprint database.

Step 207: If the target fingerprint data is successfully matched with the reference fingerprint data in the first audio fingerprint database, add the target audio data to the cluster to which the reference audio data belongs.

In this embodiment, each audio fingerprint database holds reference fingerprint data for multiple pieces of audio data. If the target fingerprint data is successfully matched with the reference fingerprint data in the first audio fingerprint database, it means that audio data that is the same as or similar to the target audio data has already been stored locally on the computer device; for ease of distinction, this audio data may be referred to as reference audio data. In that case, the cluster to which the reference audio data belongs can be looked up and the target audio data added to it, so that the same or similar audio data is gathered into the same cluster, which facilitates subsequent cluster-based business processing such as user classification and song recommendation.

Step 208: If the target fingerprint data is successfully matched with the reference fingerprint data in the second audio fingerprint database, add the target audio data to the cluster to which the reference audio data belongs.

In this embodiment, if the target fingerprint data is successfully matched with the reference fingerprint data in the second audio fingerprint database, it means that audio data that is the same as or similar to the target audio data has already been stored locally on the computer device; for ease of distinction, this audio data may be referred to as reference audio data. In that case, the cluster to which the reference audio data belongs can be looked up and the target audio data added to it, so that the same or similar audio data is gathered into the same cluster, which facilitates subsequent cluster-based business processing such as user classification and song recommendation.
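The cluster bookkeeping of steps 205, 207 and 208 can be sketched in Python as follows. The class name `ClusterIndex`, its two dictionaries and the integer cluster identifiers are illustrative assumptions; the application does not prescribe how clusters are represented.

```python
import itertools


class ClusterIndex:
    """Tracks which cluster each piece of audio data belongs to."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.cluster_of = {}   # audio_id -> cluster_id
        self.members = {}      # cluster_id -> list of audio_ids

    def new_cluster(self, reference_audio_id):
        """Step 205: new reference audio data starts its own cluster."""
        cluster_id = next(self._next_id)
        self.cluster_of[reference_audio_id] = cluster_id
        self.members[cluster_id] = [reference_audio_id]
        return cluster_id

    def add_to_cluster_of(self, reference_audio_id, target_audio_id):
        """Steps 207/208: matched target audio joins the reference's cluster."""
        cluster_id = self.cluster_of[reference_audio_id]
        self.cluster_of[target_audio_id] = cluster_id
        self.members[cluster_id].append(target_audio_id)
        return cluster_id
```

A downstream service such as song recommendation could then read `members` to find all audio data that is the same as or similar to a given reference.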

Exemplarily, if the similarity between n consecutive pieces of target fingerprint data (n is a positive integer) among all the target fingerprint data of the target audio data and n consecutive pieces of reference fingerprint data in an audio fingerprint database is greater than a preset threshold, it is determined that the n consecutive pieces of target fingerprint data are successfully matched with the n consecutive pieces of reference fingerprint data. For example, let n be 3. If the similarity between the first of three consecutive pieces of target fingerprint data of the target audio data and the first of three consecutive pieces of reference fingerprint data in the first audio fingerprint database is greater than the preset threshold, the similarity between the second piece of target fingerprint data and the second piece of reference fingerprint data is greater than the preset threshold, and the similarity between the third piece of target fingerprint data and the third piece of reference fingerprint data is greater than the preset threshold, then it is determined that the three consecutive pieces of target fingerprint data of the target audio data are successfully matched with the three consecutive pieces of reference fingerprint data in the first audio fingerprint database.

Exemplarily, the audio data to which the n consecutive pieces of reference fingerprint data successfully matched with the target audio data belong is used as the reference audio data, and the target audio data is added to the cluster to which the reference audio data belongs.
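The consecutive-n matching rule can be sketched in Python as follows. The application does not define the similarity measure, so the fraction of equal bits between two fixed-width integer fingerprints is used here purely as an assumption; the function names, the 32-bit width and the 0.9 threshold are likewise hypothetical.

```python
def bit_similarity(a, b, bits=32):
    """Fraction of equal bits between two fixed-width fingerprints."""
    differing = bin((a ^ b) & ((1 << bits) - 1)).count("1")
    return 1.0 - differing / bits


def has_consecutive_match(target, reference, n=3, threshold=0.9, bits=32):
    """True if some run of n consecutive target fingerprints matches some
    run of n consecutive reference fingerprints, position by position."""
    for i in range(len(target) - n + 1):
        for j in range(len(reference) - n + 1):
            if all(
                bit_similarity(target[i + k], reference[j + k], bits) > threshold
                for k in range(n)
            ):
                return True
    return False
```

Requiring a run of n matches rather than a single match makes an accidental hit on one fingerprint insufficient to declare two pieces of audio data the same. The nested loops are written for clarity and not for speed.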

Step 209: If the reference fingerprint data in the second audio fingerprint database is successfully matched with the target fingerprint data, compile an indicator of successful matching for the reference fingerprint data.

Step 210: If the indicator satisfies the preset database transfer condition, transfer the reference fingerprint data from the second audio fingerprint database to the first audio fingerprint database.

Considering that new audio data is easily generated in scenarios such as new songs being released on the Internet and short videos being updated quickly, and that such audio data may not yet be included by the music copyright owner, a database transfer condition can be set in advance for the reference fingerprint data in the second audio fingerprint database. When the condition is met, the reference fingerprint data can be transferred to the first audio fingerprint database.

In this embodiment, if the target fingerprint data is successfully matched with the reference fingerprint data in the second audio fingerprint database, an indicator of successful matching can be compiled for the reference fingerprint data, for example, the total number of successful matches, the frequency of successful matches, and so on.

Exemplarily, if n consecutive pieces of reference fingerprint data in the second audio fingerprint database are successfully matched with the target fingerprint data of the target audio data, the indicator of successful matching is updated for each of the n consecutive pieces of reference fingerprint data. For example, with n equal to 3, if three consecutive pieces of reference fingerprint data in the second audio fingerprint database are successfully matched with the target fingerprint data of the target audio data, 1 is added to the total number of successful matches of each of the three consecutive pieces of reference fingerprint data.

The indicator is compared with the transfer condition in the same dimension: for example, the total number of successful matches is greater than or equal to a first threshold, the frequency of successful matches is greater than or equal to a second threshold, and so on.

If the indicator satisfies the preset database transfer condition, it means that the reference fingerprint data belongs to relatively popular audio data, possibly a newly released song. The reference fingerprint data can then be transferred from the second audio fingerprint database to the first audio fingerprint database, and prompt information can be generated to prompt the operator to add copyright information for the audio data to which the reference fingerprint data belongs.

If the indicator does not satisfy the preset database transfer condition, the reference fingerprint data remains stored in the second audio fingerprint database.
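The indicator statistics and the database transfer of steps 209 and 210 can be sketched in Python as follows. Only the "total number of successful matches" indicator is modelled; the class name `TransferTracker`, the parameter `min_total_matches` (standing in for the first threshold), the dictionaries standing in for the databases and the wording of the prompt are hypothetical.

```python
from collections import Counter


class TransferTracker:
    """Counts successful matches in the second database and moves popular
    reference fingerprints to the first database."""

    def __init__(self, min_total_matches=3):
        self.min_total_matches = min_total_matches   # the "first threshold"
        self.match_counts = Counter()

    def record_match(self, fingerprint, first_db, second_db):
        """Return a prompt string if a transfer happened, otherwise None."""
        if fingerprint not in second_db:
            return None
        self.match_counts[fingerprint] += 1
        if self.match_counts[fingerprint] < self.min_total_matches:
            return None                  # condition not met: stay in second_db
        audio_id = second_db.pop(fingerprint)
        first_db[fingerprint] = audio_id
        del self.match_counts[fingerprint]
        # Prompt the operator to add the missing copyright information.
        return f"Please add copyright information for audio {audio_id}"
```

A frequency-based condition could be added by also storing timestamps of matches and comparing the match rate against a second threshold.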

Step 211: Set the time-to-live for the reference fingerprint data in the first audio fingerprint database and/or the second audio fingerprint database.

In scenarios such as short videos, some audio data changes quickly: after a period of popularity, it is rarely used by users. For such scenarios, a specified first value can be set as the survival time (time-to-live) of the reference fingerprint data in the first audio fingerprint database, and a specified second value can be set as the survival time of the reference fingerprint data in the second audio fingerprint database.

Considering that more audio data has copyright information, while less audio data is original and lacks copyright information, the target fingerprint data has a higher probability of successfully matching the reference fingerprint data in the first audio fingerprint database and a lower probability of successfully matching the reference fingerprint data in the second audio fingerprint database. The survival time of the reference fingerprint data in the first audio fingerprint database can therefore be set greater than that of the reference fingerprint data in the second audio fingerprint database, that is, the first value is greater than the second value. This maintains the probability of successful matching in the first audio fingerprint database, reduces how often the music query service interface is called, and reduces operating costs.

Alternatively, the survival time of the reference fingerprint data in the first audio fingerprint database can be set equal to or less than that of the reference fingerprint data in the second audio fingerprint database, that is, the first value is equal to or smaller than the second value, which is not limited in this embodiment.

Step 212: Attenuate the survival time.

For the lifetime of the reference fingerprint data, a timer can be started to count down, so as to attenuate the lifetime, that is, to continuously decrease the value of the lifetime.

Under normal circumstances, the survival time decays at the normal rate of elapsed time, that is, at a constant rather than a variable speed.

Step 213: If the reference fingerprint data in the first audio fingerprint database or the second audio fingerprint database is successfully matched with the target fingerprint data, increase the survival time.

If the reference fingerprint data in the first audio fingerprint database is successfully matched with the target fingerprint data, the survival time of that reference fingerprint data can be increased, for example by restoring it to the original first value, by adding a first step size to its current value, and so on.

If the reference fingerprint data in the second audio fingerprint database is successfully matched with the target fingerprint data, the survival time of that reference fingerprint data can be increased, for example by restoring it to the original second value, by adding a second step size to its current value, and so on.

Step 214: Delete the reference fingerprint data from the first audio fingerprint database or the second audio fingerprint database if the time-to-live decay is completed.

If the survival time of reference fingerprint data in the first audio fingerprint database has fully decayed, that is, its current value is 0, the audio data to which the reference fingerprint data belongs is used relatively infrequently. In this case, the reference fingerprint data can be deleted from the first audio fingerprint database. While maintaining the matching success rate of the reference fingerprint data in the first audio fingerprint database, this reduces the volume of reference fingerprint data stored there and frees space in the database, thereby effectively meeting the storage requirements of processing a continuous stream of fingerprint data under limited storage capacity.

If the survival time of reference fingerprint data in the second audio fingerprint database has fully decayed, that is, its current value is 0, the audio data to which the reference fingerprint data belongs is used relatively infrequently. In this case, the reference fingerprint data can be deleted from the second audio fingerprint database. While maintaining the matching success rate of the reference fingerprint data in the second audio fingerprint database, this reduces the volume of reference fingerprint data stored there and frees space in the database, thereby effectively meeting the storage requirements of processing a continuous stream of fingerprint data under limited storage capacity.

Exemplarily, a time-to-live is set for each piece of reference fingerprint data in each audio fingerprint database, and the time-to-live is attenuated. When a piece of reference fingerprint data in an audio fingerprint database is determined to match the target fingerprint data of the target audio data, the time-to-live of that piece of reference fingerprint data is increased. For example, if three consecutive pieces of reference fingerprint data in the second audio fingerprint database are matched with three consecutive pieces of target fingerprint data of the target audio data, the time-to-live of each of the three consecutive pieces of reference fingerprint data is increased. When the time-to-live of a piece of reference fingerprint data in an audio fingerprint database has fully decayed, that piece of reference fingerprint data is deleted from that audio fingerprint database.
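The time-to-live handling of steps 211 to 214 can be sketched in Python as follows. Instead of running one countdown timer per entry, this sketch stores an absolute expiry time and compares it with a caller-supplied clock value, which gives the same constant-rate decay; that choice, the class name `TtlStore`, and the use of the "restore to the original value" option on a match are assumptions.

```python
class TtlStore:
    """Reference fingerprints with a time-to-live, kept as expiry times."""

    def __init__(self, initial_ttl):
        self.initial_ttl = initial_ttl   # the "first value" or "second value"
        self._expires_at = {}            # fingerprint -> absolute expiry time

    def add(self, fingerprint, now):
        """Step 211: set the time-to-live when the entry is stored."""
        self._expires_at[fingerprint] = now + self.initial_ttl

    def on_match(self, fingerprint, now):
        """Step 213: a successful match restores the original time-to-live."""
        if fingerprint in self._expires_at:
            self._expires_at[fingerprint] = now + self.initial_ttl

    def purge(self, now):
        """Step 214: delete and return entries whose time-to-live reached 0."""
        expired = [fp for fp, t in self._expires_at.items() if t <= now]
        for fp in expired:
            del self._expires_at[fp]
        return expired

    def __contains__(self, fingerprint):
        return fingerprint in self._expires_at
```

Two stores with different `initial_ttl` values would model the first and second audio fingerprint databases, for example a longer value for the first database and a shorter one for the second.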

Embodiment 3

FIG. 3 is a structural block diagram of an apparatus for processing audio fingerprints provided in Embodiment 3 of the present application, which may include the following modules:

A fingerprint data generation module 301, configured to generate target fingerprint data for the target audio data; a fingerprint data matching module 302, configured to match the target fingerprint data with the reference fingerprint data in the first audio fingerprint database and with the reference fingerprint data in the second audio fingerprint database; an interface query module 303, configured to call the music query service interface to query the copyright information of the target audio data if the target fingerprint data fails to match the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database; a first update module 304, configured to, if the copyright information of the target audio data has been queried, store the target fingerprint data in the first audio fingerprint database to use the target fingerprint data as new reference fingerprint data in the first audio fingerprint database, and record the copyright information of the target audio data; a second update module 305, configured to, if the copyright information of the target audio data is not queried, store the target fingerprint data in the second audio fingerprint database to use the target fingerprint data as new reference fingerprint data in the second audio fingerprint database.

In an embodiment of the present application, the fingerprint data generation module 301 includes:

An audio signal division module, configured to divide the target audio data into multiple frames of audio signals; a spectrogram conversion module, configured to convert the multiple frames of audio signals into a spectrogram; a peak point search module, configured to traverse the spectrogram for multiple data points representing peaks and take each such data point as a peak point; a feature information extraction module, configured to extract the feature information of each peak point; a hash value calculation module, configured to calculate a hash value from the feature information of each peak point and use the hash value corresponding to each peak point as one piece of target fingerprint data of the target audio data.

In an embodiment of the present application, the feature information extraction module includes:

A frequency value query module, configured to query the frequency value of each peak point and use the frequency value as feature information of that peak point; a time distance measurement module, configured to measure a first distance in time between each peak point and each of the other peak points, and use the first distance as feature information of that peak point; a frequency distance measurement module, configured to measure a second distance in frequency between each peak point and each of the other peak points, and use the second distance as feature information of that peak point.

In an embodiment of the present application, the time distance measurement module includes:

A time neighborhood search module, configured to search, in the time dimension, for other peak points located in the neighborhood of each peak point on the spectrogram; a time distance calculation module, configured to calculate the first distance in time between each peak point and each of the other peak points found, and use the first distance as feature information of that peak point.

In an embodiment of the present application, the frequency distance measurement module includes:

A frequency neighborhood search module, configured to search, in the frequency dimension, for other peak points located in the neighborhood of each peak point on the spectrogram; a frequency distance calculation module, configured to calculate the second distance in frequency between each peak point and each of the other peak points found, and use the second distance as feature information of that peak point.
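The neighborhood search and distance measurement performed by these modules can be sketched in Python as follows. The sketch makes several assumptions that the application leaves open: peaks are given as (time index, frequency bin) pairs, a single neighborhood is applied in both dimensions at once, only peaks at the same or a later time are paired, and the neighborhood sizes `max_dt` and `max_df` are arbitrary.

```python
def peak_features(peaks, max_dt=3, max_df=20):
    """For each (time, freq) peak, pair it with every other peak inside its
    neighborhood and return (freq, time_distance, freq_distance) tuples."""
    features = []
    for t1, f1 in peaks:
        for t2, f2 in peaks:
            if (t1, f1) == (t2, f2):
                continue                       # skip the peak itself
            dt = t2 - t1                       # first distance (in time)
            df = abs(f2 - f1)                  # second distance (in frequency)
            if 0 <= dt <= max_dt and df <= max_df:
                features.append((f1, dt, df))
    return features
```

Each returned tuple holds the three pieces of feature information named above: the frequency value, the first distance and the second distance.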

In an embodiment of the present application, the hash value calculation module includes:

A binary conversion module, configured to convert the frequency value, the first distance and the second distance of each peak point into binary format; a splicing module, configured to, once the conversion is completed, splice the frequency value, the first distance and the second distance of each peak point, and use the splicing result as one piece of target fingerprint data of the target audio data.
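The binary conversion and splicing can be sketched in Python as follows. The field widths (10 bits for the frequency value, 6 bits for each distance), the field order and the requirement that all three values be non-negative are assumptions made for illustration; the application does not state them.

```python
FREQ_BITS, DT_BITS, DF_BITS = 10, 6, 6   # assumed field widths


def splice_fingerprint(freq_bin, dt, df):
    """Concatenate the three binary fields into one integer fingerprint."""
    for value, bits in ((freq_bin, FREQ_BITS), (dt, DT_BITS), (df, DF_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (freq_bin << (DT_BITS + DF_BITS)) | (dt << DF_BITS) | df


def to_bit_string(fingerprint):
    """Render the spliced fingerprint as a fixed-width binary string."""
    return format(fingerprint, f"0{FREQ_BITS + DT_BITS + DF_BITS}b")
```

For example, `to_bit_string(splice_fingerprint(5, 3, 2))` yields `0000000101000011000010`, that is, `0000000101` (frequency), `000011` (first distance) and `000010` (second distance) placed side by side.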

In an embodiment of the present application, the fingerprint data matching module 302 includes:

A similarity calculation module, configured to calculate the similarity between each piece of target fingerprint data and all the reference fingerprint data in the first audio fingerprint database, and the similarity between each piece of target fingerprint data and the reference fingerprint data in the second audio fingerprint database; a continuous matching module, configured to determine, if the similarity between n consecutive pieces of target fingerprint data among all the target fingerprint data and n consecutive pieces of reference fingerprint data in one audio fingerprint database is greater than a preset threshold, that the target fingerprint data of the target audio data is successfully matched with the reference fingerprint data in that audio fingerprint database, where the one audio fingerprint database is the first audio fingerprint database or the second audio fingerprint database, and n is a positive integer.

In an embodiment of the present application, the first update module 304 includes:

A first key-value pair generation module, configured to generate a key-value pair with each piece of target fingerprint data as the key and the identifier of the target audio data together with the serial number of the audio signal to which that target fingerprint data belongs as the value, where the audio signal to which each piece of target fingerprint data belongs is one frame of the target audio data; a first key-value pair storage module, configured to store the key-value pair in the first audio fingerprint database and use the key-value pair as new reference fingerprint data in the first audio fingerprint database.

In an embodiment of the present application, the second update module 305 includes:

A second key-value pair generation module, configured to generate a key-value pair with each piece of target fingerprint data as the key and the identifier of the target audio data together with the serial number of the audio signal to which that target fingerprint data belongs as the value, where the audio signal to which each piece of target fingerprint data belongs is one frame of the target audio data; a second key-value pair storage module, configured to store the key-value pair in the second audio fingerprint database and use the key-value pair as new reference fingerprint data in the second audio fingerprint database.

In an embodiment of the present application, it also includes:

The cluster generation module is configured to use the target audio data as new reference audio data, and generate a new cluster for the new reference audio data.

In an embodiment of the present application, it also includes:

A first cluster adding module, configured to add the target audio data to the cluster to which reference audio data belongs if the target fingerprint data is successfully matched with the reference fingerprint data in the first audio fingerprint database, where the reference fingerprint data in the first audio fingerprint database belongs to the reference audio data; a second cluster adding module, configured to add the target audio data to the cluster to which reference audio data belongs if the target fingerprint data is successfully matched with the reference fingerprint data in the second audio fingerprint database, where the reference fingerprint data in the second audio fingerprint database belongs to the reference audio data.

In an embodiment of the present application, it also includes:

A survival time setting module, configured to set a survival time for the reference fingerprint data in the first audio fingerprint database and/or the second audio fingerprint database; a survival time decay module, configured to attenuate the survival time; a survival time increase module, configured to increase the survival time if the reference fingerprint data in the first audio fingerprint database or the second audio fingerprint database is successfully matched with the target fingerprint data; a fingerprint data deletion module, configured to delete the reference fingerprint data from the first audio fingerprint database or the second audio fingerprint database if the decay of the survival time is completed.

In an embodiment of the present application, it also includes:

An indicator statistics module, configured to compile an indicator of successful matching for the reference fingerprint data if the reference fingerprint data in the second audio fingerprint database is successfully matched with the target fingerprint data; a fingerprint data transfer module, configured to transfer the reference fingerprint data from the second audio fingerprint database to the first audio fingerprint database if the indicator satisfies the preset database transfer condition.

The audio fingerprint processing apparatus provided by the embodiment of the present application can execute the audio fingerprint processing method provided by any embodiment of the present application, and has functional modules corresponding to the execution method.

Embodiment 4

FIG. 4 is a schematic structural diagram of a computer device according to Embodiment 4 of the present application. FIG. 4 shows a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present application. The computer device 12 shown in FIG. 4 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.

As shown in FIG. 4, computer device 12 takes the form of a general-purpose computing device. Components of computer device 12 may include, but are not limited to, one or more processors or processing units 16 , system memory 28 , and a bus 18 connecting various system components including system memory 28 and processing unit 16 .

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer device 12, including both volatile and nonvolatile media, removable and non-removable media.

System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32 . Computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. For example only, storage system 34 may be configured to read and write to non-removable, non-volatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard drive"). Although not shown in Figure 4, a magnetic disk drive configured to read and write to removable non-volatile magnetic disks (eg "floppy disks") and removable non-volatile optical disks (eg Compact Disc Read-Only Memory) may be provided Read-Only Memory, CD-ROM), (Digital Video Disc Read-Only Memory, DVD-ROM) or other optical media) CD-ROM drive for reading and writing. In these cases, each drive may be connected to bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of various embodiments of the present application.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

Computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables computer device 12 to communicate with one or more other computing devices. Such communication may take place through an Input/Output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be understood that, although not shown, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.

The processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example, implementing the audio fingerprint processing method provided by the embodiments of the present application.

Embodiment 5

Embodiment 5 of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the above audio fingerprint processing method is implemented and the same technical effect can be achieved; to avoid repetition, details are not repeated here.

The computer-readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

Claims

A method for processing audio fingerprints, comprising:

generating target fingerprint data for the target audio data;

matching the target fingerprint data against reference fingerprint data in a first audio fingerprint database and against reference fingerprint data in a second audio fingerprint database;

in a case where the target fingerprint data fails to match both the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database, calling a music query service interface to query copyright information of the target audio data;

in a case where the copyright information of the target audio data is found, storing the target fingerprint data in the first audio fingerprint database as new reference fingerprint data in the first audio fingerprint database, and recording the copyright information of the target audio data;

in a case where the copyright information of the target audio data is not found, storing the target fingerprint data in the second audio fingerprint database as new reference fingerprint data in the second audio fingerprint database.
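
By way of non-limiting illustration, the branching in the method above can be sketched as follows. The function name `process_fingerprints`, the dictionary-based databases, the exact-lookup matching, and the `query_copyright` callable standing in for the music query service interface are all assumptions made for this sketch; none of them is specified by the application.

```python
from typing import Callable, Dict, List, Optional


def process_fingerprints(
    audio_id: str,
    target_fps: List[int],
    first_db: Dict[int, str],
    second_db: Dict[int, str],
    copyright_records: Dict[str, str],
    query_copyright: Callable[[str], Optional[str]],
) -> str:
    """Return a label naming which branch of the flow was taken."""
    # Matching step, simplified here to an exact lookup of any fingerprint.
    if any(fp in first_db for fp in target_fps):
        return "matched_first"
    if any(fp in second_db for fp in target_fps):
        return "matched_second"

    # Both matches failed: ask the (hypothetical) music query service.
    info = query_copyright(audio_id)
    if info is not None:
        # Copyright information found: store in the first database and record it.
        for fp in target_fps:
            first_db[fp] = audio_id
        copyright_records[audio_id] = info
        return "stored_first"

    # No copyright information found: store in the second database.
    for fp in target_fps:
        second_db[fp] = audio_id
    return "stored_second"
```

In this sketch the second database acts as a holding area for audio whose copyright status is unknown, so that later uploads of the same audio can still be matched without calling the query service again.
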
The method according to claim 1, wherein the generating target fingerprint data for the target audio data comprises:

dividing the target audio data into multiple frames of audio signals;

converting the multi-frame audio signal into a spectrogram;

traversing a plurality of data points representing peak values on the spectrogram, and taking each such data point as a peak point;

extracting characteristic information of each peak point;

calculating a hash value from the characteristic information of each peak point, and taking the hash value corresponding to each peak point as one piece of target fingerprint data of the target audio data.
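
By way of non-limiting illustration, the first three steps above (framing, conversion to a spectrogram, and peak traversal) can be sketched in dependency-free Python. The helper names, the non-overlapping framing, the naive DFT, and the rule that a peak must be strictly larger than its immediate neighbours and above `min_magnitude` are assumptions of this sketch, not details taken from the application.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def split_frames(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Divide the audio samples into non-overlapping frames of frame_len samples."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2 (slow, but needs no libraries)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def find_peaks(spectrogram: List[List[float]], min_magnitude: float) -> List[Tuple[int, int]]:
    """Return (frame_index, bin_index) of cells strictly larger than all neighbours."""
    peaks = []
    for t, row in enumerate(spectrogram):
        for f, value in enumerate(row):
            if value <= min_magnitude:
                continue
            neighbours = [
                spectrogram[tt][ff]
                for tt in range(max(t - 1, 0), min(t + 2, len(spectrogram)))
                for ff in range(max(f - 1, 0), min(f + 2, len(row)))
                if (tt, ff) != (t, f)
            ]
            if all(value > other for other in neighbours):
                peaks.append((t, f))
    return peaks


# Usage: one frame of silence, one frame of a pure tone in bin 2, one frame of silence.
tone = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
samples = [0.0] * 8 + tone + [0.0] * 8
spectrogram = [magnitude_spectrum(frame) for frame in split_frames(samples, 8)]
peaks = find_peaks(spectrogram, min_magnitude=1.0)
```

A practical implementation would normally use overlapping, windowed frames and a fast Fourier transform; the naive transform is used here only to keep the sketch self-contained.
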
The method according to claim 2, wherein the extracting characteristic information of each peak point comprises:

querying the frequency value of each peak point, and taking the frequency value as characteristic information of that peak point;

measuring a first distance in time between each peak point and each of the other peak points, and taking the first distance as characteristic information of that peak point;

measuring a second distance in frequency between each peak point and each of the other peak points, and taking the second distance as characteristic information of that peak point.
The method according to claim 3, wherein the measuring of a first distance in time between each peak point and each of the other peak points, and the taking of the first distance as characteristic information of each peak point, comprise:

finding other peak points in the neighborhood of each peak point on the spectrogram in the time dimension;

calculating the first distance in time between each peak point and each of the other peak points found, and taking the first distance as characteristic information of that peak point;

and wherein the measuring of a second distance in frequency between each peak point and each of the other peak points, and the taking of the second distance as characteristic information of each peak point, comprise:

finding other peak points in the neighborhood of each peak point on the spectrogram in the frequency dimension;

calculating the second distance in frequency between each peak point and each of the other peak points found, and taking the second distance as characteristic information of that peak point.
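
By way of non-limiting illustration, the neighborhood search and distance measurement can be sketched as below. The sketch makes several assumptions that the application does not state: peaks are `(frame_index, frequency_bin)` pairs, the time neighborhood looks only forward by at most `max_dt` frames, the frequency neighborhood is at most `max_df` bins in either direction, both limits are applied together, and the frequency distance is taken as an absolute value.

```python
from typing import List, Tuple

Peak = Tuple[int, int]  # (frame_index, frequency_bin), a hypothetical representation


def extract_features(peaks: List[Peak], max_dt: int, max_df: int) -> List[Tuple[int, int, int]]:
    """Return one (frequency, time_distance, frequency_distance) tuple per peak pair."""
    features = []
    for t1, f1 in peaks:
        for t2, f2 in peaks:
            if (t2, f2) == (t1, f1):
                continue  # a peak is not paired with itself
            dt = t2 - t1       # first distance: separation in time (frames)
            df = abs(f2 - f1)  # second distance: separation in frequency (bins)
            if 0 <= dt <= max_dt and df <= max_df:
                features.append((f1, dt, df))
    return features


# Usage: three peaks; with max_dt=2 only the first two are close enough to pair.
example = extract_features([(0, 10), (1, 12), (5, 11)], max_dt=2, max_df=3)
```

Restricting the neighborhood keeps the number of pairs per peak bounded, which in turn bounds the number of fingerprints produced per audio item.
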
The method according to claim 3, wherein the calculating of a hash value from the characteristic information of each peak point, and the taking of the hash value corresponding to each peak point as one piece of target fingerprint data of the target audio data, comprise:

converting the frequency value, the first distance and the second distance of each peak point into binary format;

when the conversion is complete, splicing the frequency value, the first distance and the second distance of each peak point, and taking the splicing result as one piece of target fingerprint data of the target audio data.
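
By way of non-limiting illustration, the binary conversion and splicing can be sketched as follows. The function name `pack_fingerprint` and the field widths (10 bits for the frequency value and 6 bits for each distance) are assumptions chosen for the example; the application does not fix any widths.

```python
def pack_fingerprint(freq: int, dt: int, df: int,
                     freq_bits: int = 10, dt_bits: int = 6, df_bits: int = 6) -> int:
    """Convert each field to fixed-width binary, splice them, and return the result."""
    fields = ((freq, freq_bits), (dt, dt_bits), (df, df_bits))
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
    # Zero-padded binary strings are concatenated in the order freq, dt, df.
    spliced = "".join(format(value, f"0{width}b") for value, width in fields)
    return int(spliced, 2)


# Usage: frequency bin 10, time distance 1, frequency distance 2.
fingerprint = pack_fingerprint(10, 1, 2)
```

Fixed widths make the splice reversible: each field can be recovered from the integer by shifting and masking.
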
The method according to claim 2, wherein the first audio fingerprint database includes a plurality of reference fingerprint data, and the second audio fingerprint database includes a plurality of reference fingerprint data;

The matching of the target fingerprint data with the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database includes:

calculating the similarity between each piece of target fingerprint data and all the reference fingerprint data in the first audio fingerprint database, and calculating the similarity between each piece of target fingerprint data and all the reference fingerprint data in the second audio fingerprint database;

in a case where the similarity between n consecutive pieces of target fingerprint data among all the target fingerprint data and n consecutive pieces of reference fingerprint data in one audio fingerprint database is greater than a preset threshold, determining that the target fingerprint data of the target audio data is successfully matched with the reference fingerprint data in the one audio fingerprint database, wherein the one audio fingerprint database is the first audio fingerprint database or the second audio fingerprint database, and n is a positive integer.
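
By way of non-limiting illustration, the consecutive-run test can be sketched as below. The similarity measure (one minus the normalised Hamming distance between two integer fingerprints), the brute-force search, and the names `similarity` and `has_consecutive_match` are assumptions of this sketch; the application does not specify how similarity is computed.

```python
from typing import Sequence


def similarity(a: int, b: int, bits: int) -> float:
    """Share of the bit positions (out of `bits`) in which two fingerprints agree."""
    return 1.0 - bin(a ^ b).count("1") / bits


def has_consecutive_match(target: Sequence[int], reference: Sequence[int],
                          n: int, threshold: float, bits: int) -> bool:
    """True if some run of n consecutive target fingerprints aligns with a run of
    n consecutive reference fingerprints, each pair exceeding the threshold."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    for i in range(len(target) - n + 1):
        for j in range(len(reference) - n + 1):
            if all(similarity(target[i + k], reference[j + k], bits) > threshold
                   for k in range(n)):
                return True
    return False


# Usage with 8-bit fingerprints: the run 9, 12, 7 appears in both sequences.
reference = [5, 9, 12, 7, 3]
target = [1, 9, 12, 7, 30]
matched = has_consecutive_match(target, reference, n=3, threshold=0.9, bits=8)
```

Requiring a run of n matches rather than a single match reduces false positives from isolated fingerprints that happen to coincide.
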
The method according to claim 2, wherein the storing of the target fingerprint data in the first audio fingerprint database as new reference fingerprint data in the first audio fingerprint database comprises:

generating a key-value pair in which each piece of target fingerprint data is the key, and the identifier of the target audio data together with the sequence number of the audio signal to which that target fingerprint data belongs is the value, wherein the audio signal to which each piece of target fingerprint data belongs is one frame of signal in the target audio data;

storing the key-value pair in the first audio fingerprint database as new reference fingerprint data in the first audio fingerprint database;

and wherein the storing of the target fingerprint data in the second audio fingerprint database as new reference fingerprint data in the second audio fingerprint database comprises:

generating a key-value pair in which each piece of target fingerprint data is the key, and the identifier of the target audio data together with the sequence number of the audio signal to which that target fingerprint data belongs is the value, wherein the audio signal to which each piece of target fingerprint data belongs is one frame of signal in the target audio data;

storing the key-value pair in the second audio fingerprint database as new reference fingerprint data in the second audio fingerprint database.
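
By way of non-limiting illustration, the key-value layout can be sketched with an in-memory dictionary. Keeping a list of values per key, so that several audio items can share one fingerprint, and the name `store_fingerprints` are assumptions of this sketch; a deployed system would more likely use a key-value store, which the application does not name.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

# key: fingerprint; value: list of (audio identifier, frame sequence number)
FingerprintDb = DefaultDict[int, List[Tuple[str, int]]]


def store_fingerprints(db: FingerprintDb, audio_id: str,
                       fingerprints: Iterable[Tuple[int, int]]) -> None:
    """Insert each (fingerprint, frame_index) pair under the fingerprint as key."""
    for fp, frame_index in fingerprints:
        db[fp].append((audio_id, frame_index))


# Usage: two audio items that share the fingerprint 77.
db: FingerprintDb = defaultdict(list)
store_fingerprints(db, "a1", [(41026, 0), (77, 3)])
store_fingerprints(db, "a2", [(77, 1)])
```

Storing the frame sequence number alongside the audio identifier is what allows a matcher to check that hits occur in consecutive order.
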
The method according to any one of claims 1-7, further comprising, after the storing of the target fingerprint data in the first audio fingerprint database as new reference fingerprint data in the first audio fingerprint database:

taking the target audio data as new reference audio data, and generating a new cluster for the new reference audio data.
The method according to any one of claims 1-7, further comprising:

in a case where the target fingerprint data is successfully matched with reference fingerprint data in the first audio fingerprint database, adding the target audio data to the cluster to which reference audio data belongs, wherein the matched reference fingerprint data belongs to the reference audio data;

in a case where the target fingerprint data is successfully matched with reference fingerprint data in the second audio fingerprint database, adding the target audio data to the cluster to which reference audio data belongs, wherein the matched reference fingerprint data belongs to the reference audio data.
The method according to any one of claims 1-7, further comprising at least one of the following:

setting a survival time for the reference fingerprint data in the first audio fingerprint database; attenuating the survival time; in a case where the reference fingerprint data in the first audio fingerprint database is successfully matched with target fingerprint data, increasing the survival time; and when the attenuation of the survival time is complete, deleting the reference fingerprint data from the first audio fingerprint database;

setting a survival time for the reference fingerprint data in the second audio fingerprint database; attenuating the survival time; in a case where the reference fingerprint data in the second audio fingerprint database is successfully matched with target fingerprint data, increasing the survival time; and when the attenuation of the survival time is complete, deleting the reference fingerprint data from the second audio fingerprint database.
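
By way of non-limiting illustration, the survival-time mechanism can be sketched as a small class. The class name `TtlStore`, the linear decay driven by explicit `decay(elapsed)` calls, and the fixed `bonus` added on each successful match are assumptions of this sketch; the application does not specify the decay schedule or the size of the increase.

```python
from typing import Dict, List


class TtlStore:
    """Tracks a survival time per reference fingerprint (hypothetical helper)."""

    def __init__(self, initial_ttl: float, bonus: float) -> None:
        self.initial_ttl = initial_ttl
        self.bonus = bonus
        self.ttl: Dict[int, float] = {}

    def add(self, fp: int) -> None:
        """Set the survival time when the fingerprint enters the database."""
        self.ttl[fp] = self.initial_ttl

    def on_match(self, fp: int) -> None:
        """Increase the survival time after a successful match."""
        if fp in self.ttl:
            self.ttl[fp] += self.bonus

    def decay(self, elapsed: float) -> List[int]:
        """Attenuate every survival time; delete and return the expired entries."""
        expired = []
        for fp in list(self.ttl):
            self.ttl[fp] -= elapsed
            if self.ttl[fp] <= 0:
                del self.ttl[fp]
                expired.append(fp)
        return expired


# Usage: fingerprint 1 is matched once and outlives fingerprint 2.
store = TtlStore(initial_ttl=10, bonus=5)
store.add(1)
store.add(2)
first_pass = store.decay(6)
store.on_match(1)
second_pass = store.decay(5)
```

The effect is that fingerprints which keep being matched stay in the database, while those that are never matched again are eventually removed.
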
The method according to any one of claims 1-7, further comprising:

in a case where the reference fingerprint data in the second audio fingerprint database is successfully matched with the target fingerprint data, compiling an indicator of successful matching for the reference fingerprint data;

in a case where the indicator satisfies a preset database transfer condition, transferring the reference fingerprint data from the second audio fingerprint database to the first audio fingerprint database.
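
By way of non-limiting illustration, the transfer between databases can be sketched as below. Using a simple match count as the indicator and "count reaches `min_matches`" as the transfer condition are assumptions of this sketch, as is the function name; the application leaves both the indicator and the condition open.

```python
from typing import Dict


def record_match_and_maybe_transfer(fp: int,
                                    first_db: Dict[int, str],
                                    second_db: Dict[int, str],
                                    match_counts: Dict[int, int],
                                    min_matches: int) -> bool:
    """Count a successful match in the second database; move the entry to the
    first database once the count reaches min_matches. Returns True on transfer."""
    if fp not in second_db:
        return False
    match_counts[fp] = match_counts.get(fp, 0) + 1
    if match_counts[fp] >= min_matches:
        first_db[fp] = second_db.pop(fp)
        del match_counts[fp]
        return True
    return False


# Usage: the entry is transferred on its second successful match.
first_db: Dict[int, str] = {}
second_db: Dict[int, str] = {7: "hum"}
counts: Dict[int, int] = {}
moved_first_time = record_match_and_maybe_transfer(7, first_db, second_db, counts, 2)
moved_second_time = record_match_and_maybe_transfer(7, first_db, second_db, counts, 2)
```
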
An audio fingerprint processing device, comprising:

a fingerprint data generation module, configured to generate target fingerprint data for target audio data;

a fingerprint data matching module, configured to match the target fingerprint data against reference fingerprint data in a first audio fingerprint database and against reference fingerprint data in a second audio fingerprint database;

an interface query module, configured to call a music query service interface to query copyright information of the target audio data in a case where the target fingerprint data fails to match both the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database;

a first update module, configured to, in a case where the copyright information of the target audio data is found, store the target fingerprint data in the first audio fingerprint database as new reference fingerprint data in the first audio fingerprint database, and record the copyright information of the target audio data;

a second update module, configured to, in a case where the copyright information of the target audio data is not found, store the target fingerprint data in the second audio fingerprint database as new reference fingerprint data in the second audio fingerprint database.
A computer device comprising:

at least one processor;

a memory configured to store one or more programs which, when executed by the at least one processor, cause the at least one processor to implement the audio fingerprint processing method according to any one of claims 1-11.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the audio fingerprint processing method according to any one of claims 1-11.