WO2022194277A1 - Audio fingerprint processing method and apparatus, and computer device and storage medium - Google Patents

Audio fingerprint processing method and apparatus, and computer device and storage medium

Publication number
WO2022194277A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
data
target
fingerprint
fingerprint data
Application number
PCT/CN2022/081680
Inventor
李敬
何莹男
Original Assignee
百果园技术(新加坡)有限公司
李敬
Priority date: 2021-03-18 (claimed from Chinese patent application No. 202110292844.8)
Application filed by 百果园技术(新加坡)有限公司, 李敬
Publication of WO2022194277A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61 Indexing; Data structures therefor; Storage structures
    • G06F16/65 Clustering; Classification
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings

Abstract

Provided are an audio fingerprint processing method and apparatus, a computer device, and a storage medium. The audio fingerprint processing method comprises: generating target fingerprint data for target audio data (101); matching the target fingerprint data against reference fingerprint data in a first audio fingerprint database and against reference fingerprint data in a second audio fingerprint database (102); if both matches fail, calling a music query service interface to query copyright information of the target audio data; if the copyright information is found, storing the target fingerprint data in the first audio fingerprint database as new reference fingerprint data and recording the copyright information of the target audio data; and if no copyright information is found, storing the target fingerprint data in the second audio fingerprint database as new reference fingerprint data.

Description

Audio fingerprint processing method, apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 202110292844.8, filed with the China Patent Office on March 18, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of audio processing, for example, to an audio fingerprint processing method, apparatus, computer device, and storage medium.
Background
With the rapid development of the Internet, and especially the widespread adoption of mobile terminals, users can easily create multimedia data, for example by making short videos, humming songs, or making recordings. As a result, the volume of multimedia data on the Internet is growing rapidly, and the volume of audio data is growing with it.
In business scenarios such as song search and voice content review, audio data is compared to determine whether two pieces of audio data are the same or similar.
Because the amount of audio data is so large, some music copyright owners catalog different audio data, record the copyright information of the cataloged audio, and expose a Music Query Service Interface (MQSI), thereby providing an independent music query service.
In scenarios such as short video, the volume of audio data uploaded by clients to a platform each day can reach tens of millions or even hundreds of millions of items. Multimedia data such as short videos is updated quickly and readily produces new audio data that has not yet been cataloged by any music copyright owner. If the music query service interface is called for such audio data, no relevant information may be found, so query efficiency is low. Moreover, the music query service is usually a paid service, so a large query volume leads to high operating costs.
Summary
The embodiments of the present application provide an audio fingerprint processing method, apparatus, computer device, and storage medium, which solve the problem that, because large amounts of multimedia data are updated quickly, calling a music query service interface to query audio data is inefficient and incurs high operating costs.
An embodiment of the present application provides an audio fingerprint processing method, including:
generating target fingerprint data for target audio data;
matching the target fingerprint data against reference fingerprint data in a first audio fingerprint database and against reference fingerprint data in a second audio fingerprint database;
when the target fingerprint data fails to match both the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database, calling a music query service interface to query copyright information of the target audio data;
when the copyright information of the target audio data has been found, storing the target fingerprint data in the first audio fingerprint database as new reference fingerprint data in the first audio fingerprint database, and recording the copyright information of the target audio data;
when the copyright information of the target audio data has not been found, storing the target fingerprint data in the second audio fingerprint database as new reference fingerprint data in the second audio fingerprint database.
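The control flow of the method above can be sketched in a few lines. This is only an illustration under stated assumptions: the two databases are modeled as in-memory dictionaries, matching is reduced to exact key lookup, and `process_target` and `query_copyright` are hypothetical names standing in for the matching logic and the music query service interface.

```python
from typing import Callable, Dict, Optional


def process_target(
    audio_id: str,
    fingerprint: str,
    first_db: Dict[str, dict],
    second_db: Dict[str, dict],
    query_copyright: Callable[[str], Optional[dict]],
) -> str:
    """Route one target fingerprint through the two-database flow.

    Assumption: exact dictionary lookup stands in for fingerprint matching.
    """
    # Try both local databases first, so the paid query is only a fallback.
    if fingerprint in first_db:
        return "matched_first"
    if fingerprint in second_db:
        return "matched_second"

    # Both matches failed: ask the (hypothetical) music query service.
    info = query_copyright(audio_id)
    if info is not None:
        # Copyright found: new reference in the first database, with its info.
        first_db[fingerprint] = {"copyright": info}
        return "stored_first"

    # Nothing found: new reference in the second database.
    second_db[fingerprint] = {}
    return "stored_second"
```

Because a fingerprint stored in the second database matches locally on its next appearance, repeated uploads of uncataloged audio do not trigger repeated remote queries.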
An embodiment of the present application further provides an audio fingerprint processing apparatus, including:
a fingerprint data generation module, configured to generate target fingerprint data for target audio data;
a fingerprint data matching module, configured to match the target fingerprint data against reference fingerprint data in a first audio fingerprint database and against reference fingerprint data in a second audio fingerprint database;
an interface query module, configured to call a music query service interface to query copyright information of the target audio data when the target fingerprint data fails to match both the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database;
a first update module, configured to, when the copyright information of the target audio data has been found, store the target fingerprint data in the first audio fingerprint database as new reference fingerprint data in the first audio fingerprint database, and record the copyright information of the target audio data;
a second update module, configured to, when the copyright information of the target audio data has not been found, store the target fingerprint data in the second audio fingerprint database as new reference fingerprint data in the second audio fingerprint database.
An embodiment of the present application further provides a computer device, including:
one or more processors;
a memory, configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the audio fingerprint processing method described in any embodiment of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the audio fingerprint processing method described in any embodiment of the present application.
Brief Description of the Drawings
FIG. 1 is a flowchart of an audio fingerprint processing method provided in Embodiment 1 of the present application;
FIG. 2 is a flowchart of an audio fingerprint processing method provided in Embodiment 2 of the present application;
FIG. 3 is a schematic structural diagram of an audio fingerprint processing apparatus provided in Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of a computer device provided in Embodiment 4 of the present application.
Detailed Description
The present application is described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the structures related to the present application rather than all structures.
Embodiment 1
FIG. 1 is a flowchart of an audio fingerprint processing method provided in Embodiment 1 of the present application. This embodiment is applicable to cases where fingerprint databases are organized into tiers so that the music query service interface is called less often. The method may be performed by an audio fingerprint processing apparatus, which may be implemented in software and/or hardware and may be configured in a computer device such as a server, a workstation, or a personal computer. The audio fingerprint processing method includes the following steps:
Step 101: Generate target fingerprint data for target audio data.
In this embodiment, the computer device may acquire audio data in different ways, for example, by receiving audio data uploaded by users, purchasing audio data from copyright owners, having technicians record audio data, or using a crawler client to crawl audio data from the network.
The audio data may take the form of songs released by singers, audio separated from video data such as short videos, films, and TV series, voice signals recorded by users on mobile terminals, and so on. The format of the audio data may include Moving Picture Experts Group Audio Layer III (MP3), Windows Media Audio (WMA), Advanced Audio Coding (AAC), and so on, which is not limited in this embodiment.
Acting as a multimedia platform, the computer device can, on the one hand, provide users with audio-based services such as live programs, short videos, voice sessions, and video sessions, and, on the other hand, receive audio-carrying files uploaded by users, such as live-stream data, short videos, and session information.
Different multimedia platforms may formulate video content review standards according to business, legal, and other factors. Before an audio-carrying file is published, its content is reviewed against the review standard. Files that do not meet the standard, such as those containing pornographic, vulgar, or violent content, are filtered out, and files that meet the standard are published.
If content review must be close to real time, a real-time streaming system may be set up in the multimedia platform. Users upload audio-carrying files to the streaming system in real time through a client, and the streaming system transmits the files to the computer device used for content review.
If the real-time requirement for content review is lower, a database, such as a distributed database, may be set up in the multimedia platform. Users upload audio-carrying files to the database through a client, and the computer device used for content review reads the files from the database.
The multimedia platform computes fingerprint data both for the audio-carrying files uploaded by users and for its own audio data. Fingerprint data represents the features of audio data using information such as the peaks in its spectrum and their relative positions. Fingerprint data is unique to each piece of audio data, so services such as audio search and content review can be built on audio fingerprints.
For ease of distinction, in this embodiment, audio-carrying files and audio data are referred to as target audio data, and the fingerprint data generated from target audio data is referred to as target fingerprint data.
In an embodiment of the present application, step 101 may include the following steps:
Step 1011: Divide the target audio data into multiple frames of audio signal.
In this embodiment, the target audio data may be split at intervals of a preset length, yielding multiple frames of audio signal.
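Fixed-length framing can be sketched as follows. This is a minimal illustration, assuming the audio has already been decoded into a sequence of samples and that a trailing partial frame is simply dropped; the function name and the `frame_len` parameter are hypothetical, and the text does not specify how a remainder is handled.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Cut samples into consecutive, non-overlapping frames of frame_len samples.

    Assumption: an incomplete final frame is discarded.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    last_start = len(samples) - frame_len
    return [
        list(samples[start:start + frame_len])
        for start in range(0, last_start + 1, frame_len)
    ]
```

For ten samples and `frame_len=4`, this produces two frames and discards the last two samples.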
Step 1012: Convert the multiple frames of audio signal into a spectrogram.
An audio signal contains a large number of frequency components. These components are independent of one another and change continuously along the time axis, and both the components and the way they change differ from one audio signal to another. In this embodiment, the features of an audio signal are obtained by analyzing its frequency characteristics. To analyze frequency more intuitively, the time-domain audio signal is usually converted to the frequency domain to obtain a spectrogram, whose horizontal axis (X coordinate) is time and whose vertical axis (Y coordinate) is frequency.
In this embodiment, the audio signal may be converted into a spectrogram by means of the Fourier transform (discrete Fourier transform, DFT), the short-time Fourier transform (also called short-term Fourier transform, STFT), or similar methods. The Fourier transform reflects the average frequency content of the audio signal but cannot reflect how the frequencies change over time. The short-time Fourier transform overcomes this weakness by applying a window to the audio signal, so it reflects both the frequency intensity of the audio signal and the change of that intensity over time.
Converting a time-domain signal into a frequency-domain signal loses time information. The short-time Fourier transform therefore works on data blocks (also called windows): a long time-domain audio signal is divided into multiple data blocks, and each block is transformed separately to obtain multiple frequency-domain signals, which preserves time information to a certain extent.
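The block-wise transform described above can be illustrated with a deliberately naive short-time Fourier transform written with the standard library only. This is a sketch, not the method of the application: the Hann window, the hop size, and the function name are assumptions, and the direct DFT used here costs O(N²) per block, so a real system would use an FFT routine instead.

```python
import cmath
import math
from typing import List, Sequence


def stft_magnitudes(
    samples: Sequence[float], window_size: int, hop: int
) -> List[List[float]]:
    """Return one magnitude spectrum (bins 0..window_size // 2) per data block."""
    # Periodic Hann window (an assumed choice) to reduce spectral leakage.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
        for n in range(window_size)
    ]
    spectrogram = []
    for start in range(0, len(samples) - window_size + 1, hop):
        block = [samples[start + n] * window[n] for n in range(window_size)]
        spectrum = []
        for k in range(window_size // 2 + 1):
            # Direct DFT of the windowed block at frequency bin k.
            acc = sum(
                block[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                for n in range(window_size)
            )
            spectrum.append(abs(acc))
        spectrogram.append(spectrum)
    return spectrogram
```

Each inner list is one column of the spectrogram: its position in the outer list gives the time index, and its entries give the magnitude per frequency bin.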
For example, suppose the audio signal is two-channel, 16-bit, and sampled at 44100 Hz. One second of data is then 44100 × 2 bytes × 2 channels ≈ 176 kB. If 4 kB is chosen as the data block size, the short-time Fourier transform must be applied to about 44 data blocks per second, a segmentation density that meets the requirement.
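The arithmetic in this example can be checked directly. The helper below is only illustrative, and it assumes that 4 kB means 4000 bytes, which reproduces the figure of about 44 blocks; with 4096-byte blocks the result would be about 43.

```python
from typing import Tuple


def blocks_per_second(
    sample_rate_hz: int, bytes_per_sample: int, channels: int, block_bytes: int
) -> Tuple[int, float]:
    """Return (bytes of audio per second, data blocks per second)."""
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return bytes_per_second, bytes_per_second / block_bytes


# 16-bit samples are 2 bytes each; 4 kB is taken as 4000 bytes (an assumption).
total_bytes, blocks = blocks_per_second(44100, 2, 2, 4000)
```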
Step 1013: Traverse the spectrogram for the data points that represent peaks, and take each such data point as a peak point.
The frequencies at which an audio signal has large amplitude may span a wide range, anywhere from low C (32.70 Hz) to high C (4186.01 Hz). To avoid analyzing the entire spectrogram and to reduce the amount of computation, the spectrogram may be divided into multiple spectral bands (also called sub-bands).
From each sub-band, a data point whose frequency is at a peak is selected and taken as a peak point. A peak is a point preceded by a sufficient number of points where the frequency is rising and followed by a sufficient number of points where the frequency is falling. For example, the following sub-bands may be selected: the bass sub-bands are 30 Hz-40 Hz, 40 Hz-80 Hz, and 80 Hz-120 Hz (the fundamental frequencies of instruments such as the bass guitar fall in the bass sub-bands), and the midrange and treble sub-bands are 120 Hz-180 Hz and 180 Hz-300 Hz respectively (the fundamental frequencies of vocals and most other instruments fall in these two sub-bands).
Because points with higher energy (that is, higher amplitude on the spectrogram) are more resistant to noise, peak points may be selected by energy within each sub-band. Typically, the point with the highest energy in each sub-band is selected as the peak point.
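Selecting the highest-energy point in each sub-band can be sketched as below, using the sub-band edges from the example above. The function name, the half-open band intervals, and the `bin_hz` parameter (the frequency spacing between adjacent spectrum bins) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Sub-band edges in Hz, taken from the example in the text.
BANDS_HZ: List[Tuple[float, float]] = [
    (30, 40), (40, 80), (80, 120), (120, 180), (180, 300),
]


def pick_band_peaks(
    magnitudes: Sequence[float],
    bin_hz: float,
    bands: Sequence[Tuple[float, float]] = BANDS_HZ,
) -> List[Tuple[float, float]]:
    """For one spectrogram column, return (frequency_hz, magnitude) of the
    strongest bin in each sub-band. Bands containing no bin are skipped."""
    peaks = []
    for low, high in bands:
        best = None
        for k, mag in enumerate(magnitudes):
            freq = k * bin_hz
            # Assumption: each band is the half-open interval [low, high).
            if low <= freq < high and (best is None or mag > best[1]):
                best = (freq, mag)
        if best is not None:
            peaks.append(best)
    return peaks
```

Applying the function to every column of the spectrogram yields at most one peak point per sub-band per time step.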
Step 1014: Extract the feature information of each peak point.
In this embodiment, the characteristics of each peak point itself and the characteristics between peak points may be analyzed, and the resulting characteristics used as feature information.
In one example, the frequency value of each peak point may be looked up and used as the feature information of that peak point.
In another example, each peak point is traversed, a first distance in time between that peak point and each of the other peak points is measured, and the first distance is used as the feature information of that peak point.
In an example, because the horizontal coordinate of a peak point in the spectrogram is time, the time interval between each peak point and each of the other peak points can be computed and used as the first distance of that peak point.
For a given peak point, the other peak points are the peak points on the spectrogram other than that peak point.
The closer a peak point is to another peak point in time, the higher the correlation between them. Therefore, for each peak point, the other peak points located within its neighborhood on the spectrogram along the time dimension are found, the first distance in time between the peak point and each of the peak points found is computed, and the first distance is used as the feature information of the peak point.
In addition, the other peak points outside the neighborhood of the current peak point can be ignored, which reduces the amount of computation while preserving the precision of the feature information.
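Restricting the first distance to a time neighborhood can be sketched as follows. The sketch assumes that the peak times are kept in ascending order so that a binary search finds the neighborhood bounds without scanning every peak; the function name and the `max_dt` parameter (the neighborhood half-width) are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def neighbor_time_distances(
    peak_times: Sequence[float], index: int, max_dt: float
) -> List[float]:
    """Time gaps between peak `index` and the other peaks within max_dt of it.

    Assumption: peak_times is sorted in ascending order.
    """
    t0 = peak_times[index]
    lo = bisect_left(peak_times, t0 - max_dt)
    hi = bisect_right(peak_times, t0 + max_dt)
    # Peaks outside [lo, hi) lie outside the neighborhood and are never visited.
    return [abs(peak_times[j] - t0) for j in range(lo, hi) if j != index]
```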
In yet another example, a second distance in frequency between each peak point and each of the other peak points may be measured and used as the feature information of that peak point.
For a given peak point, the other peak points are the peak points on the spectrogram other than that peak point.
The closer a peak point is to another peak point in frequency, the higher the correlation between them. Therefore, for each peak point, the other peak points located within its neighborhood on the spectrogram along the frequency dimension are found, the second distance in frequency between the peak point and each of the peak points found is computed, and the second distance is used as the feature information of the peak point.
In this embodiment, the frequency value, the first distance, and the second distance may each be used alone as the feature information of a peak point, or they may be combined in any way, which is not limited in this embodiment. When the frequency value, the first distance, and the second distance are all used as feature information, the characteristics of the peak point are reflected from multiple modalities, which improves the accuracy of the feature information.
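One way to combine the three kinds of feature information is sketched below. The `PeakFeatures` record, the representation of a peak as a (time, frequency) pair, and the two neighborhood widths are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (time_s, frequency_hz), an assumed representation


@dataclass
class PeakFeatures:
    frequency_hz: float            # the peak point's own frequency value
    first_distances: List[float]   # time gaps to peaks in the time neighborhood
    second_distances: List[float]  # frequency gaps to peaks in the frequency neighborhood


def extract_features(
    peaks: Sequence[Peak], max_dt: float, max_df: float
) -> List[PeakFeatures]:
    """Build the combined feature record for every peak point."""
    features = []
    for i, (t0, f0) in enumerate(peaks):
        others = [p for j, p in enumerate(peaks) if j != i]
        first = [abs(t - t0) for t, _ in others if abs(t - t0) <= max_dt]
        second = [abs(f - f0) for _, f in others if abs(f - f0) <= max_df]
        features.append(PeakFeatures(f0, first, second))
    return features
```

The two neighborhoods are evaluated independently, so a peak that is close in time but far in frequency contributes only a first distance.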
The feature information of peak points described above is only an example. When implementing the embodiments of the present application, other feature information of peak points may be set according to the actual situation, which is not limited in the embodiments of the present application. In addition, those skilled in the art may use other feature information of peak points according to actual needs, which is likewise not limited in the embodiments of the present application.
步骤1015、对每个峰值点的特征信息计算哈希值,将每个峰值点对应的哈希值作为目标音频数据的一个目标指纹数据。Step 1015: Calculate a hash value for the characteristic information of each peak point, and use the hash value corresponding to each peak point as a target fingerprint data of the target audio data.
For the feature information of each peak point, a hash value may be calculated according to a preset hash algorithm, and the hash value corresponding to each peak point is used as one piece of target fingerprint data of the target audio data, so as to identify the target audio data.
In one example, the feature information of a peak point consists of the frequency value of the peak point itself, the first distance in time between the peak point and other peak points, and the second distance in frequency between the peak point and other peak points. In this example, the frequency value, the first distance, and the second distance of each peak point may be converted into binary format. Once the conversion is complete, the binary frequency value, first distance, and second distance of each peak point are concatenated according to a preset ordering rule (for example, frequency value first, first distance in the middle, and second distance last; or frequency value last, first distance in the middle, and second distance first; and so on), and the concatenation result is used as one piece of target fingerprint data of the target audio data. Fingerprint data in binary format is relatively intuitive and can easily be converted back into the original frequency value, first distance, and second distance, which facilitates debugging and reduces development cost.
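The concatenation described above can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the field widths (`FREQ_BITS`, `TIME_BITS`, `DIST_BITS`), the function names, the choice of the "frequency value first" ordering, and the assumption that both distances are non-negative integers are all hypothetical.

```python
from typing import Tuple

# Hypothetical field widths; the source does not specify them.
FREQ_BITS = 10  # frequency value (e.g., a spectrogram bin index)
TIME_BITS = 6   # first distance (in time)
DIST_BITS = 6   # second distance (in frequency), assumed non-negative


def pack_fingerprint(freq: int, time_dist: int, freq_dist: int) -> int:
    """Concatenate the fields as [freq | time_dist | freq_dist]."""
    fields = ((freq, FREQ_BITS), (time_dist, TIME_BITS), (freq_dist, DIST_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
    return (freq << (TIME_BITS + DIST_BITS)) | (time_dist << DIST_BITS) | freq_dist


def unpack_fingerprint(fingerprint: int) -> Tuple[int, int, int]:
    """Recover (freq, time_dist, freq_dist) from a packed fingerprint."""
    freq_dist = fingerprint & ((1 << DIST_BITS) - 1)
    time_dist = (fingerprint >> DIST_BITS) & ((1 << TIME_BITS) - 1)
    freq = fingerprint >> (TIME_BITS + DIST_BITS)
    return freq, time_dist, freq_dist


def to_bit_string(fingerprint: int) -> str:
    """Render the fingerprint as a fixed-width binary string for debugging."""
    return format(fingerprint, f"0{FREQ_BITS + TIME_BITS + DIST_BITS}b")
```

Because the fields occupy fixed bit positions, `unpack_fingerprint` recovers the original values exactly, which is the debugging convenience the paragraph above refers to.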
The above method of calculating the hash value is only an example. When implementing the embodiments of the present application, other methods of calculating the hash value may be set according to the actual situation, for example, calculating a hash value of the frequency value, the first distance, and the second distance using an algorithm such as Message Digest Algorithm 5 (MD5) or the Secure Hash Algorithm (SHA), which is not limited in the embodiments of the present application. In addition, beyond the feature information of the peak point described above, those skilled in the art may adopt other feature information of the peak point according to actual needs, which is likewise not limited in the embodiments of the present application.
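As a rough sketch of the MD5/SHA alternative, the three values can be serialized and passed to Python's standard `hashlib` module. The function name `hash_fingerprint` and the serialization (three big-endian unsigned 32-bit integers) are assumptions made for illustration; the source does not say how the values are encoded before hashing.

```python
import hashlib
import struct


def hash_fingerprint(freq: int, time_dist: int, freq_dist: int,
                     algorithm: str = "sha1") -> str:
    """Hash the three feature values with a standard digest algorithm.

    The values are serialized as three big-endian unsigned 32-bit integers
    (an assumed encoding) so that the same inputs always give the same digest.
    """
    payload = struct.pack(">III", freq, time_dist, freq_dist)
    return hashlib.new(algorithm, payload).hexdigest()
```

Note that, unlike the binary concatenation, a cryptographic digest cannot be converted back into the original values, and two nearly identical inputs produce unrelated digests, so such fingerprints can only be compared for exact equality.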
Step 102: Match the target fingerprint data against the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database.
In this embodiment, two independent databases may be constructed as the first audio fingerprint database and the second audio fingerprint database. The first audio fingerprint database stores reference fingerprint data of audio data for which copyright information was found through the music query service interface, and the second audio fingerprint database stores reference fingerprint data of audio data for which no copyright information was found through the music query service interface.
Initially, the first audio fingerprint database and the second audio fingerprint database may be empty. Alternatively, they may be seeded: for a batch of audio data verified by means such as local manual verification or verification by other institutions, the reference fingerprint data of audio data verified to have copyright information is stored in the first audio fingerprint database, and the reference fingerprint data of audio data verified to have no copyright information is stored in the second audio fingerprint database. This embodiment imposes no limitation in this regard.
Reference fingerprint data is likewise fingerprint data of audio data, and it is generated in the same way as target fingerprint data.
Once the target fingerprint data of the target audio data has been generated, the target fingerprint data may be matched against the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database, so as to determine whether the target fingerprint data is identical or similar to reference fingerprint data in the first audio fingerprint database or the second audio fingerprint database.
Exemplarily, the first audio fingerprint database includes multiple pieces of reference fingerprint data, and the second audio fingerprint database includes multiple pieces of reference fingerprint data.
Considering that most audio data carries copyright information while relatively little audio data is original and carries none, matching against the reference fingerprint data in the first audio fingerprint database may be given higher priority than matching against the reference fingerprint data in the second audio fingerprint database. That is, the target fingerprint data is first matched against the reference fingerprint data in the first audio fingerprint database. If the target fingerprint data fails to match all of the reference fingerprint data in the first audio fingerprint database, it is then matched against the reference fingerprint data in the second audio fingerprint database; if it successfully matches any reference fingerprint data in the first audio fingerprint database, matching against the second audio fingerprint database is skipped. Under this distribution of audio data, the probability of a successful match in the first audio fingerprint database is high and the probability of a successful match in the second audio fingerprint database is low, so matching the first audio fingerprint database first reduces the amount of computation spent on subsequently matching the second audio fingerprint database and thereby improves matching efficiency.
Alternatively, matching against the reference fingerprint data in the first audio fingerprint database may be given lower priority than matching against the reference fingerprint data in the second audio fingerprint database. That is, the target fingerprint data is first matched against the reference fingerprint data in the second audio fingerprint database. If the target fingerprint data fails to match all of the reference fingerprint data in the second audio fingerprint database, it is then matched against the reference fingerprint data in the first audio fingerprint database; if it successfully matches any reference fingerprint data in the second audio fingerprint database, matching against the first audio fingerprint database is skipped. This embodiment imposes no limitation in this regard.
In a specific implementation, the target audio data may be long audio, so when the target audio data is divided into multiple frames of audio signals to calculate target fingerprint data, the target audio data may have multiple pieces of target fingerprint data. Moreover, multimedia data such as short videos often reuses only part of a piece of copyrighted audio data, such as the climax of a song. Therefore, the similarity between each piece of target fingerprint data and each piece of reference fingerprint data in the first audio fingerprint database may be calculated, as may the similarity between each piece of target fingerprint data and each piece of reference fingerprint data in the second audio fingerprint database. If, among all the target fingerprint data, n consecutive pieces (n is a positive integer) each have a similarity greater than a preset threshold with the corresponding one of n consecutive pieces of reference fingerprint data in one audio fingerprint database, it can be determined that the n consecutive pieces of target fingerprint data of the target audio data successfully match the n consecutive pieces of reference fingerprint data in that audio fingerprint database, and hence that the target fingerprint data of the target audio data successfully matches the reference fingerprint data in that audio fingerprint database. Comparing both similarity and relative position keeps the match between the target fingerprint data and the reference fingerprint data stable, which in turn ensures the accuracy of the matching result.
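The "n consecutive matches" rule can be illustrated with a brute-force sketch. The source does not define the similarity measure, so the one used here (one minus the normalized Hamming distance between fixed-width integer fingerprints) is an assumption, as are the names `FP_BITS`, `similarity`, and `find_consecutive_match`.

```python
from typing import Optional, Sequence, Tuple

FP_BITS = 32  # assumed fingerprint width in bits


def similarity(a: int, b: int) -> float:
    """Assumed measure: 1 minus the normalized Hamming distance."""
    return 1.0 - bin(a ^ b).count("1") / FP_BITS


def find_consecutive_match(target: Sequence[int], reference: Sequence[int],
                           n: int, threshold: float) -> Optional[Tuple[int, int]]:
    """Return (target_start, reference_start) of the first run of n aligned
    fingerprints whose pairwise similarity exceeds the threshold, else None."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    for i in range(len(target) - n + 1):
        for j in range(len(reference) - n + 1):
            if all(similarity(target[i + k], reference[j + k]) > threshold
                   for k in range(n)):
                return i, j
    return None
```

This exhaustive scan is quadratic in the sequence lengths and is meant only to make the rule concrete; a practical system would more likely use an index, such as the key-value table described later in this document, to find candidate positions first.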
Step 103: If the target fingerprint data fails to match both the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database, call the music query service interface to query the copyright information of the target audio data.
If the target fingerprint data fails to match all of the reference fingerprint data in the first audio fingerprint database and all of the reference fingerprint data in the second audio fingerprint database, no audio data identical or similar to the target audio data was found locally on the computer device, and the target audio data is likely to be new audio data. In this case, the music query service interface may be called and, in accordance with the specification of the music query service interface, the target audio data is sent to the server of the music copyright holder, which is queried as to whether the target audio data has copyright information.
Step 104: If the copyright information of the target audio data has been found, store the target fingerprint data in the first audio fingerprint database as new reference fingerprint data in the first audio fingerprint database, and record the copyright information of the target audio data.
If the server of the music copyright holder returns the copyright information of the target audio data through the music query service interface, the target fingerprint data may be stored in the first audio fingerprint database, where it becomes new reference fingerprint data. In addition, the copyright information of the target audio data is recorded in another form, such as a separate table or database; the copyright information may be indexed by the identifier (ID) of the target audio data and thereby associated with the new reference fingerprint data in the first audio fingerprint database.
In one storage method, a key-value pair (key, value) is generated with each piece of target fingerprint data as the key and with the identifier (e.g., ID) of the target audio data together with the sequence number of the audio signal to which that piece of target fingerprint data belongs as the value. The audio signal to which each piece of target fingerprint data belongs is one frame of the target audio data.
The key-value pair (key, value) is stored in the first audio fingerprint database and serves as new reference fingerprint data in the first audio fingerprint database.
For each index value (index), b storage locations (b is a positive integer, e.g., 2^N) may be provided, so that target fingerprint data with the same key but different values can be stored. A data table with a rows (a is the length of the key, i.e., the length of the target fingerprint data, and is a positive integer) and b columns is thereby formed in the first audio fingerprint database, which improves storage efficiency and simplifies searching.
The above method of storing the target fingerprint data in the first audio fingerprint database is only an example. When implementing the embodiments of the present application, other methods of storing the target fingerprint data in the first audio fingerprint database may be set according to the actual situation; for example, a key-value pair (key, value) may be generated with the identifier of the target audio data as the key and all the target fingerprint data of the target audio data as the value, and the key-value pair stored in the first audio fingerprint database. The embodiments of the present application impose no limitation in this regard. In addition, those skilled in the art may adopt other methods of storing the target fingerprint data in the first audio fingerprint database according to actual needs, which is likewise not limited in the embodiments of the present application.
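The first storage method above (fingerprint as key, audio identifier plus frame sequence number as value, with up to b slots per key) might be sketched as an in-memory table. The class name `FingerprintDatabase`, its methods, and the policy of rejecting an entry when a bucket is full are hypothetical; the source does not say what happens when all b locations are occupied.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Entry = Tuple[str, int]  # (audio identifier, frame sequence number)


class FingerprintDatabase:
    """In-memory table mapping a fingerprint to at most `bucket_size` entries."""

    def __init__(self, bucket_size: int = 4) -> None:
        self.bucket_size = bucket_size  # plays the role of b in the text
        self._table: Dict[int, List[Entry]] = defaultdict(list)

    def add(self, fingerprint: int, audio_id: str, frame_index: int) -> bool:
        """Store the pair; return False if the bucket is full (assumed policy)."""
        bucket = self._table[fingerprint]
        entry = (audio_id, frame_index)
        if entry in bucket:
            return True  # already stored, nothing to do
        if len(bucket) >= self.bucket_size:
            return False
        bucket.append(entry)
        return True

    def lookup(self, fingerprint: int) -> List[Entry]:
        """Return every (audio_id, frame_index) stored under this fingerprint."""
        return list(self._table.get(fingerprint, ()))
```

Keying on the fingerprint makes lookup during matching a single table access, whereas the alternative layout mentioned above (audio identifier as key) favors retrieving all fingerprints of a known piece of audio.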
Step 105: If the copyright information of the target audio data is not found, store the target fingerprint data in the second audio fingerprint database as new reference fingerprint data in the second audio fingerprint database.
If the server of the music copyright holder returns, through the music query service interface, a result indicating that the target audio data has no copyright information, the target fingerprint data may be stored in the second audio fingerprint database, where it becomes new reference fingerprint data.
In one storage method, a key-value pair (key, value) is generated with each piece of target fingerprint data as the key and with the identifier (e.g., ID) of the target audio data together with the sequence number of the audio signal to which that piece of target fingerprint data belongs as the value. The audio signal to which each piece of target fingerprint data belongs is one frame of the target audio data.
The key-value pair (key, value) is stored in the second audio fingerprint database and serves as new reference fingerprint data in the second audio fingerprint database.
For each index value (index), b storage locations (b is a positive integer, e.g., 2^N) may be provided, so that target fingerprint data with the same key but different values can be stored. A data table with a rows (a is the length of the key, i.e., the length of the target fingerprint data, and is a positive integer) and b columns is thereby formed in the second audio fingerprint database, which improves storage efficiency and simplifies searching.
The above method of storing the target fingerprint data in the second audio fingerprint database is only an example. When implementing the embodiments of the present application, other methods of storing the target fingerprint data in the second audio fingerprint database may be set according to the actual situation; for example, a key-value pair (key, value) may be generated with the identifier of the target audio data as the key and all the target fingerprint data of the target audio data as the value, and the key-value pair stored in the second audio fingerprint database. The embodiments of the present application impose no limitation in this regard. In addition, those skilled in the art may adopt other methods of storing the target fingerprint data in the second audio fingerprint database according to actual needs, which is likewise not limited in the embodiments of the present application.
It should be noted that the method of storing the target fingerprint data in the first audio fingerprint database and the method of storing the target fingerprint data in the second audio fingerprint database may be the same or different, which is not limited in this embodiment.
In this embodiment, target fingerprint data is generated for the target audio data; the target fingerprint data is matched against the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database; if the target fingerprint data fails to match both, the music query service interface is called to query the copyright information of the target audio data; if the copyright information of the target audio data is found, the target fingerprint data is stored in the first audio fingerprint database as new reference fingerprint data and the copyright information of the target audio data is recorded; and if the copyright information of the target audio data is not found, the target fingerprint data is stored in the second audio fingerprint database as new reference fingerprint data. The result of the music query service interface serves as the basis for tiering: the first audio fingerprint database and the second audio fingerprint database are kept separate so as to distinguish audio data that has copyright information from audio data that does not, and new audio data is recorded as it is encountered, which raises the search success rate. The first audio fingerprint database, the second audio fingerprint database, and the music query service interface together form a joint tiered query mechanism: the first and second audio fingerprint databases are searched first, and the music query service interface is called only afterwards. This makes effective use of the fingerprint data already held in the two databases and reduces the number of calls to the music query service interface, thereby lowering operating costs.
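The tiered query flow summarized above can be sketched as follows. To keep the example short, matching is reduced to exact membership of a single fingerprint, and the music query service is modeled as a caller-supplied function that returns copyright information or `None`; the class name `TieredCopyrightLookup`, its attributes, and the `service_calls` counter are all hypothetical.

```python
from typing import Callable, Dict, Optional, Set


class TieredCopyrightLookup:
    """Search the two local fingerprint sets first; call the remote
    service only when both local lookups miss."""

    def __init__(self, query_service: Callable[[bytes], Optional[str]]) -> None:
        self.first_db: Set[int] = set()    # fingerprints with copyright info
        self.second_db: Set[int] = set()   # fingerprints without copyright info
        self.copyright_info: Dict[int, str] = {}
        self.query_service = query_service  # stand-in for the service interface
        self.service_calls = 0

    def process(self, audio: bytes, fingerprint: int) -> str:
        if fingerprint in self.first_db:
            return "copyrighted"
        if fingerprint in self.second_db:
            return "uncopyrighted"
        # Both local databases missed: fall back to the remote service.
        self.service_calls += 1
        info = self.query_service(audio)
        if info is not None:
            self.first_db.add(fingerprint)
            self.copyright_info[fingerprint] = info
            return "copyrighted"
        self.second_db.add(fingerprint)
        return "uncopyrighted"
```

Each distinct fingerprint triggers at most one remote call under this sketch; every repeat is answered from one of the two local sets, which is where the cost saving described above comes from.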
Embodiment 2
FIG. 2 is a flowchart of an audio fingerprint processing method provided in Embodiment 2 of the present application. Building on the foregoing embodiment, this embodiment adds operations for clustering target audio data, managing reference fingerprint data with a time to live, and transferring reference fingerprint data between databases. The method includes the following steps:
Step 201: Generate target fingerprint data for the target audio data.
Step 202: Match the target fingerprint data against the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database.
Step 203: If the target fingerprint data fails to match both the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database, call the music query service interface to query the copyright information of the target audio data.
Step 204: If the copyright information of the target audio data has been found, store the target fingerprint data in the first audio fingerprint database as new reference fingerprint data in the first audio fingerprint database, and record the copyright information of the target audio data.
Step 205: Use the target audio data as new reference audio data, and generate a new cluster for the new reference audio data.
In this embodiment, if the copyright information of the target audio data is found through the music query service interface, this indicates that no audio data identical or similar to the target audio data is stored locally on the computer device. In this case, in addition to recording the copyright information of the target audio data, the target audio data may be set as new reference audio data and a new cluster may be generated for it; the cluster is used to group identical or similar audio data.
Step 206: If the copyright information of the target audio data is not found, store the target fingerprint data in the second audio fingerprint database as new reference fingerprint data in the second audio fingerprint database.
Step 207: If the target fingerprint data successfully matches reference fingerprint data in the first audio fingerprint database, add the target audio data to the cluster to which the reference audio data belongs.
In this embodiment, each audio fingerprint database contains reference fingerprint data of multiple pieces of audio data. If the target fingerprint data successfully matches reference fingerprint data in the first audio fingerprint database, audio data identical or similar to the target audio data is already stored locally on the computer device; for ease of distinction, this audio data is referred to as reference audio data. When the target fingerprint data successfully matches reference fingerprint data in the first audio fingerprint database, the cluster to which the reference audio data belongs may be looked up and the target audio data added to that cluster, so that identical or similar audio data is grouped into the same cluster, which facilitates subsequent cluster-based business processing such as user classification and song recommendation.
Step 208: If the target fingerprint data successfully matches reference fingerprint data in the second audio fingerprint database, add the target audio data to the cluster to which the reference audio data belongs.
In this embodiment, if the target fingerprint data successfully matches reference fingerprint data in the second audio fingerprint database, audio data identical or similar to the target audio data is already stored locally on the computer device; for ease of distinction, this audio data is referred to as reference audio data. When the target fingerprint data successfully matches reference fingerprint data in the second audio fingerprint database, the cluster to which the reference audio data belongs may be looked up and the target audio data added to that cluster, so that identical or similar audio data is grouped into the same cluster, which facilitates subsequent cluster-based business processing such as user classification and song recommendation.
Exemplarily, if, among all the target fingerprint data of the target audio data, n consecutive pieces (n is a positive integer) each have a similarity greater than a preset threshold with the corresponding one of n consecutive pieces of reference fingerprint data in one audio fingerprint database, it is determined that the n consecutive pieces of target fingerprint data of the target audio data successfully match the n consecutive pieces of reference fingerprint data. For example, with n = 3, if the similarity between the first of three consecutive pieces of target fingerprint data of the target audio data and the first of three consecutive pieces of reference fingerprint data in the first audio fingerprint database is greater than the preset threshold, the similarity between the second piece of target fingerprint data and the second piece of reference fingerprint data is greater than the preset threshold, and the similarity between the third piece of target fingerprint data and the third piece of reference fingerprint data is greater than the preset threshold, it is determined that the three consecutive pieces of target fingerprint data of the target audio data successfully match the three consecutive pieces of reference fingerprint data in the first audio fingerprint database.
Exemplarily, the audio data to which the n consecutive pieces of reference fingerprint data belong, namely those that successfully matched the n consecutive pieces of target fingerprint data of the target audio data, is taken as the reference audio data, and the target audio data is added to the cluster to which that reference audio data belongs.
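The cluster bookkeeping in steps 205, 207, and 208 can be sketched with two dictionaries. The class name `ClusterIndex`, the integer cluster identifiers, and the rule that audio data already assigned to a cluster stays where it is are assumptions for illustration only.

```python
from typing import Dict, List


class ClusterIndex:
    """Track which cluster each piece of audio data belongs to."""

    def __init__(self) -> None:
        self._cluster_of: Dict[str, int] = {}
        self._members: Dict[int, List[str]] = {}
        self._next_id = 0

    def new_cluster(self, reference_audio_id: str) -> int:
        """Step 205: create a cluster seeded with new reference audio data."""
        cluster_id = self._next_id
        self._next_id += 1
        self._cluster_of[reference_audio_id] = cluster_id
        self._members[cluster_id] = [reference_audio_id]
        return cluster_id

    def add_to_cluster_of(self, reference_audio_id: str, target_audio_id: str) -> int:
        """Steps 207/208: put the target into the matched reference's cluster.

        Raises KeyError if the reference audio data has no cluster yet.
        """
        existing = self._cluster_of.get(target_audio_id)
        if existing is not None:
            return existing  # assumed policy: keep the earlier assignment
        cluster_id = self._cluster_of[reference_audio_id]
        self._cluster_of[target_audio_id] = cluster_id
        self._members[cluster_id].append(target_audio_id)
        return cluster_id

    def members(self, cluster_id: int) -> List[str]:
        return list(self._members[cluster_id])
```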
Step 209: If reference fingerprint data in the second audio fingerprint database successfully matches the target fingerprint data, compile statistics on an indicator of successful matching for the reference fingerprint data.
Step 210: If the indicator satisfies a preset transfer condition, transfer the reference fingerprint data from the second audio fingerprint database to the first audio fingerprint database.
Considering that new audio data is readily produced in scenarios such as new songs being released online and short videos being updated rapidly, and that such audio data may not yet have been recorded by the music copyright holder, a transfer condition may be set in advance for the reference fingerprint data in the second audio fingerprint database; when the transfer condition is satisfied, the reference fingerprint data may be transferred to the other database.
In this embodiment, if the target fingerprint data successfully matches reference fingerprint data in the second audio fingerprint database, statistics on an indicator of successful matching may be compiled for that reference fingerprint data, for example, the total number of successful matches, the frequency of successful matches, and so on.
Exemplarily, if n consecutive pieces of reference fingerprint data in the second audio fingerprint database successfully match the target fingerprint data of the target audio data, the indicator of successful matching is updated for each of the n consecutive pieces of reference fingerprint data. For example, with n = 3, if three consecutive pieces of reference fingerprint data in the second audio fingerprint database successfully match the target fingerprint data of the target audio data, the total number of successful matches of each of the three pieces of reference fingerprint data is incremented by 1.
The indicator is compared with the transfer condition in the same dimension, for example, whether the total number of successful matches is greater than or equal to a first threshold, whether the frequency of successful matches is greater than or equal to a second threshold, and so on.
If the indicator satisfies the preset transfer condition, the reference fingerprint data belongs to relatively popular audio data, possibly a newly released song. The reference fingerprint data can then be transferred from the second audio fingerprint database to the first audio fingerprint database, and prompt information is generated to prompt operations staff to add copyright information for the audio data to which the reference fingerprint data belongs.
If the indicator does not satisfy the preset transfer condition, the reference fingerprint data can remain stored in the second audio fingerprint database.
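As a non-limiting illustration, steps 209 and 210 can be sketched as follows. The class `FingerprintStore`, the function `record_match`, and the default threshold of 3 are hypothetical names and values chosen for this sketch; only the total number of successful matches is used as the indicator, and each database is modelled as an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class FingerprintStore:
    """Toy stand-in for the two audio fingerprint databases (assumption)."""
    first: dict = field(default_factory=dict)         # entries with copyright information
    second: dict = field(default_factory=dict)        # entries without copyright information
    match_counts: dict = field(default_factory=dict)  # indicator: total successful matches
    prompts: list = field(default_factory=list)       # prompt information for operations staff


def record_match(store, fingerprint, count_threshold=3):
    """Count one successful match in the second database; transfer if the condition is met.

    Returns True if the fingerprint was transferred to the first database.
    """
    if fingerprint not in store.second:
        return False
    store.match_counts[fingerprint] = store.match_counts.get(fingerprint, 0) + 1
    if store.match_counts[fingerprint] < count_threshold:
        return False  # condition not met: entry stays in the second database
    # Condition met: move the entry and prompt staff to add copyright information.
    store.first[fingerprint] = store.second.pop(fingerprint)
    store.prompts.append(f"add copyright information for {store.first[fingerprint]}")
    return True
```

A frequency-based condition could be handled the same way by storing match timestamps instead of a plain counter.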
Step 211: Set a time-to-live for the reference fingerprint data in the first audio fingerprint database and/or the second audio fingerprint database.
In scenarios such as short videos, some audio data turns over quickly: after being popular for a period, the audio data is rarely used by users. For such scenarios, a specified first value can be set as the time-to-live of reference fingerprint data in the first audio fingerprint database, and a specified second value can likewise be set as the time-to-live of reference fingerprint data in the second audio fingerprint database.
Considering that most audio data carries copyright information while a smaller share is original and carries none, target fingerprint data has a higher probability of matching reference fingerprint data in the first audio fingerprint database and a lower probability of matching reference fingerprint data in the second audio fingerprint database. The time-to-live of reference fingerprint data in the first audio fingerprint database can therefore be set greater than that of reference fingerprint data in the second audio fingerprint database, that is, the first value is greater than the second value. This maintains the probability of successful matching against the first audio fingerprint database, reduces how often the music query service interface is called, and lowers operating costs.
Besides setting the time-to-live of reference fingerprint data in the first audio fingerprint database greater than that in the second audio fingerprint database, the time-to-live in the first audio fingerprint database can also be set equal to or less than that in the second audio fingerprint database, that is, the first value is equal to or less than the second value. This embodiment imposes no limitation in this respect.
Step 212: Decay the time-to-live.
For the time-to-live of reference fingerprint data, a timer can be started to count down so that the time-to-live decays, that is, its value decreases continuously.
In general, the decay proceeds at the normal rate of elapsed time rather than at a variable rate.
Step 213: If reference fingerprint data in the first audio fingerprint database or the second audio fingerprint database is successfully matched with the target fingerprint data, increase the time-to-live.
If reference fingerprint data in the first audio fingerprint database is successfully matched with the target fingerprint data, the time-to-live of that reference fingerprint data can be increased, for example, by restoring it to the original first value, by adding a first step to its current value, and so on.
If reference fingerprint data in the second audio fingerprint database is successfully matched with the target fingerprint data, the time-to-live of that reference fingerprint data can be increased, for example, by restoring it to the original second value, by adding a second step to its current value, and so on.
Step 214: When the time-to-live has fully decayed, delete the reference fingerprint data from the first audio fingerprint database or the second audio fingerprint database.
If the time-to-live of reference fingerprint data in the first audio fingerprint database has fully decayed, that is, its current value is 0, the audio data to which the reference fingerprint data belongs is used infrequently. In this case, the reference fingerprint data can be deleted from the first audio fingerprint database. This reduces the amount of reference fingerprint data stored in the first audio fingerprint database and frees its space while maintaining the matching success rate of the remaining reference fingerprint data, thereby effectively meeting the need to keep storing incoming fingerprint data under a limited database capacity.
If the time-to-live of reference fingerprint data in the second audio fingerprint database has fully decayed, that is, its current value is 0, the audio data to which the reference fingerprint data belongs is used infrequently. In this case, the reference fingerprint data can be deleted from the second audio fingerprint database. This reduces the amount of reference fingerprint data stored in the second audio fingerprint database and frees its space while maintaining the matching success rate of the remaining reference fingerprint data, thereby effectively meeting the need to keep storing incoming fingerprint data under a limited database capacity.
Exemplarily, a time-to-live is set for each piece of reference fingerprint data in each audio fingerprint database, and the time-to-live is decayed. When a piece of reference fingerprint data in an audio fingerprint database is determined to match the target fingerprint data of the target audio data, the time-to-live of that piece of reference fingerprint data in that audio fingerprint database is increased. For example, if three consecutive pieces of reference fingerprint data in the second audio fingerprint database are successfully matched with three consecutive pieces of target fingerprint data of the target audio data, the time-to-live of each of the three pieces of reference fingerprint data is increased. When the time-to-live of a piece of reference fingerprint data in an audio fingerprint database has fully decayed, that piece of reference fingerprint data is deleted from that audio fingerprint database.
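As a non-limiting illustration, steps 211 to 214 can be sketched as follows. The class `TtlStore` and its method names are hypothetical; time is passed in explicitly as an elapsed amount instead of being read from a real timer, which is a simplification made for this sketch.

```python
class TtlStore:
    """Toy fingerprint database that tracks a time-to-live per entry (assumption)."""

    def __init__(self, initial_ttl):
        self.initial_ttl = initial_ttl  # the "first value" or "second value"
        self.ttl = {}                   # fingerprint -> remaining time-to-live

    def add(self, fingerprint):
        """Step 211: set the time-to-live when the entry is stored."""
        self.ttl[fingerprint] = self.initial_ttl

    def tick(self, elapsed):
        """Steps 212 and 214: decay at a constant rate and delete expired entries."""
        for fingerprint in list(self.ttl):
            self.ttl[fingerprint] -= elapsed
            if self.ttl[fingerprint] <= 0:
                del self.ttl[fingerprint]

    def on_match(self, fingerprint, step=None):
        """Step 213: restore the initial value, or add a step to the current value."""
        if fingerprint not in self.ttl:
            return
        if step is None:
            self.ttl[fingerprint] = self.initial_ttl
        else:
            self.ttl[fingerprint] += step
```

Two instances with different `initial_ttl` values would model a first audio fingerprint database whose entries live longer than those of the second.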
Embodiment 3
FIG. 3 is a structural block diagram of an apparatus for processing audio fingerprints provided in Embodiment 3 of the present application. The apparatus may include the following modules:
a fingerprint data generation module 301, configured to generate target fingerprint data for target audio data; a fingerprint data matching module 302, configured to match the target fingerprint data against reference fingerprint data in a first audio fingerprint database and reference fingerprint data in a second audio fingerprint database; an interface query module 303, configured to call a music query service interface to query copyright information of the target audio data if the target fingerprint data fails to match both the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database; a first update module 304, configured to, if the copyright information of the target audio data has been found, store the target fingerprint data in the first audio fingerprint database as new reference fingerprint data of the first audio fingerprint database and record the copyright information of the target audio data; and a second update module 305, configured to, if the copyright information of the target audio data has not been found, store the target fingerprint data in the second audio fingerprint database as new reference fingerprint data of the second audio fingerprint database.
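As a non-limiting illustration, the cooperation of modules 302 to 305 can be sketched as follows. The function `route_target` and the callable `query_copyright` are hypothetical; `query_copyright` merely stands in for the music query service interface, and matching is reduced to exact key lookup in a dictionary, which is an assumption made to keep the sketch short.

```python
def route_target(target_fingerprints, first_db, second_db, query_copyright):
    """Match against both databases; on a double miss, query and store.

    first_db / second_db: dicts mapping fingerprint -> stored information.
    query_copyright: zero-argument callable returning copyright information or None.
    Returns a short label describing what happened.
    """
    if any(fp in first_db for fp in target_fingerprints):
        return "matched_first"
    if any(fp in second_db for fp in target_fingerprints):
        return "matched_second"
    # Both matches failed: only now is the (possibly costly) external service called.
    info = query_copyright()
    if info is not None:
        for fp in target_fingerprints:
            first_db[fp] = info   # new reference data with copyright information recorded
        return "stored_first"
    for fp in target_fingerprints:
        second_db[fp] = None      # new reference data without copyright information
    return "stored_second"
```

Because entries without copyright information are also cached in the second database, a repeated query for the same original audio does not reach the external service again.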
In an embodiment of the present application, the fingerprint data generation module 301 includes:
an audio signal division module, configured to divide the target audio data into multiple frames of audio signals; a spectrogram conversion module, configured to convert the multiple frames of audio signals into a spectrogram; a peak point search module, configured to traverse, on the spectrogram, multiple data points representing peaks and take each data point as a peak point; a feature information extraction module, configured to extract feature information of each peak point; and a hash value calculation module, configured to calculate a hash value from the feature information of each peak point and take the hash value corresponding to each peak point as one piece of target fingerprint data of the target audio data.
In an embodiment of the present application, the feature information extraction module includes:
a frequency value query module, configured to query the frequency value of each peak point and take the frequency value as feature information of that peak point; a time distance measurement module, configured to measure a first distance, in time, between each peak point and each of the other peak points and take the first distance as feature information of that peak point; and a frequency distance measurement module, configured to measure a second distance, in frequency, between each peak point and each of the other peak points and take the second distance as feature information of that peak point.
In an embodiment of the present application, the time distance measurement module includes:
a time neighborhood search module, configured to search, in the time dimension, for other peak points on the spectrogram that lie within the neighborhood of each peak point; and a time distance calculation module, configured to calculate the first distance, in time, between each peak point and each of the other peak points found and take the first distance as feature information of that peak point.
In an embodiment of the present application, the frequency distance measurement module includes:
a frequency neighborhood search module, configured to search, in the frequency dimension, for other peak points on the spectrogram that lie within the neighborhood of each peak point; and a frequency distance calculation module, configured to calculate the second distance, in frequency, between each peak point and each of the other peak points found and take the second distance as feature information of that peak point.
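As a non-limiting illustration, the neighborhood search and distance calculation can be sketched as follows. The function `neighbour_features` and the window sizes are hypothetical. The sketch assumes that peak points are given as (frame index, frequency bin) pairs, that a neighbor must lie within both the time window and the frequency window, and that distances are absolute differences; the embodiment does not fix any of these choices.

```python
def neighbour_features(peaks, time_window=3, freq_window=5):
    """Return (frequency, time distance, frequency distance) for each peak/neighbor pair.

    peaks: list of (frame_index, freq_bin) tuples taken from the spectrogram.
    """
    features = []
    for i, (t, f) in enumerate(peaks):
        for j, (t_other, f_other) in enumerate(peaks):
            if i == j:
                continue  # a peak point is not its own neighbor
            time_dist = abs(t_other - t)   # first distance (in time)
            freq_dist = abs(f_other - f)   # second distance (in frequency)
            if time_dist <= time_window and freq_dist <= freq_window:
                features.append((f, time_dist, freq_dist))
    return features


# The third peak is too far away in time to be a neighbor of the other two.
print(neighbour_features([(0, 10), (2, 12), (9, 11)]))
```

The quadratic loop is adequate for a sketch; sorting the peak points by frame index would allow the time window to be scanned without visiting every pair.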
In an embodiment of the present application, the hash value calculation module includes:
a binary conversion module, configured to convert the frequency value, the first distance, and the second distance of each peak point into binary format; and a concatenation module, configured to, once the conversion is complete, concatenate the frequency value, the first distance, and the second distance of each peak point and take the concatenation result as one piece of target fingerprint data of the target audio data.
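As a non-limiting illustration, the binary conversion and concatenation can be sketched as follows. The function `pack_fingerprint` and the fixed width of 10 bits per field are assumptions made for this sketch; the embodiment does not specify field widths, and the sketch further assumes that all three values are non-negative integers.

```python
def pack_fingerprint(freq_value, time_dist, freq_dist, bits=10):
    """Concatenate three values, each rendered as a fixed-width binary string."""
    fields = (freq_value, time_dist, freq_dist)
    for value in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    # Binary conversion: zero-padded so every field occupies exactly `bits` digits.
    as_bits = "".join(format(value, f"0{bits}b") for value in fields)
    # Concatenation result, returned as an integer fingerprint.
    return int(as_bits, 2)


# With 4-bit fields: 0101 + 0011 + 0010 = 0b010100110010
print(pack_fingerprint(5, 3, 2, bits=4))  # 1330
```

Fixed-width fields keep the concatenation unambiguous: without padding, different triples could produce the same bit string.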
In an embodiment of the present application, the fingerprint data matching module 302 includes:
a similarity calculation module, configured to calculate the similarity between each piece of target fingerprint data and all reference fingerprint data in the first audio fingerprint database, and the similarity between each piece of target fingerprint data and the reference fingerprint data in the second audio fingerprint database; and a consecutive matching module, configured to determine that the target fingerprint data of the target audio data is successfully matched with the reference fingerprint data in one audio fingerprint database if the similarities between n consecutive pieces of target fingerprint data, among all the target fingerprint data, and n consecutive pieces of reference fingerprint data in that audio fingerprint database are each greater than a preset threshold, where the one audio fingerprint database is the first audio fingerprint database or the second audio fingerprint database, and n is a positive integer.
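As a non-limiting illustration, the consecutive matching rule can be sketched as follows. The functions `similarity` and `has_consecutive_match` are hypothetical, and the use of a bitwise (Hamming-based) similarity over 32-bit integer fingerprints is an assumption; the embodiment does not name a particular similarity measure.

```python
def similarity(a, b, bits=32):
    """Fraction of identical bits between two integer fingerprints (assumed measure)."""
    return 1.0 - bin(a ^ b).count("1") / bits


def has_consecutive_match(target, reference, n=3, threshold=0.9, bits=32):
    """True if n consecutive target fingerprints align with n consecutive reference ones.

    target / reference: fingerprints in frame order for one audio and one database entry.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    for i in range(len(target) - n + 1):
        for j in range(len(reference) - n + 1):
            if all(
                similarity(target[i + k], reference[j + k], bits) > threshold
                for k in range(n)
            ):
                return True
    return False
```

Requiring a run of n matches rather than a single match reduces the chance that one coincidentally similar fingerprint is reported as a match.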
In an embodiment of the present application, the first update module 304 includes:
a first key-value pair generation module, configured to generate a key-value pair with each piece of target fingerprint data as the key and with the identifier of the target audio data and the sequence number of the audio signal to which that piece of target fingerprint data belongs as the value, where the audio signal to which each piece of target fingerprint data belongs is one frame of the target audio data; and a first key-value pair storage module, configured to store the key-value pair in the first audio fingerprint database as new reference fingerprint data of the first audio fingerprint database.
In an embodiment of the present application, the second update module 305 includes:
a second key-value pair generation module, configured to generate a key-value pair with each piece of target fingerprint data as the key and with the identifier of the target audio data and the sequence number of the audio signal to which that piece of target fingerprint data belongs as the value, where the audio signal to which each piece of target fingerprint data belongs is one frame of the target audio data; and a second key-value pair storage module, configured to store the key-value pair in the second audio fingerprint database as new reference fingerprint data of the second audio fingerprint database.
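As a non-limiting illustration, the key-value layout used by both update modules can be sketched as follows. The function `store_fingerprints` is hypothetical, and keeping a list of values per key (so that the same fingerprint may point to several audio items) is an assumption of this sketch, not something the embodiment states.

```python
from collections import defaultdict


def store_fingerprints(db, audio_id, fingerprints):
    """Store each fingerprint as key, with (audio identifier, frame sequence number) as value.

    db: mapping fingerprint -> list of (audio_id, frame_index).
    fingerprints: iterable of (frame_index, fingerprint) pairs for one audio item.
    """
    for frame_index, fingerprint in fingerprints:
        db[fingerprint].append((audio_id, frame_index))


db = defaultdict(list)
store_fingerprints(db, "song-1", [(0, 0x1A), (1, 0x2B)])
store_fingerprints(db, "song-2", [(4, 0x1A)])
print(db[0x1A])  # [('song-1', 0), ('song-2', 4)]
```

Using the fingerprint as the key makes lookup of a target fingerprint a single dictionary access, and the stored frame sequence number is what allows consecutive matches to be checked.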
In an embodiment of the present application, the apparatus further includes:
a cluster generation module, configured to take the target audio data as new reference audio data and generate a new cluster for the new reference audio data.
In an embodiment of the present application, the apparatus further includes:
a first cluster adding module, configured to add the target audio data to the cluster to which reference audio data belongs if the target fingerprint data is successfully matched with reference fingerprint data in the first audio fingerprint database, where that reference fingerprint data in the first audio fingerprint database belongs to the reference audio data; and a second cluster adding module, configured to add the target audio data to the cluster to which reference audio data belongs if the target fingerprint data is successfully matched with reference fingerprint data in the second audio fingerprint database, where that reference fingerprint data in the second audio fingerprint database belongs to the reference audio data.
In an embodiment of the present application, the apparatus further includes:
a time-to-live setting module, configured to set a time-to-live for the reference fingerprint data in the first audio fingerprint database and/or the second audio fingerprint database; a time-to-live decay module, configured to decay the time-to-live; a time-to-live increase module, configured to increase the time-to-live if reference fingerprint data in the first audio fingerprint database or the second audio fingerprint database is successfully matched with the target fingerprint data; and a fingerprint data deletion module, configured to delete the reference fingerprint data from the first audio fingerprint database or the second audio fingerprint database when the time-to-live has fully decayed.
In an embodiment of the present application, the apparatus further includes:
an indicator statistics module, configured to count an indicator of successful matching for reference fingerprint data in the second audio fingerprint database if that reference fingerprint data is successfully matched with the target fingerprint data; and a fingerprint data transfer module, configured to transfer the reference fingerprint data from the second audio fingerprint database to the first audio fingerprint database if the indicator satisfies a preset transfer condition.
The apparatus for processing audio fingerprints provided by the embodiments of the present application can execute the method for processing audio fingerprints provided by any embodiment of the present application, and has functional modules corresponding to the executed method.
Embodiment 4
FIG. 4 is a schematic structural diagram of a computer device provided in Embodiment 4 of the present application. FIG. 4 shows a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present application. The computer device 12 shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 4, the computer device 12 takes the form of a general-purpose computing device. Components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12 typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by the computer device 12, including volatile and non-volatile media and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 may be configured to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in FIG. 4, a magnetic disk drive configured to read from and write to a removable non-volatile magnetic disk (for example, a "floppy disk") and an optical disc drive configured to read from and write to a removable non-volatile optical disc (for example, a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The system memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
The computer device 12 may also communicate with one or more external devices 14 (for example, a keyboard, a pointing device, a display 24, and the like), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (for example, a network card, a modem, and the like) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the computer device 12 may communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, redundant arrays of independent disks (RAID) systems, tape drives, and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the method for processing audio fingerprints provided by the embodiments of the present application.
Embodiment 5
Embodiment 5 of the present application further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the above method for processing audio fingerprints and achieves the same technical effects, which are not repeated here.
The computer-readable storage medium may include, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

Claims (14)

  1. A method for processing audio fingerprints, comprising:
    generating target fingerprint data for target audio data;
    matching the target fingerprint data against reference fingerprint data in a first audio fingerprint database and reference fingerprint data in a second audio fingerprint database;
    in a case where the target fingerprint data fails to match both the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database, calling a music query service interface to query copyright information of the target audio data;
    in a case where the copyright information of the target audio data has been found, storing the target fingerprint data in the first audio fingerprint database as new reference fingerprint data of the first audio fingerprint database, and recording the copyright information of the target audio data;
    in a case where the copyright information of the target audio data has not been found, storing the target fingerprint data in the second audio fingerprint database as new reference fingerprint data of the second audio fingerprint database.
  2. The method according to claim 1, wherein the generating target fingerprint data for target audio data comprises:
    dividing the target audio data into multiple frames of audio signals;
    converting the multiple frames of audio signals into a spectrogram;
    traversing, on the spectrogram, multiple data points representing peaks, and taking each data point as a peak point;
    extracting feature information of each peak point;
    calculating a hash value from the feature information of each peak point, and taking the hash value corresponding to each peak point as one piece of target fingerprint data of the target audio data.
  3. The method according to claim 2, wherein the extracting feature information of each peak point comprises:
    querying a frequency value of each peak point, and taking the frequency value as feature information of the peak point;
    measuring a first distance in time between each peak point and each of the other peak points, and taking the first distance as feature information of the peak point;
    measuring a second distance in frequency between each peak point and each of the other peak points, and taking the second distance as feature information of the peak point.
  4. The method according to claim 3, wherein the measuring a first distance in time between each peak point and each of the other peak points, and taking the first distance as feature information of the peak point, comprises:
    searching the spectrogram, along the time dimension, for other peak points located within a neighborhood of each peak point;
    calculating a first distance in time between each peak point and each of the other peak points found, and taking the first distance as feature information of the peak point;
    and the measuring a second distance in frequency between each peak point and each of the other peak points, and taking the second distance as feature information of the peak point, comprises:
    searching the spectrogram, along the frequency dimension, for other peak points located within a neighborhood of each peak point;
    calculating a second distance in frequency between each peak point and each of the other peak points found, and taking the second distance as feature information of the peak point.
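The feature extraction of claims 3 and 4 can be sketched as follows. The sketch assumes peaks are `(frame_index, frequency_bin)` pairs, merges the time neighbourhood and the frequency neighbourhood into one rectangular window, and uses absolute differences as the distances. The `PeakFeature` type and the `max_dt` and `max_df` window sizes are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class PeakFeature(NamedTuple):
    freq: int  # frequency value of the anchor peak
    dt: int    # first distance: separation in time (frames)
    df: int    # second distance: separation in frequency (bins)


def extract_features(
    peaks: List[Tuple[int, int]], max_dt: int, max_df: int
) -> List[PeakFeature]:
    """Pair every peak with each other peak inside its neighbourhood."""
    features = []
    for i, (t1, f1) in enumerate(peaks):
        for j, (t2, f2) in enumerate(peaks):
            if i == j:
                continue
            dt, df = abs(t2 - t1), abs(f2 - f1)
            # Assumption: the neighbourhood is a rectangle of
            # +/- max_dt frames by +/- max_df bins around the anchor peak.
            if dt <= max_dt and df <= max_df:
                features.append(PeakFeature(f1, dt, df))
    return features
```

Restricting pairs to a neighbourhood keeps the number of features per peak bounded, instead of growing with the total number of peaks in the recording.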
  5. The method according to claim 3, wherein the calculating a hash value for the feature information of each peak point, and taking the hash value corresponding to each peak point as one piece of target fingerprint data of the target audio data, comprises:
    converting the frequency value, the first distance, and the second distance of each peak point into binary format;
    once the conversion is complete, concatenating the frequency value, the first distance, and the second distance of each peak point, and taking the concatenation result as one piece of target fingerprint data of the target audio data.
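The binary conversion and concatenation in claim 5 can be sketched as bit packing. The field widths (10, 6, and 6 bits) and the function names are assumptions chosen for illustration; the claim does not state field widths.

```python
def pack_fingerprint(
    freq: int, dt: int, df: int,
    freq_bits: int = 10, dt_bits: int = 6, df_bits: int = 6,
) -> int:
    """Concatenate the three fields, most significant first: freq | dt | df."""
    for value, bits in ((freq, freq_bits), (dt, dt_bits), (df, df_bits)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (freq << (dt_bits + df_bits)) | (dt << df_bits) | df


def to_bit_string(
    freq: int, dt: int, df: int,
    freq_bits: int = 10, dt_bits: int = 6, df_bits: int = 6,
) -> str:
    """Render the packed fingerprint as a fixed-width binary string."""
    width = freq_bits + dt_bits + df_bits
    packed = pack_fingerprint(freq, dt, df, freq_bits, dt_bits, df_bits)
    return format(packed, f"0{width}b")
```

With the default widths, `to_bit_string(5, 3, 1)` is the 10-bit form of 5 followed by the 6-bit forms of 3 and 1, so the three fields remain recoverable by slicing at fixed offsets.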
  6. The method according to claim 2, wherein the first audio fingerprint database includes a plurality of pieces of reference fingerprint data, and the second audio fingerprint database includes a plurality of pieces of reference fingerprint data;
    the matching the target fingerprint data with the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database comprises:
    calculating a similarity between each piece of target fingerprint data and all the reference fingerprint data in the first audio fingerprint database, and calculating a similarity between each piece of target fingerprint data and all the reference fingerprint data in the second audio fingerprint database;
    in a case where the similarities between n consecutive pieces of target fingerprint data among all the target fingerprint data and n consecutive pieces of reference fingerprint data in one audio fingerprint database are all greater than a preset threshold, determining that the target fingerprint data of the target audio data is successfully matched with the reference fingerprint data in the one audio fingerprint database, wherein the one audio fingerprint database comprises the first audio fingerprint database or the second audio fingerprint database, and n is a positive integer.
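The consecutive-run criterion of claim 6 can be sketched as below. The claim does not name a similarity measure, so this sketch assumes a normalised Hamming similarity over fixed-width integer fingerprints and an exhaustive search over alignments. `similarity` and `has_consecutive_match` are hypothetical names.

```python
from typing import List


def similarity(a: int, b: int, bits: int = 22) -> float:
    """Assumed measure: fraction of equal bits between two fingerprints."""
    return 1.0 - bin(a ^ b).count("1") / bits


def has_consecutive_match(
    target: List[int], reference: List[int],
    n: int, threshold: float, bits: int = 22,
) -> bool:
    """True if some run of n consecutive target fingerprints is pairwise
    similar (above threshold) to a run of n consecutive reference ones."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    for i in range(len(target) - n + 1):
        for j in range(len(reference) - n + 1):
            if all(similarity(target[i + k], reference[j + k], bits) > threshold
                   for k in range(n)):
                return True
    return False
```

Requiring a run of n hits, instead of a single hit, reduces false positives from isolated hash collisions. A real system would likely look up candidates through an index, since this brute-force scan costs on the order of `len(target) * len(reference) * n` comparisons.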
  7. The method according to claim 2, wherein the storing the target fingerprint data in the first audio fingerprint database so that the target fingerprint data serves as new reference fingerprint data in the first audio fingerprint database comprises:
    generating a key-value pair with each piece of target fingerprint data as the key, and with an identifier of the target audio data and a sequence number of the audio signal to which the piece of target fingerprint data belongs as the value, wherein the audio signal to which each piece of target fingerprint data belongs is one frame of signal in the target audio data;
    storing the key-value pair in the first audio fingerprint database, and taking the key-value pair as new reference fingerprint data in the first audio fingerprint database;
    and the storing the target fingerprint data in the second audio fingerprint database so that the target fingerprint data serves as new reference fingerprint data in the second audio fingerprint database comprises:
    generating a key-value pair with each piece of target fingerprint data as the key, and with an identifier of the target audio data and a sequence number of the audio signal to which the piece of target fingerprint data belongs as the value, wherein the audio signal to which each piece of target fingerprint data belongs is one frame of signal in the target audio data;
    storing the key-value pair in the second audio fingerprint database, and taking the key-value pair as new reference fingerprint data in the second audio fingerprint database.
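The key-value layout of claim 7 can be sketched with an in-memory mapping. The sketch assumes each key holds a list of values, so that the same fingerprint can occur in several recordings or frames. The claim itself only recites a key-value pair, and `store_fingerprints` is a hypothetical helper.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

# key: fingerprint hash -> value: list of (audio identifier, frame sequence number)
FingerprintDb = DefaultDict[int, List[Tuple[str, int]]]


def store_fingerprints(
    db: FingerprintDb, audio_id: str, fingerprints: Iterable[Tuple[int, int]]
) -> None:
    """Store each (hash, frame_index) pair under its hash as the key."""
    for fp_hash, frame_index in fingerprints:
        db[fp_hash].append((audio_id, frame_index))
```

Keying by the fingerprint makes a lookup a single dictionary access, and the stored frame sequence number is what lets a matcher check that hits are consecutive.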
  8. The method according to any one of claims 1-7, further comprising, after the storing the target fingerprint data in the first audio fingerprint database so that the target fingerprint data serves as new reference fingerprint data in the first audio fingerprint database:
    taking the target audio data as new reference audio data, and generating a new cluster for the new reference audio data.
  9. The method according to any one of claims 1-7, further comprising:
    in a case where the target fingerprint data is successfully matched with the reference fingerprint data in the first audio fingerprint database, adding the target audio data to a cluster to which reference audio data belongs, wherein the reference fingerprint data in the first audio fingerprint database belongs to the reference audio data;
    in a case where the target fingerprint data is successfully matched with the reference fingerprint data in the second audio fingerprint database, adding the target audio data to a cluster to which reference audio data belongs, wherein the reference fingerprint data in the second audio fingerprint database belongs to the reference audio data.
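The cluster bookkeeping of claims 8 and 9 can be sketched as follows: a new reference recording opens its own cluster, and a matched recording joins the cluster of the reference it matched. `ClusterIndex` and its method names are hypothetical, and integer cluster identifiers are an assumption.

```python
from typing import Dict, List


class ClusterIndex:
    """Tracks which cluster each audio item belongs to."""

    def __init__(self) -> None:
        self._cluster_of: Dict[str, int] = {}
        self._members: Dict[int, List[str]] = {}

    def new_cluster(self, audio_id: str) -> int:
        """Open a new cluster for a new reference recording (claim 8)."""
        cluster_id = len(self._members)
        self._members[cluster_id] = [audio_id]
        self._cluster_of[audio_id] = cluster_id
        return cluster_id

    def add_to_cluster_of(self, reference_audio_id: str, target_audio_id: str) -> int:
        """Add a matched recording to its reference's cluster (claim 9)."""
        cluster_id = self._cluster_of[reference_audio_id]
        self._members[cluster_id].append(target_audio_id)
        self._cluster_of[target_audio_id] = cluster_id
        return cluster_id

    def members(self, cluster_id: int) -> List[str]:
        """Return a copy of the member list for one cluster."""
        return list(self._members[cluster_id])
```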
  10. The method according to any one of claims 1-7, further comprising at least one of the following:
    setting a time to live for the reference fingerprint data in the first audio fingerprint database; decaying the time to live; in a case where the reference fingerprint data in the first audio fingerprint database is successfully matched with the target fingerprint data, increasing the time to live; and in a case where the time to live has fully decayed, deleting the reference fingerprint data from the first audio fingerprint database;
    setting a time to live for the reference fingerprint data in the second audio fingerprint database; decaying the time to live; in a case where the reference fingerprint data in the second audio fingerprint database is successfully matched with the target fingerprint data, increasing the time to live; and in a case where the time to live has fully decayed, deleting the reference fingerprint data from the second audio fingerprint database.
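The time-to-live handling of claim 10 can be sketched as a counter per fingerprint. The sketch assumes integer counters decremented by an explicit `decay` call (for example from a periodic job) and a fixed boost per match. `TtlFingerprintStore`, `initial_ttl`, and `boost` are hypothetical names and parameters.

```python
from typing import Dict


class TtlFingerprintStore:
    """Reference fingerprints that expire unless they keep being matched."""

    def __init__(self, initial_ttl: int, boost: int) -> None:
        self.initial_ttl = initial_ttl
        self.boost = boost
        self.ttl: Dict[int, int] = {}

    def add(self, fingerprint: int) -> None:
        """Set the time to live for a newly stored fingerprint."""
        self.ttl[fingerprint] = self.initial_ttl

    def on_match(self, fingerprint: int) -> None:
        """A successful match extends the fingerprint's time to live."""
        if fingerprint in self.ttl:
            self.ttl[fingerprint] += self.boost

    def decay(self, amount: int = 1) -> None:
        """Reduce every time to live and delete entries that reach zero."""
        for fingerprint in list(self.ttl):
            self.ttl[fingerprint] -= amount
            if self.ttl[fingerprint] <= 0:
                del self.ttl[fingerprint]
```

Entries that are matched regularly stay in the database, while entries that are never matched again are eventually removed, which bounds storage growth.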
  11. The method according to any one of claims 1-7, further comprising:
    in a case where the reference fingerprint data in the second audio fingerprint database is successfully matched with the target fingerprint data, collecting statistics on an indicator of successful matching for the reference fingerprint data;
    in a case where the indicator satisfies a preset database transfer condition, transferring the reference fingerprint data from the second audio fingerprint database to the first audio fingerprint database.
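The transfer step of claim 11 can be sketched as below. The claim leaves the indicator and the transfer condition open, so this sketch assumes the indicator is a simple match count and the condition is reaching `min_matches`. `record_match` and the dict-based databases are hypothetical.

```python
from typing import Dict


def record_match(
    fingerprint: int,
    first_db: Dict[int, str],
    second_db: Dict[int, str],
    match_counts: Dict[int, int],
    min_matches: int,
) -> bool:
    """Count a successful match against the second database and move the
    entry to the first database once the assumed condition is met.
    Returns True if the entry was transferred by this call."""
    if fingerprint not in second_db:
        return False
    match_counts[fingerprint] = match_counts.get(fingerprint, 0) + 1
    if match_counts[fingerprint] >= min_matches:
        # pop() removes the entry from the second database as it is moved.
        first_db[fingerprint] = second_db.pop(fingerprint)
        del match_counts[fingerprint]
        return True
    return False
```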
  12. An audio fingerprint processing apparatus, comprising:
    a fingerprint data generation module, configured to generate target fingerprint data for target audio data;
    a fingerprint data matching module, configured to match the target fingerprint data with reference fingerprint data in a first audio fingerprint database and reference fingerprint data in a second audio fingerprint database;
    an interface query module, configured to call a music query service interface to query copyright information of the target audio data in a case where the target fingerprint data fails to match both the reference fingerprint data in the first audio fingerprint database and the reference fingerprint data in the second audio fingerprint database;
    a first update module, configured to, in a case where the copyright information of the target audio data has been found, store the target fingerprint data in the first audio fingerprint database so that the target fingerprint data serves as new reference fingerprint data in the first audio fingerprint database, and record the copyright information of the target audio data;
    a second update module, configured to, in a case where the copyright information of the target audio data has not been found, store the target fingerprint data in the second audio fingerprint database so that the target fingerprint data serves as new reference fingerprint data in the second audio fingerprint database.
  13. A computer device, comprising:
    at least one processor;
    a memory, configured to store one or more programs, wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the audio fingerprint processing method according to any one of claims 1-11.
  14. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the audio fingerprint processing method according to any one of claims 1-11.
PCT/CN2022/081680 2021-03-18 2022-03-18 Audio fingerprint processing method and apparatus, and computer device and storage medium WO2022194277A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110292844.8A CN112784100A (en) 2021-03-18 2021-03-18 Audio fingerprint processing method and device, computer equipment and storage medium
CN202110292844.8 2021-03-18

Publications (1)

Publication Number Publication Date
WO2022194277A1 (en) 2022-09-22

Family

ID=75762743

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/081680 WO2022194277A1 (en) 2021-03-18 2022-03-18 Audio fingerprint processing method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN112784100A (en)
WO (1) WO2022194277A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784100A (en) * 2021-03-18 2021-05-11 百果园技术(新加坡)有限公司 Audio fingerprint processing method and device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101453333A (en) * 2008-10-16 2009-06-10 北京光线传媒有限公司 Copyright recognition method, apparatus and system for media file
US20120191231A1 (en) * 2010-05-04 2012-07-26 Shazam Entertainment Ltd. Methods and Systems for Identifying Content in Data Stream by a Client Device
US20140012572A1 (en) * 2011-12-30 2014-01-09 Tilman Herberger System and method for content recognition in portable devices
US20160247512A1 (en) * 2014-11-21 2016-08-25 Thomson Licensing Method and apparatus for generating fingerprint of an audio signal
CN107967922A (en) * 2017-12-19 2018-04-27 成都嗨翻屋文化传播有限公司 A kind of music copyright recognition methods of feature based
CN110047515A (en) * 2019-04-04 2019-07-23 腾讯音乐娱乐科技(深圳)有限公司 A kind of audio identification methods, device, equipment and storage medium
CN112784100A (en) * 2021-03-18 2021-05-11 百果园技术(新加坡)有限公司 Audio fingerprint processing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112784100A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
Haitsma et al. A highly robust audio fingerprinting system with an efficient search strategy
US20200257722A1 (en) Method and apparatus for retrieving audio file, server, and computer-readable storage medium
Haitsma et al. A highly robust audio fingerprinting system.
Cano et al. Robust sound modeling for song detection in broadcast audio
EP3508986B1 (en) Music cover identification for search, compliance, and licensing
US8706276B2 (en) Systems, methods, and media for identifying matching audio
US7031921B2 (en) System for monitoring audio content available over a network
JP5907511B2 (en) System and method for audio media recognition
US20140280304A1 (en) Matching versions of a known song to an unknown song
JP2004536348A (en) Automatic recording identification
WO2014137668A1 (en) Associating audio tracks of an album with video content
CN108447501A (en) Pirate video detection method and system based on audio word under a kind of cloud storage environment
EP3945435A1 (en) Dynamic identification of unknown media
WO2022194277A1 (en) Audio fingerprint processing method and apparatus, and computer device and storage medium
JP4267463B2 (en) Method for identifying audio content, method and system for forming a feature for identifying a portion of a recording of an audio signal, a method for determining whether an audio stream includes at least a portion of a known recording of an audio signal, a computer program , A system for identifying the recording of audio signals
US20220238087A1 (en) Methods and systems for determining compact semantic representations of digital audio signals
WO2022161291A1 (en) Audio search method and apparatus, computer device, and storage medium
Kekre et al. A review of audio fingerprinting and comparison of algorithms
Li et al. Low-order auditory Zernike moment: a novel approach for robust music identification in the compressed domain
KR101002732B1 (en) Online digital contents management system
Porter Evaluating musical fingerprinting systems
Hellmuth et al. Advanced audio identification using MPEG-7 content description
Qian et al. A novel algorithm for audio information retrieval based on audio fingerprint
CN117807564A (en) Infringement identification method, device, equipment and medium for audio data
Haitsma et al. A New Technology To Identify Music

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22770629

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22770629

Country of ref document: EP

Kind code of ref document: A1