CN110399521B - Music retrieval method, system, computer device and computer readable storage medium - Google Patents

Music retrieval method, system, computer device and computer readable storage medium

Info

Publication number
CN110399521B
Authority
CN
China
Prior art keywords
audio
stored
target
image data
window image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910541222.7A
Other languages
Chinese (zh)
Other versions
CN110399521A (en)
Inventor
张爽 (Zhang Shuang)
王义文 (Wang Yiwen)
王健宗 (Wang Jianzong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910541222.7A
Publication of CN110399521A
Application granted
Publication of CN110399521B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/632Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

An embodiment of the invention provides a music retrieval method comprising the following steps: pre-configuring an audio retrieval database, the audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks; computing a target spectrogram of the query audio; computing an octave-based target chroma vector of the query audio from the target spectrogram; generating corresponding target piano window image data from the target chroma vector; computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients; and selecting, according to the cross-correlation coefficients, a target audio track that matches the query audio. Embodiments of the invention also provide a music retrieval system, a computer device, and a storage medium. By matching on cross-correlation coefficients, the embodiments improve retrieval efficiency and save storage space.

Description

Music retrieval method, system, computer device and computer readable storage medium
Technical Field
Embodiments of the present invention relate to the field of data processing, and in particular to a music retrieval method, a music retrieval system, a computer device, and a computer-readable storage medium.
Background
With the advent of the internet age, big data has become a prominent topic. In the field of music retrieval, achieving fast retrieval over collections containing millions of tracks is a problem of considerable practical value.
Most existing retrieval methods read waveform data from a piece of music and compare it with waveform data stored in a database to identify the song. However, as sound data accumulates, the database grows enormous, making retrieval both time-consuming and storage-intensive.
Disclosure of Invention
In view of the foregoing, an object of the embodiments of the present invention is to provide a music retrieval method, system, computer device, and computer-readable storage medium that improve retrieval efficiency and save storage space by computing cross-correlation coefficients over piano window image data.
To achieve the above object, an embodiment of the present invention provides a music retrieval method, comprising:
pre-configuring an audio retrieval database, the audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks;
computing a target spectrogram of the query audio;
computing an octave-based target chroma vector of the query audio from the target spectrogram;
generating corresponding target piano window image data from the target chroma vector;
computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients; and
selecting, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients.
Further, the step of pre-configuring the audio retrieval database comprises:
computing the spectrograms of the pre-stored audio tracks;
computing the chroma vector of each pre-stored audio track from the octave and its spectrogram, to obtain the chroma vectors of the pre-stored audio tracks;
clustering the pre-stored audio tracks according to their chroma vectors: randomly selecting n of the chroma vectors as initial cluster centers; iterating: computing the distance from each remaining chroma vector to each of the n cluster centers and assigning it to the class of the nearest center; updating the cluster centers: after each iteration, computing each new cluster center by averaging, i.e., taking the mean of the chroma vectors assigned to each of the n classes; checking after each iteration whether the cluster centers have changed, stopping the iteration if they have not, and otherwise continuing to iterate around the newly generated cluster centers; and
generating the piano window image data of each pre-stored audio track from its clustering result, and storing the piano window image data of each pre-stored audio track in a database.
Further, the step of generating corresponding target piano window image data from the target chroma vector comprises:
obtaining the cluster center to which the target chroma vector of the query audio belongs; and
generating corresponding target piano window image data from the cluster center of the query audio.
Further, the target chroma vector is computed as:

C(k, t) = Σ_{I=0}^{7} H_1(k + 12I, t)

where k indexes the 12 pitches within an octave; t denotes time; H_1 denotes the power of the target spectrogram of the query audio at a given pitch and time; I indexes the 8 scale degrees of the octave, with 0 ≤ I ≤ 7; and C(k, t) denotes the target chroma vector.
Further, the cross-correlation coefficient is computed as:

R_fg(m, n) = Σ_i Σ_j f(i, j) · g(i − m, j − n)

where g(i, j) denotes the piano window image data of a pre-stored audio track, f(i, j) denotes the piano window image data of the query audio, and R_fg(m, n) denotes the cross-correlation coefficient between the pre-stored audio track and the query audio at offset (m, n).
Further, the step of selecting, from the pre-stored audio tracks, the target audio track matching the query audio according to the cross-correlation coefficients comprises:
selecting, from the pre-stored audio tracks according to the cross-correlation coefficients, those pre-stored audio tracks whose cross-correlation coefficient is greater than a preset threshold;
performing feature matching on the selected pre-stored audio tracks by dynamic time warping; and
selecting, from the selected pre-stored audio tracks, the target audio track matching the query audio according to the feature-matching results.
Further, performing feature matching on the selected pre-stored audio tracks by dynamic time warping comprises:
arranging, in lattice form, the piano window image data of the pre-stored audio tracks whose cross-correlation coefficient is greater than the preset threshold together with the piano window image data of the query audio; and
computing, by a recursive formula based on the Manhattan distance, the distance between the piano window image data of each selected pre-stored audio track and the piano window image data of the query audio, the resulting distance representing the feature-matching result between the query audio and the corresponding pre-stored audio track.
To achieve the above object, an embodiment of the present invention further provides a music retrieval system, comprising:
a database creation module for pre-configuring an audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks;
an analysis module for computing a target spectrogram of the query audio;
a first calculation module for computing an octave-based target chroma vector of the query audio from the target spectrogram;
a processing module for generating corresponding target piano window image data from the target chroma vector;
a second calculation module for computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients; and
a matching module for selecting, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients.
To achieve the above object, an embodiment of the present invention further provides a computer device comprising a memory and a processor, the memory storing a music retrieval system executable on the processor, the music retrieval system, when executed by the processor, implementing the steps of the music retrieval method described above.
To achieve the above object, an embodiment of the present invention also provides a computer-readable storage medium having stored therein a computer program executable by at least one processor to cause the at least one processor to perform the steps of the music retrieval method as described above.
According to the music retrieval method, system, computer device, and computer-readable storage medium of the embodiments, chroma vectors are used for the computation: at retrieval time, the cross-correlation coefficients between the piano window image of the query audio and those of the pre-stored audio tracks are computed, feature matching is then performed, and the successfully matched pre-stored audio track is output as the retrieval result, which improves retrieval efficiency. In addition, large numbers of pre-stored audio tracks can be cluster-analyzed and their data stored in piano-window form, which saves storage space.
Drawings
Fig. 1 is a flowchart of a music retrieval method according to a first embodiment of the present invention.
Fig. 2 is a flowchart of step S100 according to an embodiment of the present invention.
Fig. 3 is a flowchart of step S1003 according to an embodiment of the present invention.
Fig. 4 is a flowchart of step S110 according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a program module of a music retrieval system according to a second embodiment of the present invention.
Fig. 6 is a schematic diagram of a hardware structure of a third embodiment of the computer device of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and embodiments, in order to make its objects, technical solutions, and advantages clearer. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments obtained by those skilled in the art from the embodiments of the invention without inventive effort fall within the scope of the invention.
It should be noted that the designations "first", "second", and the like in this disclosure are for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of technical features indicated; a feature qualified by "first" or "second" may thus explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with one another, provided the combination can be realized by those skilled in the art; where a combination is contradictory or cannot be realized, it should be deemed not to exist and to fall outside the scope of protection claimed by the present invention.
Embodiment One
Referring to fig. 1, a flowchart of the steps of a music retrieval method according to a first embodiment of the present invention is shown. It will be appreciated that the flowcharts in the method embodiments are not intended to limit the order in which the steps are performed. The following description takes a server as the execution subject. The details are as follows.
Step S100, pre-configuring an audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks.
For example, referring to fig. 2, the step S100 may further include:
In step S1001, the spectrograms of the pre-stored audio tracks are computed.
Specifically, taking one pre-stored audio track as an example, the pre-stored audio signal is converted into a spectrogram by Fourier transform, with frequency values on the horizontal axis and amplitude on the vertical axis. The conversion process is as follows:
Let the pre-stored audio signal as a function of time t be s_2(t), and select a time window h_2(τ − t) centered at time t; the longer the time window, the longer the segment of signal that is intercepted.
The windowed signal then becomes:

s_t2(τ) = s_2(τ) · h_2(τ − t)

Taking the Fourier transform of the windowed signal s_t2(τ) gives the frequency distribution:

S_t2(ω) = (1/√(2π)) ∫ s_2(τ) h_2(τ − t) e^(−jωτ) dτ

The energy spectral density, which represents the frequency content of the audio around time t, is:

P_SP2(t, ω) = |S_t2(ω)|²

From s_t2(τ) and the energy spectral density P_SP2, the frequency-domain relationship between the time window h_2(t) and the signal s_2(t) is obtained as:

s_ω2(t) = (1/√(2π)) ∫ S_2(ω′) H_2(ω − ω′) e^(jω′t) dω′

S_t2(ω) = e^(−jωt) · s_ω2(t)

where S_2(ω) and H_2(ω) denote the Fourier transforms of s_2(t) and h_2(t), respectively. The horizontal axis of the spectrogram of the pre-stored audio is given by H_2(ω) and the vertical axis by S_t2(ω).
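As an illustration, this windowed Fourier analysis can be sketched in a few lines of NumPy. The Hanning window, the window length of 1024 samples, and the hop size of 512 are assumptions made for the example; the patent does not fix the window shape or length:

```python
import numpy as np

def power_spectrogram(signal, window_len=1024, hop=512):
    # Time window h(t); a Hanning window is assumed for the example.
    window = np.hanning(window_len)
    frames = []
    for start in range(0, len(signal) - window_len + 1, hop):
        # Windowed segment s_t(tau) = s(tau) * h(tau - t).
        segment = signal[start:start + window_len] * window
        # S_t(omega): Fourier transform of the windowed segment.
        spectrum = np.fft.rfft(segment)
        # P_SP(t, omega) = |S_t(omega)|^2, the energy spectral density.
        frames.append(np.abs(spectrum) ** 2)
    return np.array(frames).T  # rows: frequency bins, columns: time frames
```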
In step S1002, the chroma vector of each pre-stored audio track is computed from the octave and its spectrogram, to obtain the chroma vectors of the pre-stored audio tracks.
Step S1003, clustering the pre-stored audio tracks according to their chroma vectors.
Step S1004, generating the piano window image data of each pre-stored audio track from its clustering result, and storing the piano window image data of each pre-stored audio track in a database.
For example, referring to fig. 3, step S1003 may further include:
In step S1003A, n chroma vectors of the pre-stored audio tracks are randomly selected as the initial cluster centers.
Step S1003B, iterating: the distance from each chroma vector other than the n selected ones to each of the n cluster centers is computed, and the nearest cluster center is chosen as the class to which the vector belongs.
Step S1003C, updating the cluster centers: after each iteration, each new cluster center is computed by averaging, i.e., the mean of the chroma vectors in each of the n classes is computed.
Step S1003D, checking after each iteration whether the new cluster centers have changed; if they have not changed, the iteration stops; otherwise, the iteration continues around the newly generated cluster centers.
A cluster-center model of the pre-stored audio is thus established.
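The loop in steps S1003A-S1003D is standard k-means clustering. A minimal NumPy sketch follows; the function name, the iteration cap, and the random seed are illustrative assumptions:

```python
import numpy as np

def cluster_chroma_vectors(chroma, n, max_iter=100, seed=0):
    # chroma: (num_tracks x 12) array of chroma vectors; n: number of clusters.
    rng = np.random.default_rng(seed)
    # S1003A: randomly select n chroma vectors as the initial cluster centers.
    centers = chroma[rng.choice(len(chroma), size=n, replace=False)]
    labels = np.zeros(len(chroma), dtype=int)
    for _ in range(max_iter):
        # S1003B: assign every vector to the class of its nearest center.
        dists = np.linalg.norm(chroma[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # S1003C: recompute each center as the mean of the vectors in its class.
        new_centers = np.array([
            chroma[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n)
        ])
        # S1003D: stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```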
Specifically, the clustering result is related to the spectrogram: on a piano window image, the vertical axis represents pitch and the horizontal axis represents time, with note onsets entered along the time series. The vertical-axis pitch of the piano window image can therefore be obtained from the amplitude of the spectrogram, and the horizontal-axis time from the frequency. The piano window image of each pre-stored audio track is binarized by image processing to obtain its piano window image data, which is stored in the database.
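The binarization step can be sketched as a simple threshold over the piano window image; taking the image mean as the threshold is an assumption of this example, since the patent does not specify the thresholding rule:

```python
import numpy as np

def binarize_piano_window(image, threshold=None):
    # Pitch-by-time piano window image -> binary piano window image data:
    # 1 where a note is considered active, 0 elsewhere. The default
    # threshold (the image mean) is an illustrative assumption.
    if threshold is None:
        threshold = image.mean()
    return (image > threshold).astype(np.uint8)
```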
Step S102, computing a target spectrogram of the query audio.
The query audio signal is converted into the target spectrogram by Fourier transform, with frequency values on the horizontal axis and amplitude on the vertical axis. The conversion process is as follows:
Let the query audio signal as a function of time t be s_1(t), and select a time window h_1(τ − t) centered at time t; the longer the window, the longer the segment of signal that is intercepted.
The windowed signal then becomes:

s_t1(τ) = s_1(τ) · h_1(τ − t)

Taking the Fourier transform of the windowed signal s_t1(τ) gives the frequency distribution:

S_t1(ω) = (1/√(2π)) ∫ s_1(τ) h_1(τ − t) e^(−jωτ) dτ

The energy spectral density, which represents the frequency content of the query audio around time t, is:

P_SP1(t, ω) = |S_t1(ω)|²

From s_t1(τ) and the energy spectral density P_SP1, the frequency-domain relationship between the time window h_1(t) and the signal s_1(t) is obtained as:

s_ω1(t) = (1/√(2π)) ∫ S_1(ω′) H_1(ω − ω′) e^(jω′t) dω′

S_t1(ω) = e^(−jωt) · s_ω1(t)

where S_1(ω) and H_1(ω) denote the Fourier transforms of s_1(t) and h_1(t), respectively. The horizontal axis of the spectrogram of the query audio is given by H_1(ω) and the vertical axis by S_t1(ω).
Step S104, computing an octave-based target chroma vector of the query audio from the target spectrogram.
Illustratively, the target chroma vector is computed as:

C(k, t) = Σ_{I=0}^{7} H_1(k + 12I, t)

where k indexes the 12 pitches within an octave (an octave is divided into twelve parts, each a semitone, two semitones making a whole tone); t denotes time; H_1 denotes the power of the target spectrogram of the query audio at a given pitch and time; I indexes the 8 scale degrees of the octave, namely the degrees 1, 2, 3, 4, 5, 6, 7 and the upper 1, with 0 ≤ I ≤ 7; and C(k, t) denotes the target chroma vector.
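The folding of the power spectrogram into the 12 pitch classes can be sketched as follows; the sketch assumes the spectrogram rows have already been mapped onto a semitone-spaced pitch axis, a mapping the patent does not spell out:

```python
import numpy as np

def chroma_vector(power_spec):
    # power_spec: H_1 as a (pitches x time) array, rows one semitone apart.
    # C(k, t) = sum over I = 0..7 of H_1(k + 12 * I, t): fold 8 octaves
    # into 12 pitch classes.
    n_pitches, n_frames = power_spec.shape
    chroma = np.zeros((12, n_frames))
    for k in range(12):              # the 12 pitch classes
        for octave_i in range(8):    # I = 0..7
            row = k + 12 * octave_i
            if row < n_pitches:
                chroma[k] += power_spec[row]
    return chroma
```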
Step S106, generating corresponding target piano window image data from the target chroma vector.
Specifically, the target chroma vector of the query audio is fed into the cluster-center model to obtain the category of the cluster center to which it belongs.
A corresponding target piano window image is then formed from the cluster center of the query audio and the target spectrogram; on the target piano window image, the vertical axis represents pitch and the horizontal axis represents time, with note onsets entered along the time series. The vertical-axis pitch of the target piano window image can therefore be obtained from the amplitude of the target spectrogram, and the horizontal-axis time from the frequency. The target piano window image of the query audio is binarized by image processing to become the target piano window image data.
Step S108, computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients.
The cross-correlation coefficient is computed as:

R_fg(m, n) = Σ_i Σ_j f(i, j) · g(i − m, j − n)

where g(i, j) denotes the piano window image data of a pre-stored audio track, f(i, j) denotes the piano window image data of the query audio, and R_fg(m, n) denotes the cross-correlation coefficient between the pre-stored audio track and the query audio at offset (m, n); the larger the cross-correlation coefficient, the better the two match.
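A sketch of a matching score built on this formula is given below; the energy normalization and the use of the correlation peak as the score are assumptions added so that scores remain comparable across tracks of different lengths:

```python
import numpy as np
from scipy.signal import correlate2d

def matching_score(f, g):
    # f: binary piano window image of the query audio.
    # g: binary piano window image of a pre-stored track.
    f, g = f.astype(float), g.astype(float)
    R = correlate2d(f, g, mode="full")   # R_fg(m, n) at every offset (m, n)
    # Normalization by the image energies is an illustrative assumption;
    # the patent does not specify one.
    norm = np.sqrt((f ** 2).sum() * (g ** 2).sum())
    return R.max() / norm if norm > 0 else 0.0
```

Pre-stored tracks whose score exceeds the preset threshold then proceed to the dynamic-time-warping stage described below.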
Step S110, selecting, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients.
Illustratively, referring to fig. 4, step S110 includes:
Step S110A, selecting, from the pre-stored audio tracks according to the cross-correlation coefficients, those pre-stored audio tracks whose cross-correlation coefficient is greater than a preset threshold.
Step S110B, performing feature matching on the selected pre-stored audio tracks by dynamic time warping.
Step S110C, selecting, from the selected pre-stored audio tracks, the target audio track matching the query audio according to the feature-matching results.
Illustratively, step S110B further includes:
arranging, in lattice form, the piano window image data of the pre-stored audio tracks whose cross-correlation coefficient is greater than the preset threshold together with the piano window image data of the query audio; and
computing, by a recursive formula based on the Manhattan distance, the distance between the piano window image data of each selected pre-stored audio track and the piano window image data of the query audio, the resulting distance representing the feature-matching result between the query audio and the corresponding pre-stored audio track.
Illustratively, after retrieval of the query audio is completed, its piano window image data is stored in the database and used as a retrieval template for subsequent matching.
Specifically, dynamic time warping (DTW) is a method for computing the degree of match between two time series by nonlinearly stretching and compressing their patterns.
The two k-dimensional signals are:

X = (x_1, x_2, …, x_M)

Y = (y_1, y_2, …, y_N)

The query audio X contains M samples and the pre-stored audio Y contains N samples. Given the distance d_mn(X, Y) between the m-th sample of the query audio X and the n-th sample of the pre-stored audio Y, the warping stretches the series until the accumulated distance from the start point to d_mn(X, Y) is minimized. First, the possible values of d_mn(X, Y) are arranged in lattice form. Then the minimum accumulated distance is computed as:

D = Σ d_mn(X, Y), m ∈ i_x, n ∈ i_y

minimized over warping paths (i_x, i_y) through the lattice. The distance computation uses a recursion based on the Manhattan distance d(i, j) = |X_1 − X_2| + |Y_1 − Y_2|. The recursion imposes a diagonal constraint on the path to prevent extreme stretching and shrinking, together with a matching window that limits the region over which the accumulated distance is computed; this preserves recognition performance while reducing computation.
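A minimal sketch of this recursion, with the Manhattan distance as the local cost and a Sakoe-Chiba-style band standing in for the matching window the text mentions, is as follows; the band-width parameter is an illustrative assumption:

```python
import numpy as np

def dtw_distance(X, Y, window=None):
    # X: (M x k) query samples; Y: (N x k) pre-stored samples.
    # Local cost: Manhattan distance d(i, j) = |x_i - y_j|_1.
    # Recursion: g(i, j) = d(i, j) + min(g(i-1, j), g(i-1, j-1), g(i, j-1)).
    M, N = len(X), len(Y)
    if window is None:
        window = max(M, N)            # no band: search the full lattice
    window = max(window, abs(M - N))  # band must cover the diagonal offset
    g = np.full((M + 1, N + 1), np.inf)
    g[0, 0] = 0.0
    for i in range(1, M + 1):
        for j in range(max(1, i - window), min(N, i + window) + 1):
            d = np.abs(X[i - 1] - Y[j - 1]).sum()  # Manhattan local distance
            g[i, j] = d + min(g[i - 1, j], g[i - 1, j - 1], g[i, j - 1])
    return g[M, N]  # accumulated distance; smaller means a better match
```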
Embodiment Two
With continued reference to fig. 5, a schematic diagram of the program modules of a music retrieval system according to a second embodiment of the present invention is shown. In this embodiment, the music retrieval system 20 may include, or be divided into, one or more program modules that are stored in a storage medium and executed by one or more processors to complete the present invention and implement the music retrieval method described above. A program module in the embodiments of the present invention refers to a series of computer program instruction segments capable of performing a particular function, and is better suited than the program itself to describing the execution of the music retrieval system 20 in the storage medium. The following description specifically introduces the functions of the program modules of this embodiment:
A database creation module 200 is configured to pre-configure an audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks.
Illustratively, the step of pre-configuring the audio retrieval database includes:
computing the spectrograms of the pre-stored audio tracks;
computing the chroma vector of each pre-stored audio track from the octave and its spectrogram, to obtain the chroma vectors of the pre-stored audio tracks;
clustering the pre-stored audio tracks according to their chroma vectors; and
generating the piano window image data of each pre-stored audio track from its clustering result, and storing the piano window image data of each pre-stored audio track in a database.
Illustratively, the clustering step includes:
randomly selecting n chroma vectors of the pre-stored audio tracks as the initial cluster centers;
iterating: computing the distance from each chroma vector other than the n selected ones to each of the n cluster centers, and choosing the nearest cluster center as the class to which the vector belongs;
updating the cluster centers: after each iteration, computing each new cluster center by averaging, i.e., computing the mean of the chroma vectors in each of the n classes; and
checking after each iteration whether the new cluster centers have changed; if they have not changed, stopping the iteration; otherwise, continuing to iterate around the newly generated cluster centers.
A cluster-center model of the pre-stored audio is thus established.
Specifically, the piano window image of each pre-stored audio track is generated from the cluster center of the pre-stored audio and its spectrogram; on the piano window image, the vertical axis represents pitch and the horizontal axis represents time, with note onsets entered along the time series. The vertical-axis pitch of the piano window image can therefore be obtained from the amplitude of the spectrogram, and the horizontal-axis time from the frequency. The piano window image of each pre-stored audio track is binarized by image processing to obtain its piano window image data, which is stored in the database.
The analysis module 201 is configured to compute a target spectrogram of the query audio.
The query audio signal is converted into the target spectrogram by Fourier transform, with frequency values on the horizontal axis and amplitude on the vertical axis. The conversion process is as follows:
Let the query audio signal as a function of time t be s_1(t), and select a time window h_1(τ − t) centered at time t; the longer the window, the longer the segment of signal that is intercepted.
The windowed signal then becomes:

s_t1(τ) = s_1(τ) · h_1(τ − t)

Taking the Fourier transform of the windowed signal s_t1(τ) gives the frequency distribution:

S_t1(ω) = (1/√(2π)) ∫ s_1(τ) h_1(τ − t) e^(−jωτ) dτ

The energy spectral density, which represents the frequency content of the query audio around time t, is:

P_SP1(t, ω) = |S_t1(ω)|²

From s_t1(τ) and the energy spectral density P_SP1, the frequency-domain relationship between the time window h_1(t) and the signal s_1(t) is obtained as:

s_ω1(t) = (1/√(2π)) ∫ S_1(ω′) H_1(ω − ω′) e^(jω′t) dω′

S_t1(ω) = e^(−jωt) · s_ω1(t)

where S_1(ω) and H_1(ω) denote the Fourier transforms of s_1(t) and h_1(t), respectively. The horizontal axis of the spectrogram of the query audio is given by H_1(ω) and the vertical axis by S_t1(ω).
The first calculation module 202 is configured to compute an octave-based target chroma vector of the query audio from the target spectrogram.
Illustratively, the target chroma vector is computed as:

C(k, t) = Σ_{I=0}^{7} H_1(k + 12I, t)

where k indexes the 12 pitches within an octave; t denotes time; H_1 denotes the power of the target spectrogram of the query audio at a given pitch and time; I indexes the 8 scale degrees of the octave, with 0 ≤ I ≤ 7; and C(k, t) denotes the target chroma vector.
The processing module 203 is configured to generate corresponding target piano window image data from the target chroma vector.
Specifically, the target chroma vector of the query audio is fed into the cluster-center model to obtain the category of the cluster center to which it belongs.
A corresponding target piano window image is then formed from the cluster center of the query audio and the target spectrogram; on the target piano window image, the vertical axis represents pitch and the horizontal axis represents time, with note onsets entered along the time series. The vertical-axis pitch of the target piano window image can therefore be obtained from the amplitude of the target spectrogram, and the horizontal-axis time from the frequency. The target piano window image of the query audio is binarized by image processing to become the target piano window image data.
The second calculation module 204 is configured to compute the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients.
The cross-correlation coefficient is computed as:

R_fg(m, n) = Σ_i Σ_j f(i, j) · g(i − m, j − n)

where g(i, j) denotes the piano window image data of a pre-stored audio track, f(i, j) denotes the piano window image data of the query audio, and R_fg(m, n) denotes the cross-correlation coefficient between the pre-stored audio track and the query audio at offset (m, n); the larger the cross-correlation coefficient, the better the two match.
The matching module 205 is configured to select, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients.
Illustratively, the matching module 205 is further configured to:
select, from the pre-stored audio tracks according to the cross-correlation coefficients, those pre-stored audio tracks whose cross-correlation coefficient is greater than a preset threshold;
perform feature matching on the selected pre-stored audio tracks by dynamic time warping; and
select, from the selected pre-stored audio tracks, the target audio track matching the query audio according to the feature-matching results.
Illustratively, the matching module 205 is further configured to:
arrange, in lattice form, the piano window image data of the pre-stored audio tracks whose cross-correlation coefficient is greater than the preset threshold together with the piano window image data of the query audio; and
compute, by a recursive formula based on the Manhattan distance, the distance between the piano window image data of each selected pre-stored audio track and the piano window image data of the query audio, the resulting distance representing the feature-matching result between the query audio and the corresponding pre-stored audio track.
Specifically, dynamic time warping (DTW) is a method for computing the degree of match between two time series by nonlinearly stretching and compressing their patterns.
The two k-dimensional signals are:

X = (x_1, x_2, …, x_M)

Y = (y_1, y_2, …, y_N)

The query audio X contains M samples and the pre-stored audio Y contains N samples. Given the distance d_mn(X, Y) between the m-th sample of the query audio X and the n-th sample of the pre-stored audio Y, the warping stretches the series until the accumulated distance from the start point to d_mn(X, Y) is minimized. First, the possible values of d_mn(X, Y) are arranged in lattice form. Then the minimum accumulated distance is computed as:

D = Σ d_mn(X, Y), m ∈ i_x, n ∈ i_y

minimized over warping paths (i_x, i_y) through the lattice. The distance computation uses a recursion based on the Manhattan distance d(i, j) = |X_1 − X_2| + |Y_1 − Y_2|. The recursion imposes a diagonal constraint on the path to prevent extreme stretching and shrinking, together with a matching window that limits the region over which the accumulated distance is computed; this preserves recognition performance while reducing computation.
Embodiment Three
Referring to fig. 6, a hardware architecture diagram of a computer device according to a third embodiment of the present invention is shown. In this embodiment, the computer device 2 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack-mount server, a blade server, a tower server, or a cabinet server (including a stand-alone server or a server cluster composed of multiple servers), among others. As shown in fig. 6, the computer device 2 includes, but is not limited to, at least a memory 21, a processor 22, a network interface 23, and a music retrieval system 20, which are communicatively connected to one another via a system bus. Wherein:
in this embodiment, the memory 21 includes at least one type of computer-readable storage medium including flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the computer device 2. Of course, the memory 21 may also include both internal storage units of the computer device 2 and external storage devices. In this embodiment, the memory 21 is typically used to store an operating system and various types of application software installed on the computer device 2, such as program codes of the music retrieval system 20 of the second embodiment. Further, the memory 21 may be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 22 is typically used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is configured to execute the program code stored in the memory 21 or process data, for example, execute the music retrieval system 20, to implement the music retrieval method of the first embodiment.
The network interface 23 may include a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the computer device 2 and other electronic devices. For example, the network interface 23 is used to connect the computer device 2 to an external terminal through a network, and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be an Intranet, the Internet, a Global System for Mobile communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
It is noted that fig. 6 only shows a computer device 2 having components 20-23, but it is understood that not all of the illustrated components are required to be implemented, and that more or fewer components may alternatively be implemented.
In this embodiment, the music retrieval system 20 stored in the memory 21 may be further divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete the present invention.
For example, fig. 5 shows a schematic diagram of the program modules of the second embodiment of the music retrieval system 20, in which the music retrieval system 20 is divided into a database creation module 200, an analysis module 201, a first calculation module 202, a processing module 203, a second calculation module 204, and a matching module 205. A program module in the present invention refers to a series of computer program instruction segments capable of performing a specific function, and is better suited than a program to describing the execution of the music retrieval system 20 in the computer device 2. The specific functions of the program modules 200-205 have been described in detail in the second embodiment and are not repeated here.
Embodiment Four
The present embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an App application store, on which a computer program is stored that performs the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment is used to store the music retrieval system 20, which, when executed by a processor, implements the music retrieval method of the first embodiment.
According to the music retrieval method, system, computer device, and computer-readable storage medium of the embodiments, chroma vectors are used for the computation: at retrieval time, the cross-correlation coefficients between the piano window image of the query audio and those of the pre-stored audio tracks are computed, feature matching is then performed, and the successfully matched pre-stored audio track is output as the retrieval result, which improves retrieval efficiency. In addition, large numbers of pre-stored audio tracks can be cluster-analyzed and their data stored in piano-window form, which saves storage space.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is preferred.
The foregoing description covers only preferred embodiments of the present invention and does not limit its scope; any equivalent structural or process transformation made using the contents of this specification and the drawings, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (7)

1. A music retrieval method, comprising:
pre-configuring an audio retrieval database, the audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks;
computing a target spectrogram of the query audio;
computing an octave-based target chroma vector of the query audio from the target spectrogram;
generating corresponding target piano window image data from the target chroma vector;
computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients; and
selecting, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients;
wherein the step of pre-configuring the audio retrieval database comprises:
computing the spectrograms of the pre-stored audio tracks;
computing the chroma vector of each pre-stored audio track from the octave and its spectrogram, to obtain the chroma vectors of the pre-stored audio tracks;
clustering the pre-stored audio tracks according to their chroma vectors: randomly selecting n of the chroma vectors as initial cluster centers; iterating: computing the distance from each remaining chroma vector to each of the n cluster centers and assigning it to the class of the nearest center; updating the cluster centers: after each iteration, computing each new cluster center by averaging, i.e., taking the mean of the chroma vectors assigned to each of the n classes; checking after each iteration whether the cluster centers have changed, stopping the iteration if they have not, and otherwise continuing to iterate around the newly generated cluster centers; and
generating the piano window image data of each pre-stored audio track from its clustering result, and storing the piano window image data of each pre-stored audio track in a database;
wherein the step of generating corresponding target piano window image data from the target chroma vector comprises:
obtaining the cluster center to which the target chroma vector of the query audio belongs; and
generating corresponding target piano window image data from the cluster center of the query audio;
and wherein the target chroma vector is computed as:

C(k, t) = Σ_{I=0}^{7} H_1(k + 12I, t)

where k indexes the 12 pitches within an octave; t denotes time; H_1 denotes the power of the target spectrogram of the query audio at a given pitch and time; I indexes the 8 scale degrees of the octave, with 0 ≤ I ≤ 7; and C(k, t) denotes the target chroma vector.
2. The music retrieval method according to claim 1, wherein the cross-correlation coefficient is computed as:

R_fg(m, n) = Σ_i Σ_j f(i, j) · g(i − m, j − n)

where g(i, j) denotes the piano window image data of a pre-stored audio track, f(i, j) denotes the piano window image data of the query audio, and R_fg(m, n) denotes the cross-correlation coefficient between the pre-stored audio track and the query audio at offset (m, n).
3. The music retrieval method according to claim 1, wherein the step of selecting, from the pre-stored audio tracks, the target audio track matching the query audio according to the cross-correlation coefficients comprises:
selecting, from the pre-stored audio tracks according to the cross-correlation coefficients, those pre-stored audio tracks whose cross-correlation coefficient is greater than a preset threshold;
performing feature matching on the selected pre-stored audio tracks by dynamic time warping; and
selecting, from the selected pre-stored audio tracks, the target audio track matching the query audio according to the feature-matching results.
4. The music retrieval method according to claim 3, wherein the step of performing feature matching on the selected pre-stored audio tracks by dynamic time warping comprises:
arranging, in lattice form, the piano window image data of the pre-stored audio tracks whose cross-correlation coefficient is greater than the preset threshold together with the piano window image data of the query audio; and
computing, by a recursive formula based on the Manhattan distance, the distance between the piano window image data of each selected pre-stored audio track and the piano window image data of the query audio, the resulting distance representing the feature-matching result between the query audio and the corresponding pre-stored audio track.
5. A music retrieval system, comprising:
a database creation module for pre-configuring an audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks;
an analysis module for computing a target spectrogram of the query audio;
a first calculation module for computing an octave-based target chroma vector of the query audio from the target spectrogram;
a processing module for generating corresponding target piano window image data from the target chroma vector;
a second calculation module for computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients; and
a matching module for selecting, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients;
wherein the database creation module is further configured to: compute the spectrograms of the pre-stored audio tracks; compute the chroma vector of each pre-stored audio track from the octave and its spectrogram, to obtain the chroma vectors of the pre-stored audio tracks; cluster the pre-stored audio tracks according to their chroma vectors: randomly selecting n of the chroma vectors as initial cluster centers; iterating: computing the distance from each remaining chroma vector to each of the n cluster centers and assigning it to the class of the nearest center; updating the cluster centers: after each iteration, computing each new cluster center by averaging, i.e., taking the mean of the chroma vectors assigned to each of the n classes; checking after each iteration whether the cluster centers have changed, stopping the iteration if they have not, and otherwise continuing to iterate around the newly generated cluster centers; and generate the piano window image data of each pre-stored audio track from its clustering result, and store the piano window image data of each pre-stored audio track in a database;
wherein the processing module is further configured to obtain the cluster center to which the target chroma vector of the query audio belongs, and to generate corresponding target piano window image data from the cluster center of the query audio;
and wherein the target chroma vector is computed as:

C(k, t) = Σ_{I=0}^{7} H_1(k + 12I, t)

where k indexes the 12 pitches within an octave; t denotes time; H_1 denotes the power of the target spectrogram of the query audio at a given pitch and time; I indexes the 8 scale degrees of the octave, with 0 ≤ I ≤ 7; and C(k, t) denotes the target chroma vector.
6. A computer device comprising a memory and a processor, the memory storing a music retrieval system executable on the processor, the music retrieval system, when executed by the processor, implementing the steps of the music retrieval method of any one of claims 1-4.
7. A computer-readable storage medium, in which a computer program is stored, the computer program being executable by at least one processor to cause the at least one processor to perform the steps of the music retrieval method according to any one of claims 1-4.
CN201910541222.7A 2019-06-21 2019-06-21 Music retrieval method, system, computer device and computer readable storage medium Active CN110399521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910541222.7A CN110399521B (en) 2019-06-21 2019-06-21 Music retrieval method, system, computer device and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN110399521A CN110399521A (en) 2019-11-01
CN110399521B (en) 2023-06-06

Family

ID=68323334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910541222.7A Active CN110399521B (en) 2019-06-21 2019-06-21 Music retrieval method, system, computer device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110399521B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436806A (en) * 2011-09-29 2012-05-02 复旦大学 Audio frequency copy detection method based on similarity
CN103853749A (en) * 2012-11-30 2014-06-11 国际商业机器公司 Mode-based audio retrieval method and system
CN103890838A (en) * 2011-06-10 2014-06-25 X-系统有限公司 Method and system for analysing sound
WO2017222569A1 (en) * 2016-06-22 2017-12-28 Gracenote, Inc. Matching audio fingerprints
CN109002529A (en) * 2018-07-17 2018-12-14 厦门美图之家科技有限公司 Audio search method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9501568B2 (en) * 2015-01-02 2016-11-22 Gracenote, Inc. Audio matching based on harmonogram


Also Published As

Publication number Publication date
CN110399521A (en) 2019-11-01

Similar Documents

Publication Publication Date Title
WO2019100606A1 (en) Electronic device, voiceprint-based identity verification method and system, and storage medium
JP5732994B2 (en) Music searching apparatus and method, program, and recording medium
EP3014612B1 (en) Acoustic music similarity determiner
JP4644250B2 (en) Apparatus and method for determining the type of chords inherent in a test signal
JP2009047831A (en) Feature quantity extracting device, program and feature quantity extraction method
CN106055659B (en) Lyric data matching method and equipment thereof
US20200152162A1 (en) Musical analysis method, music analysis device, and program
CN110556126A (en) Voice recognition method and device and computer equipment
US20190213279A1 (en) Apparatus and method of analyzing and identifying song
CN110738980A (en) Singing voice synthesis model training method and system and singing voice synthesis method
US20160027421A1 (en) Audio signal analysis
WO2019196301A1 (en) Electronic device, deep learning-based method and system for musical notation recognition, and storage medium
CN108711415B (en) Method, apparatus and storage medium for correcting time delay between accompaniment and dry sound
JP2002032424A (en) Device and method for circuit analysis, and computer- readable recording medium with program making computer implement the method recorded thereon
CN110399521B (en) Music retrieval method, system, computer device and computer readable storage medium
CN111986698A (en) Audio segment matching method and device, computer readable medium and electronic equipment
Benetos et al. A temporally-constrained convolutive probabilistic model for pitch detection
CN110415722B (en) Speech signal processing method, storage medium, computer program, and electronic device
CN111863030A (en) Audio detection method and device
CN115083422B (en) Voice traceability evidence obtaining method and device, equipment and storage medium
US7617102B2 (en) Speaker identifying apparatus and computer program product
CN115329125A (en) Song medley splicing method and device
US20140140519A1 (en) Sound processing device, sound processing method, and program
CN111681671B (en) Abnormal sound identification method and device and computer storage medium
CN109686376B (en) Song singing evaluation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant