CN110399521B - Music retrieval method, system, computer device and computer readable storage medium - Google Patents

Music retrieval method, system, computer device and computer readable storage medium

Info

Publication number
CN110399521B
Authority
CN
China
Prior art keywords
audio
stored
target
image data
window image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910541222.7A
Other languages
Chinese (zh)
Other versions
CN110399521A (en)
Inventor
张爽 (Zhang Shuang)
王义文 (Wang Yiwen)
王健宗 (Wang Jianzong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910541222.7A
Publication of CN110399521A
Application granted
Publication of CN110399521B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/632Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

An embodiment of the invention provides a music retrieval method comprising the following steps: pre-configuring an audio retrieval database, the audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks; computing a target spectrogram of the query audio; computing an octave-based target chroma vector of the query audio from the target spectrogram; generating corresponding target piano window image data from the target chroma vector; computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients; and selecting, according to the cross-correlation coefficients, a target audio track that matches the query audio. Embodiments of the invention also provide a music retrieval system, a computer device, and a storage medium. By matching on cross-correlation coefficients, the embodiments improve retrieval efficiency and save storage space.

Description

Music retrieval method, system, computer device and computer readable storage medium
Technical Field
Embodiments of the present invention relate to the field of data processing, and in particular to a music retrieval method, a music retrieval system, a computer device, and a computer-readable storage medium.
Background
With the advent of the internet age, big data has become a prominent topic. In the field of music retrieval, achieving fast retrieval over collections containing millions of tracks is a problem of considerable practical value.
Most existing retrieval methods read waveform data from a piece of music and compare it with waveform data stored in a database to identify the song. However, as sound data accumulates, the database grows enormous, making retrieval both time-consuming and storage-intensive.
Disclosure of Invention
In view of the foregoing, an object of the embodiments of the present invention is to provide a music retrieval method, system, computer device, and computer-readable storage medium that improve retrieval efficiency and save storage space by computing cross-correlation coefficients over piano window image data.
To achieve the above object, an embodiment of the present invention provides a music retrieval method, comprising:
pre-configuring an audio retrieval database, the audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks;
computing a target spectrogram of the query audio;
computing an octave-based target chroma vector of the query audio from the target spectrogram;
generating corresponding target piano window image data from the target chroma vector;
computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients; and
selecting, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients.
Further, the step of pre-configuring the audio retrieval database comprises:
computing the spectrograms of the pre-stored audio tracks;
computing the chroma vector of each pre-stored audio track from the octave and its spectrogram, to obtain the chroma vectors of the pre-stored audio tracks;
clustering the pre-stored audio tracks according to their chroma vectors: randomly selecting n of the chroma vectors as initial cluster centers; iterating: computing the distance from each remaining chroma vector to each of the n cluster centers and assigning it to the class of the nearest center; updating the cluster centers: after each iteration, computing each new cluster center by averaging, i.e., taking the mean of the chroma vectors assigned to each of the n classes; checking after each iteration whether the cluster centers have changed, stopping the iteration if they have not, and otherwise continuing to iterate around the newly generated cluster centers; and
generating the piano window image data of each pre-stored audio track from its clustering result, and storing the piano window image data of each pre-stored audio track in a database.
Further, the step of generating corresponding target piano window image data from the target chroma vector comprises:
obtaining the cluster center to which the target chroma vector of the query audio belongs; and
generating corresponding target piano window image data from the cluster center of the query audio.
Further, the target chroma vector is computed as:

C(k, t) = Σ_{I=0}^{7} H_1(k + 12I, t)

where k indexes the 12 pitches within an octave; t denotes time; H_1 denotes the power of the target spectrogram of the query audio at a given pitch and time; I indexes the 8 scale degrees of the octave, with 0 ≤ I ≤ 7; and C(k, t) denotes the target chroma vector.
Further, the cross-correlation coefficient is computed as:

R_fg(m, n) = Σ_i Σ_j f(i, j) · g(i − m, j − n)

where g(i, j) denotes the piano window image data of a pre-stored audio track, f(i, j) denotes the piano window image data of the query audio, and R_fg(m, n) denotes the cross-correlation coefficient between the pre-stored audio track and the query audio at offset (m, n).
Further, the step of selecting, from the pre-stored audio tracks, the target audio track matching the query audio according to the cross-correlation coefficients comprises:
selecting, from the pre-stored audio tracks according to the cross-correlation coefficients, those pre-stored audio tracks whose cross-correlation coefficient is greater than a preset threshold;
performing feature matching on the selected pre-stored audio tracks by dynamic time warping; and
selecting, from the selected pre-stored audio tracks, the target audio track matching the query audio according to the feature-matching results.
Further, performing feature matching on the selected pre-stored audio tracks by dynamic time warping comprises:
arranging, in lattice form, the piano window image data of the pre-stored audio tracks whose cross-correlation coefficient is greater than the preset threshold together with the piano window image data of the query audio; and
computing, by a recursive formula based on the Manhattan distance, the distance between the piano window image data of each selected pre-stored audio track and the piano window image data of the query audio, the resulting distance representing the feature-matching result between the query audio and the corresponding pre-stored audio track.
To achieve the above object, an embodiment of the present invention further provides a music retrieval system, comprising:
a database creation module for pre-configuring an audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks;
an analysis module for computing a target spectrogram of the query audio;
a first calculation module for computing an octave-based target chroma vector of the query audio from the target spectrogram;
a processing module for generating corresponding target piano window image data from the target chroma vector;
a second calculation module for computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients; and
a matching module for selecting, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients.
To achieve the above object, an embodiment of the present invention further provides a computer device comprising a memory and a processor, the memory storing a music retrieval system executable on the processor, the music retrieval system, when executed by the processor, implementing the steps of the music retrieval method described above.
To achieve the above object, an embodiment of the present invention also provides a computer-readable storage medium having stored therein a computer program executable by at least one processor to cause the at least one processor to perform the steps of the music retrieval method as described above.
According to the music retrieval method, system, computer device, and computer-readable storage medium of the embodiments, chroma vectors are used for the computation: at retrieval time, the cross-correlation coefficients between the piano window image of the query audio and those of the pre-stored audio tracks are computed, feature matching is then performed, and the successfully matched pre-stored audio track is output as the retrieval result, which improves retrieval efficiency. In addition, large numbers of pre-stored audio tracks can be cluster-analyzed and their data stored in piano-window form, which saves storage space.
Drawings
Fig. 1 is a flowchart of a music retrieval method according to a first embodiment of the present invention.
Fig. 2 is a flowchart of step S100 according to an embodiment of the present invention.
Fig. 3 is a flowchart of step S1003 according to an embodiment of the present invention.
Fig. 4 is a flowchart of step S110 according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a program module of a music retrieval system according to a second embodiment of the present invention.
Fig. 6 is a schematic diagram of a hardware structure of a third embodiment of the computer device of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and embodiments, in order to make its objects, technical solutions, and advantages clearer. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments obtained by those skilled in the art from the embodiments of the invention without inventive effort fall within the scope of the invention.
It should be noted that the designations "first", "second", and the like in this disclosure are for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of technical features indicated; a feature qualified by "first" or "second" may thus explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with one another, provided the combination can be realized by those skilled in the art; where a combination is contradictory or cannot be realized, it should be deemed not to exist and to fall outside the scope of protection claimed by the present invention.
Embodiment One
Referring to fig. 1, a flowchart of the steps of a music retrieval method according to a first embodiment of the present invention is shown. It will be appreciated that the flowcharts in the method embodiments are not intended to limit the order in which the steps are performed. The following description takes a server as the execution subject. The details are as follows.
Step S100, pre-configuring an audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks.
For example, referring to fig. 2, the step S100 may further include:
In step S1001, the spectrograms of the pre-stored audio tracks are computed.
Specifically, taking one pre-stored audio track as an example, the pre-stored audio signal is converted into a spectrogram by Fourier transform, with frequency values on the horizontal axis and amplitude on the vertical axis. The conversion process is as follows:
Let the pre-stored audio signal as a function of time t be s_2(t), and select a time window h_2(τ − t) centered at time t; the longer the time window, the longer the segment of signal that is intercepted.
The windowed signal then becomes:

s_t2(τ) = s_2(τ) · h_2(τ − t)

Taking the Fourier transform of the windowed signal s_t2(τ) gives the frequency distribution:

S_t2(ω) = (1/√(2π)) ∫ s_2(τ) h_2(τ − t) e^(−jωτ) dτ

The energy spectral density, which represents the frequency content of the audio around time t, is:

P_SP2(t, ω) = |S_t2(ω)|²

From s_t2(τ) and the energy spectral density P_SP2, the frequency-domain relationship between the time window h_2(t) and the signal s_2(t) is obtained as:

s_ω2(t) = (1/√(2π)) ∫ S_2(ω′) H_2(ω − ω′) e^(jω′t) dω′

S_t2(ω) = e^(−jωt) · s_ω2(t)

where S_2(ω) and H_2(ω) denote the Fourier transforms of s_2(t) and h_2(t), respectively. The horizontal axis of the spectrogram of the pre-stored audio is given by H_2(ω) and the vertical axis by S_t2(ω).
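As an illustration, this windowed Fourier analysis can be sketched in a few lines of NumPy. The Hanning window, the window length of 1024 samples, and the hop size of 512 are assumptions made for the example; the patent does not fix the window shape or length:

```python
import numpy as np

def power_spectrogram(signal, window_len=1024, hop=512):
    # Time window h(t); a Hanning window is assumed for the example.
    window = np.hanning(window_len)
    frames = []
    for start in range(0, len(signal) - window_len + 1, hop):
        # Windowed segment s_t(tau) = s(tau) * h(tau - t).
        segment = signal[start:start + window_len] * window
        # S_t(omega): Fourier transform of the windowed segment.
        spectrum = np.fft.rfft(segment)
        # P_SP(t, omega) = |S_t(omega)|^2, the energy spectral density.
        frames.append(np.abs(spectrum) ** 2)
    return np.array(frames).T  # rows: frequency bins, columns: time frames
```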
In step S1002, the chroma vector of each pre-stored audio track is computed from the octave and its spectrogram, to obtain the chroma vectors of the pre-stored audio tracks.
Step S1003, clustering the pre-stored audio tracks according to their chroma vectors.
Step S1004, generating the piano window image data of each pre-stored audio track from its clustering result, and storing the piano window image data of each pre-stored audio track in a database.
For example, referring to fig. 3, step S1003 may further include:
In step S1003A, n chroma vectors of the pre-stored audio tracks are randomly selected as the initial cluster centers.
Step S1003B, iterating: the distance from each chroma vector other than the n selected ones to each of the n cluster centers is computed, and the nearest cluster center is chosen as the class to which the vector belongs.
Step S1003C, updating the cluster centers: after each iteration, each new cluster center is computed by averaging, i.e., the mean of the chroma vectors in each of the n classes is computed.
Step S1003D, checking after each iteration whether the new cluster centers have changed; if they have not changed, the iteration stops; otherwise, the iteration continues around the newly generated cluster centers.
A cluster-center model of the pre-stored audio is thus established.
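The loop in steps S1003A-S1003D is standard k-means clustering. A minimal NumPy sketch follows; the function name, the iteration cap, and the random seed are illustrative assumptions:

```python
import numpy as np

def cluster_chroma_vectors(chroma, n, max_iter=100, seed=0):
    # chroma: (num_tracks x 12) array of chroma vectors; n: number of clusters.
    rng = np.random.default_rng(seed)
    # S1003A: randomly select n chroma vectors as the initial cluster centers.
    centers = chroma[rng.choice(len(chroma), size=n, replace=False)]
    labels = np.zeros(len(chroma), dtype=int)
    for _ in range(max_iter):
        # S1003B: assign every vector to the class of its nearest center.
        dists = np.linalg.norm(chroma[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # S1003C: recompute each center as the mean of the vectors in its class.
        new_centers = np.array([
            chroma[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n)
        ])
        # S1003D: stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```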
Specifically, the clustering result is related to the spectrogram: on a piano window image, the vertical axis represents pitch and the horizontal axis represents time, with note onsets entered along the time series. The vertical-axis pitch of the piano window image can therefore be obtained from the amplitude of the spectrogram, and the horizontal-axis time from the frequency. The piano window image of each pre-stored audio track is binarized by image processing to obtain its piano window image data, which is stored in the database.
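The binarization step can be sketched as a simple threshold over the piano window image; taking the image mean as the threshold is an assumption of this example, since the patent does not specify the thresholding rule:

```python
import numpy as np

def binarize_piano_window(image, threshold=None):
    # Pitch-by-time piano window image -> binary piano window image data:
    # 1 where a note is considered active, 0 elsewhere. The default
    # threshold (the image mean) is an illustrative assumption.
    if threshold is None:
        threshold = image.mean()
    return (image > threshold).astype(np.uint8)
```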
Step S102, computing a target spectrogram of the query audio.
The query audio signal is converted into the target spectrogram by Fourier transform, with frequency values on the horizontal axis and amplitude on the vertical axis. The conversion process is as follows:
Let the query audio signal as a function of time t be s_1(t), and select a time window h_1(τ − t) centered at time t; the longer the window, the longer the segment of signal that is intercepted.
The windowed signal then becomes:

s_t1(τ) = s_1(τ) · h_1(τ − t)

Taking the Fourier transform of the windowed signal s_t1(τ) gives the frequency distribution:

S_t1(ω) = (1/√(2π)) ∫ s_1(τ) h_1(τ − t) e^(−jωτ) dτ

The energy spectral density, which represents the frequency content of the query audio around time t, is:

P_SP1(t, ω) = |S_t1(ω)|²

From s_t1(τ) and the energy spectral density P_SP1, the frequency-domain relationship between the time window h_1(t) and the signal s_1(t) is obtained as:

s_ω1(t) = (1/√(2π)) ∫ S_1(ω′) H_1(ω − ω′) e^(jω′t) dω′

S_t1(ω) = e^(−jωt) · s_ω1(t)

where S_1(ω) and H_1(ω) denote the Fourier transforms of s_1(t) and h_1(t), respectively. The horizontal axis of the spectrogram of the query audio is given by H_1(ω) and the vertical axis by S_t1(ω).
Step S104, computing an octave-based target chroma vector of the query audio from the target spectrogram.
Illustratively, the target chroma vector is computed as:

C(k, t) = Σ_{I=0}^{7} H_1(k + 12I, t)

where k indexes the 12 pitches within an octave (an octave is divided into twelve parts, each a semitone, two semitones making a whole tone); t denotes time; H_1 denotes the power of the target spectrogram of the query audio at a given pitch and time; I indexes the 8 scale degrees of the octave, namely the degrees 1, 2, 3, 4, 5, 6, 7 and the upper 1, with 0 ≤ I ≤ 7; and C(k, t) denotes the target chroma vector.
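The folding of the power spectrogram into the 12 pitch classes can be sketched as follows; the sketch assumes the spectrogram rows have already been mapped onto a semitone-spaced pitch axis, a mapping the patent does not spell out:

```python
import numpy as np

def chroma_vector(power_spec):
    # power_spec: H_1 as a (pitches x time) array, rows one semitone apart.
    # C(k, t) = sum over I = 0..7 of H_1(k + 12 * I, t): fold 8 octaves
    # into 12 pitch classes.
    n_pitches, n_frames = power_spec.shape
    chroma = np.zeros((12, n_frames))
    for k in range(12):              # the 12 pitch classes
        for octave_i in range(8):    # I = 0..7
            row = k + 12 * octave_i
            if row < n_pitches:
                chroma[k] += power_spec[row]
    return chroma
```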
Step S106, generating corresponding target piano window image data from the target chroma vector.
Specifically, the target chroma vector of the query audio is fed into the cluster-center model to obtain the category of the cluster center to which it belongs.
A corresponding target piano window image is then formed from the cluster center of the query audio and the target spectrogram; on the target piano window image, the vertical axis represents pitch and the horizontal axis represents time, with note onsets entered along the time series. The vertical-axis pitch of the target piano window image can therefore be obtained from the amplitude of the target spectrogram, and the horizontal-axis time from the frequency. The target piano window image of the query audio is binarized by image processing to become the target piano window image data.
Step S108, computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients.
The cross-correlation coefficient is computed as:

R_fg(m, n) = Σ_i Σ_j f(i, j) · g(i − m, j − n)

where g(i, j) denotes the piano window image data of a pre-stored audio track, f(i, j) denotes the piano window image data of the query audio, and R_fg(m, n) denotes the cross-correlation coefficient between the pre-stored audio track and the query audio at offset (m, n); the larger the cross-correlation coefficient, the better the two match.
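A sketch of a matching score built on this formula is given below; the energy normalization and the use of the correlation peak as the score are assumptions added so that scores remain comparable across tracks of different lengths:

```python
import numpy as np
from scipy.signal import correlate2d

def matching_score(f, g):
    # f: binary piano window image of the query audio.
    # g: binary piano window image of a pre-stored track.
    f, g = f.astype(float), g.astype(float)
    R = correlate2d(f, g, mode="full")   # R_fg(m, n) at every offset (m, n)
    # Normalization by the image energies is an illustrative assumption;
    # the patent does not specify one.
    norm = np.sqrt((f ** 2).sum() * (g ** 2).sum())
    return R.max() / norm if norm > 0 else 0.0
```

Pre-stored tracks whose score exceeds the preset threshold then proceed to the dynamic-time-warping stage described below.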
Step S110, selecting, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients.
Illustratively, referring to fig. 4, step S110 includes:
Step S110A, selecting, from the pre-stored audio tracks according to the cross-correlation coefficients, those pre-stored audio tracks whose cross-correlation coefficient is greater than a preset threshold.
Step S110B, performing feature matching on the selected pre-stored audio tracks by dynamic time warping.
Step S110C, selecting, from the selected pre-stored audio tracks, the target audio track matching the query audio according to the feature-matching results.
Illustratively, step S110B further includes:
arranging, in lattice form, the piano window image data of the pre-stored audio tracks whose cross-correlation coefficient is greater than the preset threshold together with the piano window image data of the query audio; and
computing, by a recursive formula based on the Manhattan distance, the distance between the piano window image data of each selected pre-stored audio track and the piano window image data of the query audio, the resulting distance representing the feature-matching result between the query audio and the corresponding pre-stored audio track.
Illustratively, after retrieval of the query audio is completed, its piano window image data is stored in the database and used as a retrieval template for subsequent matching.
Specifically, dynamic time warping (DTW) is a method for computing the degree of match between two time series by nonlinearly stretching and compressing their patterns.
The two k-dimensional signals are:

X = (x_1, x_2, …, x_M)

Y = (y_1, y_2, …, y_N)

The query audio X contains M samples and the pre-stored audio Y contains N samples. Given the distance d_mn(X, Y) between the m-th sample of the query audio X and the n-th sample of the pre-stored audio Y, the warping stretches the series until the accumulated distance from the start point to d_mn(X, Y) is minimized. First, the possible values of d_mn(X, Y) are arranged in lattice form. Then the minimum accumulated distance is computed as:

D = Σ d_mn(X, Y), m ∈ i_x, n ∈ i_y

minimized over warping paths (i_x, i_y) through the lattice. The distance computation uses a recursion based on the Manhattan distance d(i, j) = |X_1 − X_2| + |Y_1 − Y_2|. The recursion imposes a diagonal constraint on the path to prevent extreme stretching and shrinking, together with a matching window that limits the region over which the accumulated distance is computed; this preserves recognition performance while reducing computation.
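A minimal sketch of this recursion, with the Manhattan distance as the local cost and a Sakoe-Chiba-style band standing in for the matching window the text mentions, is as follows; the band-width parameter is an illustrative assumption:

```python
import numpy as np

def dtw_distance(X, Y, window=None):
    # X: (M x k) query samples; Y: (N x k) pre-stored samples.
    # Local cost: Manhattan distance d(i, j) = |x_i - y_j|_1.
    # Recursion: g(i, j) = d(i, j) + min(g(i-1, j), g(i-1, j-1), g(i, j-1)).
    M, N = len(X), len(Y)
    if window is None:
        window = max(M, N)            # no band: search the full lattice
    window = max(window, abs(M - N))  # band must cover the diagonal offset
    g = np.full((M + 1, N + 1), np.inf)
    g[0, 0] = 0.0
    for i in range(1, M + 1):
        for j in range(max(1, i - window), min(N, i + window) + 1):
            d = np.abs(X[i - 1] - Y[j - 1]).sum()  # Manhattan local distance
            g[i, j] = d + min(g[i - 1, j], g[i - 1, j - 1], g[i, j - 1])
    return g[M, N]  # accumulated distance; smaller means a better match
```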
Embodiment Two
With continued reference to fig. 5, a schematic diagram of the program modules of a music retrieval system according to a second embodiment of the present invention is shown. In this embodiment, the music retrieval system 20 may include, or be divided into, one or more program modules that are stored in a storage medium and executed by one or more processors to complete the present invention and implement the music retrieval method described above. A program module in the embodiments of the present invention refers to a series of computer program instruction segments capable of performing a particular function, and is better suited than the program itself to describing the execution of the music retrieval system 20 in the storage medium. The following description specifically introduces the functions of the program modules of this embodiment:
A database creation module 200 is configured to pre-configure an audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks.
Illustratively, the step of pre-configuring the audio retrieval database includes:
computing the spectrograms of the pre-stored audio tracks;
computing the chroma vector of each pre-stored audio track from the octave and its spectrogram, to obtain the chroma vectors of the pre-stored audio tracks;
clustering the pre-stored audio tracks according to their chroma vectors; and
generating the piano window image data of each pre-stored audio track from its clustering result, and storing the piano window image data of each pre-stored audio track in a database.
Illustratively, the clustering step includes:
randomly selecting n chroma vectors of the pre-stored audio tracks as the initial cluster centers;
iterating: computing the distance from each chroma vector other than the n selected ones to each of the n cluster centers, and choosing the nearest cluster center as the class to which the vector belongs;
updating the cluster centers: after each iteration, computing each new cluster center by averaging, i.e., computing the mean of the chroma vectors in each of the n classes; and
checking after each iteration whether the new cluster centers have changed; if they have not changed, stopping the iteration; otherwise, continuing to iterate around the newly generated cluster centers.
A cluster-center model of the pre-stored audio is thus established.
Specifically, the piano window image of each pre-stored audio track is generated from the cluster center of the pre-stored audio and its spectrogram; on the piano window image, the vertical axis represents pitch and the horizontal axis represents time, with note onsets entered along the time series. The vertical-axis pitch of the piano window image can therefore be obtained from the amplitude of the spectrogram, and the horizontal-axis time from the frequency. The piano window image of each pre-stored audio track is binarized by image processing to obtain its piano window image data, which is stored in the database.
The analysis module 201 is configured to compute a target spectrogram of the query audio.
The query audio signal is converted into the target spectrogram by Fourier transform, with frequency values on the horizontal axis and amplitude on the vertical axis. The conversion process is as follows:
Let the query audio signal as a function of time t be s_1(t), and select a time window h_1(τ − t) centered at time t; the longer the window, the longer the segment of signal that is intercepted.
The windowed signal then becomes:

s_t1(τ) = s_1(τ) · h_1(τ − t)

Taking the Fourier transform of the windowed signal s_t1(τ) gives the frequency distribution:

S_t1(ω) = (1/√(2π)) ∫ s_1(τ) h_1(τ − t) e^(−jωτ) dτ

The energy spectral density, which represents the frequency content of the query audio around time t, is:

P_SP1(t, ω) = |S_t1(ω)|²

From s_t1(τ) and the energy spectral density P_SP1, the frequency-domain relationship between the time window h_1(t) and the signal s_1(t) is obtained as:

s_ω1(t) = (1/√(2π)) ∫ S_1(ω′) H_1(ω − ω′) e^(jω′t) dω′

S_t1(ω) = e^(−jωt) · s_ω1(t)

where S_1(ω) and H_1(ω) denote the Fourier transforms of s_1(t) and h_1(t), respectively. The horizontal axis of the spectrogram of the query audio is given by H_1(ω) and the vertical axis by S_t1(ω).
The first calculation module 202 is configured to compute an octave-based target chroma vector of the query audio from the target spectrogram.
Illustratively, the target chroma vector is computed as:

C(k, t) = Σ_{I=0}^{7} H_1(k + 12I, t)

where k indexes the 12 pitches within an octave; t denotes time; H_1 denotes the power of the target spectrogram of the query audio at a given pitch and time; I indexes the 8 scale degrees of the octave, with 0 ≤ I ≤ 7; and C(k, t) denotes the target chroma vector.
The processing module 203 is configured to generate corresponding target piano window image data from the target chroma vector.
Specifically, the target chroma vector of the query audio is fed into the cluster-center model to obtain the category of the cluster center to which it belongs.
A corresponding target piano window image is then formed from the cluster center of the query audio and the target spectrogram; on the target piano window image, the vertical axis represents pitch and the horizontal axis represents time, with note onsets entered along the time series. The vertical-axis pitch of the target piano window image can therefore be obtained from the amplitude of the target spectrogram, and the horizontal-axis time from the frequency. The target piano window image of the query audio is binarized by image processing to become the target piano window image data.
The second calculation module 204 is configured to compute the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients.
The cross-correlation coefficient is computed as:

R_fg(m, n) = Σ_i Σ_j f(i, j) · g(i − m, j − n)

where g(i, j) denotes the piano window image data of a pre-stored audio track, f(i, j) denotes the piano window image data of the query audio, and R_fg(m, n) denotes the cross-correlation coefficient between the pre-stored audio track and the query audio at offset (m, n); the larger the cross-correlation coefficient, the better the two match.
The matching module 205 is configured to select, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients.
Illustratively, the matching module 205 is further configured to:
select, from the pre-stored audio tracks according to the cross-correlation coefficients, those pre-stored audio tracks whose cross-correlation coefficient is greater than a preset threshold;
perform feature matching on the selected pre-stored audio tracks by dynamic time warping; and
select, from the selected pre-stored audio tracks, the target audio track matching the query audio according to the feature-matching results.
Illustratively, the matching module 205 is further configured to:
arrange, in lattice form, the piano window image data of the pre-stored audio tracks whose cross-correlation coefficient is greater than the preset threshold together with the piano window image data of the query audio; and
compute, by a recursive formula based on the Manhattan distance, the distance between the piano window image data of each selected pre-stored audio track and the piano window image data of the query audio, the resulting distance representing the feature-matching result between the query audio and the corresponding pre-stored audio track.
Specifically, dynamic time warping (DTW) is a method for computing the degree of match between two time series by nonlinearly stretching and compressing their patterns.
The two k-dimensional signals are:

X = (x_1, x_2, …, x_M)

Y = (y_1, y_2, …, y_N)

The query audio X contains M samples and the pre-stored audio Y contains N samples. Given the distance d_mn(X, Y) between the m-th sample of the query audio X and the n-th sample of the pre-stored audio Y, the warping stretches the series until the accumulated distance from the start point to d_mn(X, Y) is minimized. First, the possible values of d_mn(X, Y) are arranged in lattice form. Then the minimum accumulated distance is computed as:

D = Σ d_mn(X, Y), m ∈ i_x, n ∈ i_y

minimized over warping paths (i_x, i_y) through the lattice. The distance computation uses a recursion based on the Manhattan distance d(i, j) = |X_1 − X_2| + |Y_1 − Y_2|. The recursion imposes a diagonal constraint on the path to prevent extreme stretching and shrinking, together with a matching window that limits the region over which the accumulated distance is computed; this preserves recognition performance while reducing computation.
Embodiment Three
Referring to fig. 6, a hardware architecture diagram of a computer device according to a third embodiment of the present invention is shown. In this embodiment, the computer device 2 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack-mount server, a blade server, a tower server, or a cabinet server (including a stand-alone server or a server cluster composed of multiple servers), among others. As shown in fig. 6, the computer device 2 includes, but is not limited to, at least a memory 21, a processor 22, a network interface 23, and a music retrieval system 20, which are communicatively connected to one another via a system bus. Wherein:
in this embodiment, the memory 21 includes at least one type of computer-readable storage medium including flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the computer device 2. Of course, the memory 21 may also include both internal storage units of the computer device 2 and external storage devices. In this embodiment, the memory 21 is typically used to store an operating system and various types of application software installed on the computer device 2, such as program codes of the music retrieval system 20 of the second embodiment. Further, the memory 21 may be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 22 is typically used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is configured to execute the program code stored in the memory 21 or process data, for example, execute the music retrieval system 20, to implement the music retrieval method of the first embodiment.
The network interface 23 may include a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the computer device 2 and other electronic devices. For example, the network interface 23 is used to connect the computer device 2 to an external terminal through a network, and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be an Intranet, the Internet, a Global System for Mobile communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
It is noted that fig. 6 only shows a computer device 2 having components 20-23, but it is understood that not all of the illustrated components are required to be implemented, and that more or fewer components may alternatively be implemented.
In this embodiment, the music retrieval system 20 stored in the memory 21 may be further divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete the present invention.
For example, fig. 5 shows a schematic diagram of the program modules of the second embodiment of the music retrieval system 20, in which the music retrieval system 20 is divided into a database creation module 200, an analysis module 201, a first calculation module 202, a processing module 203, a second calculation module 204, and a matching module 205. A program module in the present invention refers to a series of computer program instruction segments capable of performing a specific function, and is better suited than a program to describing the execution of the music retrieval system 20 in the computer device 2. The specific functions of the program modules 200-205 have been described in detail in the second embodiment and are not repeated here.
Embodiment Four
The present embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an App application store, on which a computer program is stored that performs the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment is used to store the music retrieval system 20, which, when executed by a processor, implements the music retrieval method of the first embodiment.
According to the music retrieval method, system, computer device, and computer-readable storage medium of the embodiments, chroma vectors are used for the computation: at retrieval time, the cross-correlation coefficients between the piano window image of the query audio and those of the pre-stored audio tracks are computed, feature matching is then performed, and the successfully matched pre-stored audio track is output as the retrieval result, which improves retrieval efficiency. In addition, large numbers of pre-stored audio tracks can be cluster-analyzed and their data stored in piano-window form, which saves storage space.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is preferred.
The foregoing description covers only preferred embodiments of the present invention and does not limit its scope; any equivalent structural or process transformation made using the contents of this specification and the drawings, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (7)

1. A music retrieval method, comprising:
pre-configuring an audio retrieval database, the audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks;
computing a target spectrogram of the query audio;
computing an octave-based target chroma vector of the query audio from the target spectrogram;
generating corresponding target piano window image data from the target chroma vector;
computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients; and
selecting, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients;
wherein the step of pre-configuring the audio retrieval database comprises:
computing the spectrograms of the pre-stored audio tracks;
computing the chroma vector of each pre-stored audio track from the octave and its spectrogram, to obtain the chroma vectors of the pre-stored audio tracks;
clustering the pre-stored audio tracks according to their chroma vectors: randomly selecting n of the chroma vectors as initial cluster centers; iterating: computing the distance from each remaining chroma vector to each of the n cluster centers and assigning it to the class of the nearest center; updating the cluster centers: after each iteration, computing each new cluster center by averaging, i.e., taking the mean of the chroma vectors assigned to each of the n classes; checking after each iteration whether the cluster centers have changed, stopping the iteration if they have not, and otherwise continuing to iterate around the newly generated cluster centers; and
generating the piano window image data of each pre-stored audio track from its clustering result, and storing the piano window image data of each pre-stored audio track in a database;
wherein the step of generating corresponding target piano window image data from the target chroma vector comprises:
obtaining the cluster center to which the target chroma vector of the query audio belongs; and
generating corresponding target piano window image data from the cluster center of the query audio;
and wherein the target chroma vector is computed as:

C(k, t) = Σ_{I=0}^{7} H_1(k + 12I, t)

where k indexes the 12 pitches within an octave; t denotes time; H_1 denotes the power of the target spectrogram of the query audio at a given pitch and time; I indexes the 8 scale degrees of the octave, with 0 ≤ I ≤ 7; and C(k, t) denotes the target chroma vector.
2. The music retrieval method according to claim 1, wherein the cross-correlation coefficient is computed as:

R_fg(m, n) = Σ_i Σ_j f(i, j) · g(i − m, j − n)

where g(i, j) denotes the piano window image data of a pre-stored audio track, f(i, j) denotes the piano window image data of the query audio, and R_fg(m, n) denotes the cross-correlation coefficient between the pre-stored audio track and the query audio at offset (m, n).
3. The music retrieval method according to claim 1, wherein the step of selecting, from the pre-stored audio tracks, the target audio track matching the query audio according to the cross-correlation coefficients comprises:
selecting, from the pre-stored audio tracks according to the cross-correlation coefficients, those pre-stored audio tracks whose cross-correlation coefficient is greater than a preset threshold;
performing feature matching on the selected pre-stored audio tracks by dynamic time warping; and
selecting, from the selected pre-stored audio tracks, the target audio track matching the query audio according to the feature-matching results.
4. The music retrieval method according to claim 3, wherein the step of performing feature matching on the selected pre-stored audio tracks by dynamic time warping comprises:
arranging, in lattice form, the piano window image data of the pre-stored audio tracks whose cross-correlation coefficient is greater than the preset threshold together with the piano window image data of the query audio; and
computing, by a recursive formula based on the Manhattan distance, the distance between the piano window image data of each selected pre-stored audio track and the piano window image data of the query audio, the resulting distance representing the feature-matching result between the query audio and the corresponding pre-stored audio track.
5. A music retrieval system, comprising:
a database creation module for pre-configuring an audio retrieval database comprising piano window image data of a plurality of pre-stored audio tracks;
an analysis module for computing a target spectrogram of the query audio;
a first calculation module for computing an octave-based target chroma vector of the query audio from the target spectrogram;
a processing module for generating corresponding target piano window image data from the target chroma vector;
a second calculation module for computing the cross-correlation coefficient between the target piano window image data of the query audio and the piano window image data of each pre-stored audio track, to obtain a plurality of cross-correlation coefficients; and
a matching module for selecting, from the pre-stored audio tracks, a target audio track matching the query audio according to the cross-correlation coefficients;
wherein the database creation module is further configured to: compute the spectrograms of the pre-stored audio tracks; compute the chroma vector of each pre-stored audio track from the octave and its spectrogram, to obtain the chroma vectors of the pre-stored audio tracks; cluster the pre-stored audio tracks according to their chroma vectors: randomly selecting n of the chroma vectors as initial cluster centers; iterating: computing the distance from each remaining chroma vector to each of the n cluster centers and assigning it to the class of the nearest center; updating the cluster centers: after each iteration, computing each new cluster center by averaging, i.e., taking the mean of the chroma vectors assigned to each of the n classes; checking after each iteration whether the cluster centers have changed, stopping the iteration if they have not, and otherwise continuing to iterate around the newly generated cluster centers; and generate the piano window image data of each pre-stored audio track from its clustering result, and store the piano window image data of each pre-stored audio track in a database;
wherein the processing module is further configured to obtain the cluster center to which the target chroma vector of the query audio belongs, and to generate corresponding target piano window image data from the cluster center of the query audio;
and wherein the target chroma vector is computed as:

C(k, t) = Σ_{I=0}^{7} H_1(k + 12I, t)

where k indexes the 12 pitches within an octave; t denotes time; H_1 denotes the power of the target spectrogram of the query audio at a given pitch and time; I indexes the 8 scale degrees of the octave, with 0 ≤ I ≤ 7; and C(k, t) denotes the target chroma vector.
6. A computer device comprising a memory and a processor, the memory storing a music retrieval system executable on the processor, the music retrieval system, when executed by the processor, implementing the steps of the music retrieval method of any one of claims 1-4.
7. A computer-readable storage medium, in which a computer program is stored, the computer program being executable by at least one processor to cause the at least one processor to perform the steps of the music retrieval method according to any one of claims 1-4.
CN201910541222.7A 2019-06-21 2019-06-21 Music retrieval method, system, computer device and computer readable storage medium Active CN110399521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910541222.7A CN110399521B (en) 2019-06-21 2019-06-21 Music retrieval method, system, computer device and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN110399521A CN110399521A (en) 2019-11-01
CN110399521B (en) 2023-06-06

Family

ID=68323334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910541222.7A Active CN110399521B (en) 2019-06-21 2019-06-21 Music retrieval method, system, computer device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110399521B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436806A (en) * 2011-09-29 2012-05-02 复旦大学 Audio frequency copy detection method based on similarity
CN103853749A (en) * 2012-11-30 2014-06-11 国际商业机器公司 Mode-based audio retrieval method and system
CN103890838A (en) * 2011-06-10 2014-06-25 X-系统有限公司 Method and system for analysing sound
WO2017222569A1 (en) * 2016-06-22 2017-12-28 Gracenote, Inc. Matching audio fingerprints
CN109002529A (en) * 2018-07-17 2018-12-14 厦门美图之家科技有限公司 Audio search method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9501568B2 (en) * 2015-01-02 2016-11-22 Gracenote, Inc. Audio matching based on harmonogram


Also Published As

Publication number Publication date
CN110399521A (en) 2019-11-01

Similar Documents

Publication Publication Date Title
WO2019100606A1 (en) Electronic device, voiceprint-based identity verification method and system, and storage medium
JP5732994B2 (en) Music searching apparatus and method, program, and recording medium
EP3014612B1 (en) Acoustic music similarity determiner
JP4644250B2 (en) Apparatus and method for determining the type of chords inherent in a test signal
JP2009047831A (en) Feature quantity extracting device, program and feature quantity extraction method
CN106055659B (en) Lyric data matching method and equipment thereof
US20200152162A1 (en) Musical analysis method, music analysis device, and program
CN110556126A (en) Voice recognition method and device and computer equipment
US20190213279A1 (en) Apparatus and method of analyzing and identifying song
CN110738980A (en) Singing voice synthesis model training method and system and singing voice synthesis method
US20160027421A1 (en) Audio signal analysis
WO2019196301A1 (en) Electronic device, deep learning-based method and system for musical notation recognition, and storage medium
CN108711415B (en) Method, apparatus and storage medium for correcting time delay between accompaniment and dry sound
JP2002032424A (en) Device and method for circuit analysis, and computer- readable recording medium with program making computer implement the method recorded thereon
CN110399521B (en) Music retrieval method, system, computer device and computer readable storage medium
CN111986698A (en) Audio segment matching method and device, computer readable medium and electronic equipment
Benetos et al. A temporally-constrained convolutive probabilistic model for pitch detection
CN110415722B (en) Speech signal processing method, storage medium, computer program, and electronic device
CN111863030A (en) Audio detection method and device
CN115083422B (en) Voice traceability evidence obtaining method and device, equipment and storage medium
US7617102B2 (en) Speaker identifying apparatus and computer program product
CN115329125A (en) Song medley splicing method and device
US20140140519A1 (en) Sound processing device, sound processing method, and program
CN111681671B (en) Abnormal sound identification method and device and computer storage medium
CN109686376B (en) Song singing evaluation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant