CN102930873B - Information entropy based music humming detecting method - Google Patents
- Publication number: CN102930873B (application CN201210371373.0A)
- Authority
- CN
- China
- Prior art keywords
- information entropy
- frame
- voice
- humming
- voice signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Auxiliary Devices For Music (AREA)
Abstract
The invention relates to a music humming detection method based on information entropy. Exploiting the similarity in pronunciation between consecutive words when a person hums, the singing voice is segmented by an information-entropy method, and the segmentation result is then compared with a standard file to detect whether the person is actually singing. The detection method provided by the invention is flexible to implement and highly efficient.
Description
Technical field
The present invention relates to the field of speech segmentation, and in particular to a music humming detection method based on information entropy.
Background technology
In recent years, with the popularization of music entertainment, recognition, retrieval and scoring based on music humming have become a focus of research and application, attracting wide attention from academia and industry. As is well known, humming a song takes less effort than singing it and makes it easier to stay in tune, so some users mix humming into their karaoke performance to raise their score, which makes accurate scoring more difficult. Being able to tell whether a karaoke user is humming or singing therefore has a large impact on scoring precision, and in turn on the user experience and the quality of the product. When a person hums, consecutive syllables are articulated similarly, so individual notes are hard to separate during speech segmentation. How to use speech segmentation to reliably distinguish humming from singing is thus an important problem both for improving scoring systems and for developing music entertainment.
Summary of the invention
The object of the present invention is to provide a music humming detection method based on information entropy that can effectively detect humming during karaoke.
The present invention adopts the following scheme: a music humming detection method based on information entropy, characterized in that it exploits the similarity in pronunciation between consecutive words when a person hums, segments the song sentence by sentence by an information-entropy method, and then compares the segmentation result with a standard file to detect whether the person is humming, comprising the following steps:
(1) after obtaining the input digital music voice signal, apply filtering and normalization preprocessing to the whole signal;
(2) divide the signal into frames and compute the information entropy of each frame;
(3) divide the whole signal into segments according to the information entropy;
(4) read the standard file; if the number of detected segments is less than half the number of words read from the standard file, the passage is judged to be humming, otherwise the signal is judged normal.
In an embodiment of the present invention, the frame length W of each frame is the number of samples in 10-30 ms, W = frame duration × sample frequency; the frame shift WF is the non-overlapping part of two adjacent frames, WF = W/2.
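The frame parameters above can be sketched numerically; the 20 ms frame duration and 16 kHz sample frequency below are illustrative assumptions, not values fixed by the text:

```python
# Sketch of the frame parameters described above. The concrete values
# (20 ms frames, 16 kHz sampling) are assumptions for illustration;
# the text only requires a frame duration of 10-30 ms.
frame_duration = 0.02      # frame duration in seconds (within 10-30 ms)
fs = 16000                 # sample frequency (assumed)
W = int(frame_duration * fs)   # frame length in samples: duration * fs
WF = W // 2                # frame shift: half a frame (50% overlap)
print(W, WF)               # 320 160
```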
In an embodiment of the present invention, the information entropy measures the degree of disorder of a time series: the more disordered the distribution of the series, the larger the entropy, and vice versa.
In an embodiment of the present invention, the standard file is described as a series of triples O_i(begin_i, end_i, C_i), where 1 <= i <= n, C_i is the lyric word, begin_i is the start time of the i-th word, end_i is the end time of the i-th word, and n is the total number of lyric words in the voice segment.
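A minimal sketch of this triple structure in code; all times and words below are invented for illustration, and a real standard file would be parsed from disk:

```python
# Illustrative in-memory form of the standard file described above:
# a list of triples O_i = (begin_i, end_i, C_i).
standard_file = [
    (0.00, 0.42, "word1"),   # begin_i, end_i in seconds; C_i the lyric word
    (0.50, 0.95, "word2"),
    (1.10, 1.60, "word3"),
]
n = len(standard_file)       # n: total number of lyric words in the segment
```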
In an embodiment of the present invention, the calculation of the information entropy of each frame in step (2) is realized according to the following scheme: after the voice signal is divided into frames of length W, each frame is processed as follows: find the maximum value max in the frame, divide [0, max] into k equal-length intervals [0, x_1], [x_1, x_2], ..., [x_{k-1}, max], count the number of samples of the frame falling into each interval and convert the counts into probabilities p_1, p_2, ..., p_k, then compute the entropy of the frame according to the formula H = -Σ_{j=1}^{k} p_j log p_j. The information entropy sequence of the whole voice signal is denoted H.
In an embodiment of the present invention, dividing the whole voice signal into segments according to the information entropy in step (3) is realized according to the following scheme: determine the threshold flag = max(H)/3 from the maximum of H; a voice segment must be longer than 150 ms, which corresponds to L = (0.15 × fs)/WF entropy points. Starting from the first point of H, find a point h_i > flag; if the following L consecutive points h_{i+1}, h_{i+2}, ..., h_{i+L} are all greater than flag, and the run of points above the threshold ends after L' points (L' > L), then the speech signal segment corresponding to h_i through h_{i+L'} is one of the required independent voice segments. Continuing in this way, some number of independent voice segments is found in H, denoted n; the resulting n voice segments are the segmentation result.
The beneficial effect of the present invention is as follows: the present invention proposes a music humming detection method based on information-entropy segmentation, which segments the song sentence by sentence by an entropy-based speech segmentation method and effectively detects humming. The method is simple, flexible to implement, and highly practical.
Brief description of the drawings
Fig. 1 is the flow chart of the music humming detection method based on information entropy.
Embodiment
Referring to Fig. 1, the music humming detection method based on information entropy of the present invention exploits the similarity in pronunciation between consecutive words when a person hums, segments the song sentence by sentence by an information-entropy method, and then compares the segmentation result with a standard file to detect whether the person is humming, specifically as follows:
1. Frame the voice signal: first apply preprocessing such as filtering and normalization to the whole voice signal. Then divide the resulting signal into short frames of length W with frame shift WF, where W is the frame length, W = frame duration × sample frequency, and WF is the frame shift, WF = W/2. Each frame is processed as follows: find the maximum value max in the frame, divide [0, max] into k equal-length intervals [0, x_1], [x_1, x_2], ..., [x_{k-1}, max], count the number of samples of the frame falling into each interval and convert the counts into probabilities p_1, p_2, ..., p_k, then compute the entropy of the frame according to the formula H = -Σ_{j=1}^{k} p_j log p_j. The information entropy sequence of the whole voice signal is denoted H.
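The framing and per-frame entropy computation above can be sketched as follows. This is a sketch assuming NumPy; since the text does not say whether the k intervals are taken over raw sample values or their magnitudes, absolute values are used here as an assumption:

```python
import numpy as np

def frame_entropy(frame, k=10):
    """Information entropy of one frame, following the scheme above:
    split [0, max] into k equal-length bins, estimate each bin's
    probability from sample counts, then H = -sum(p_j * log p_j)."""
    frame = np.abs(frame)          # assumption: bins span magnitudes [0, max]
    mx = frame.max()
    if mx == 0:                    # silent frame: define its entropy as 0
        return 0.0
    counts, _ = np.histogram(frame, bins=k, range=(0.0, mx))
    p = counts / counts.sum()
    p = p[p > 0]                   # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log(p)).sum())

def entropy_sequence(signal, W, WF):
    """Entropy sequence H over frames of length W with frame shift WF."""
    return np.array([frame_entropy(signal[s:s + W])
                     for s in range(0, len(signal) - W + 1, WF)])
```

The number of bins k is not fixed by the text; k=10 here is only a placeholder.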
2. Segment the voice signal: determine the threshold flag = max(H)/3 from the maximum of H; a voice segment must be longer than 150 ms, which corresponds to L = (0.15 × fs)/WF entropy points. Starting from the first point of H, find a point h_i > flag; if the following L consecutive points h_{i+1}, h_{i+2}, ..., h_{i+L} are all greater than flag, and the run of points above the threshold ends after L' points (L' > L), then the speech signal segment corresponding to h_i through h_{i+L'} is one of the required independent voice segments. Continuing in this way, N independent voice segments are found in H; these N segments are the segmentation result.
3. Compare the segmentation result with the standard file to detect humming: the standard file contains triples O_i(begin_i, end_i, C_i), where 1 <= i <= n, C_i is the lyric word, begin_i is the start time of the i-th word, end_i is the end time of the i-th word, and n is the total number of lyric words in the voice segment. Suppose a phrase spans the interval from time T_m to T_{m+1}; searching the standard file yields the number of words M contained in this interval. If N < M/2 holds, the phrase contains humming, and points should be deducted in scoring.
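A sketch of this comparison step; the helper names `words_in_window` and `is_humming` are illustrative, not taken from the text:

```python
def words_in_window(triples, t_start, t_end):
    """Count lyric triples (begin_i, end_i, C_i) in the standard file
    whose time span overlaps the interval [t_start, t_end]."""
    return sum(1 for begin, end, _word in triples
               if begin < t_end and end > t_start)

def is_humming(n_segments, n_words):
    """Decision rule from the text: the phrase is flagged as humming when
    the number of detected voice segments N is less than half the lyric
    word count M for the same interval."""
    return n_segments < n_words / 2.0
```

For example, if the standard file lists four words in the interval but only one independent voice segment was detected, 1 < 4/2 holds and the phrase is flagged as humming.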
The foregoing is only a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present application shall fall within the scope of the present invention.
Claims (5)
1. A music humming detection method based on information entropy, characterized in that it exploits the similarity in pronunciation between consecutive words when a person hums, segments the song sentence by sentence by an information-entropy method, and then compares the segmentation result with a standard file to detect whether the person is humming, comprising the following steps:
(1) after obtaining the input digital music voice signal, apply filtering and normalization preprocessing to the whole signal;
(2) divide the signal into frames and compute the information entropy of each frame;
(3) divide the whole signal into segments according to the information entropy;
(4) read the standard file; if the number of detected segments is less than half the number of words read from the standard file, the passage is judged to be humming, otherwise the signal is judged normal; the information entropy measures the degree of disorder of a time series: the more disordered the distribution of the series, the larger the entropy, and vice versa.
2. The music humming detection method based on information entropy according to claim 1, characterized in that: the frame length W of each frame is the number of samples in 10-30 ms, W = frame duration × sample frequency; the frame shift WF is the non-overlapping part of two adjacent frames, WF = W/2.
3. The music humming detection method based on information entropy according to claim 1, characterized in that: the standard file is described as a series of triples O_i(begin_i, end_i, C_i), where 1 <= i <= n, C_i is the lyric word, begin_i is the start time of the i-th word, end_i is the end time of the i-th word, and n is the total number of lyric words in the voice segment.
4. The music humming detection method based on information entropy according to claim 1, characterized in that: the calculation of the information entropy of each frame in step (2) is realized according to the following scheme: after the voice signal is divided into frames of length W, each frame is processed as follows: find the maximum value max in the frame, divide [0, max] into k equal-length intervals [0, x_1], [x_1, x_2], ..., [x_{k-1}, max], count the number of samples of the frame falling into each interval and convert the counts into probabilities p_1, p_2, ..., p_k, then compute the entropy of the frame according to the formula H = -Σ_{j=1}^{k} p_j log p_j; the information entropy sequence of the whole voice signal is denoted H.
5. The music humming detection method based on information entropy according to claim 4, characterized in that: dividing the whole voice signal into segments according to the information entropy in step (3) is realized according to the following scheme: determine the threshold flag = max(H)/3 from the maximum of H; a voice segment must be longer than 150 ms, which corresponds to L = (0.15 × fs)/WF entropy points; starting from the first point of H, find a point h_i > flag; if the following L consecutive points h_{i+1}, h_{i+2}, ..., h_{i+L} are all greater than flag, and the run of points above the threshold ends after L' points (L' > L), then the speech signal segment corresponding to h_i through h_{i+L'} is one of the required independent voice segments; continuing in this way, some number of independent voice segments is found in H, denoted n, and the resulting n voice segments are the segmentation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210371373.0A CN102930873B (en) | 2012-09-29 | 2012-09-29 | Information entropy based music humming detecting method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102930873A CN102930873A (en) | 2013-02-13 |
CN102930873B true CN102930873B (en) | 2014-04-09 |
Family
ID=47645654
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210371373.0A Expired - Fee Related CN102930873B (en) | 2012-09-29 | 2012-09-29 | Information entropy based music humming detecting method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102930873B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1703734A (en) * | 2002-10-11 | 2005-11-30 | 松下电器产业株式会社 | Method and apparatus for determining musical notes from sounds |
CN1737798A (en) * | 2005-09-08 | 2006-02-22 | 上海交通大学 | Music rhythm sectionalized automatic marking method based on eigen-note |
CN101383149A (en) * | 2008-10-27 | 2009-03-11 | 哈尔滨工业大学 | Stringed music vibrato automatic detection method |
WO2010136722A1 (en) * | 2009-05-29 | 2010-12-02 | Voxler | Method for detecting words in a voice and use thereof in a karaoke game |
US7962530B1 (en) * | 2007-04-27 | 2011-06-14 | Michael Joseph Kolta | Method for locating information in a musical database using a fragment of a melody |
CN102568456A (en) * | 2011-12-23 | 2012-07-11 | 深圳市万兴软件有限公司 | Notation recording method and a notation recording device based on humming input |
Non-Patent Citations (2)
Title |
---|
Yang Jianfeng et al., "A New Method for Pitch Division of Humming Notes", Computer Knowledge and Technology, Oct. 2011, full text. *
Also Published As
Publication number | Publication date |
---|---|
CN102930873A (en) | 2013-02-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20140409 Termination date: 20170929 |