CN102930873B - Information entropy based music humming detecting method - Google Patents
- Publication number: CN102930873B (application CN201210371373.0A)
- Authority
- CN
- China
- Prior art keywords
- information entropy
- frame
- voice
- humming
- voice signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Auxiliary Devices For Music (AREA)
Abstract
The invention relates to a music humming detection method based on information entropy. Exploiting the similarity in pronunciation between consecutive words when a person hums, the singing voice is segmented by an information-entropy method, and the segmentation result is then compared with a standard file to detect whether the person is actually singing. The detection method provided by the invention is flexible to implement and highly efficient.
Description
Technical field
The present invention relates to the field of speech segmentation, and in particular to a music humming detection method based on information entropy.
Background technology
In recent years, with the popularization of music entertainment, recognition, retrieval and scoring based on music humming have become a focus of research and application, attracting wide attention from academia and industry. As is well known, humming a song takes less effort than singing it and makes it easier to stay in tune, so some users mix humming into their karaoke performance to raise their score, which makes accurate scoring more difficult. Being able to tell whether a karaoke user is humming or singing therefore has a large impact on scoring precision, and in turn on the user experience and the quality of the product. When a person hums, consecutive syllables are articulated similarly, so individual notes are hard to separate during speech segmentation. How to use speech segmentation to reliably distinguish humming from singing is thus an important problem both for improving scoring systems and for developing music entertainment.
Summary of the invention
The object of the present invention is to provide a music humming detection method based on information entropy that can effectively detect humming during karaoke.
The present invention adopts the following scheme: a music humming detection method based on information entropy, characterized in that it exploits the similarity in pronunciation between consecutive words when a person hums, segments the song sentence by sentence by an information-entropy method, and then compares the segmentation result with a standard file to detect whether the person is humming, comprising the following steps:
(1) after obtaining the input digital music voice signal, apply filtering and normalization preprocessing to the whole signal;
(2) divide the signal into frames and compute the information entropy of each frame;
(3) divide the whole signal into segments according to the information entropy;
(4) read the standard file; if the number of detected segments is less than half the number of words read from the standard file, the passage is judged to be humming, otherwise the signal is judged normal.
In an embodiment of the present invention, the frame length W of each frame is the number of samples in 10-30 ms, W = frame duration × sample frequency; the frame shift WF is the non-overlapping part of two adjacent frames, WF = W/2.
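The frame parameters above can be sketched numerically; the 20 ms frame duration and 16 kHz sample frequency below are illustrative assumptions, not values fixed by the text:

```python
# Sketch of the frame parameters described above. The concrete values
# (20 ms frames, 16 kHz sampling) are assumptions for illustration;
# the text only requires a frame duration of 10-30 ms.
frame_duration = 0.02      # frame duration in seconds (within 10-30 ms)
fs = 16000                 # sample frequency (assumed)
W = int(frame_duration * fs)   # frame length in samples: duration * fs
WF = W // 2                # frame shift: half a frame (50% overlap)
print(W, WF)               # 320 160
```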
In an embodiment of the present invention, the information entropy measures the degree of disorder of a time series: the more disordered the distribution of the series, the larger the entropy, and vice versa.
In an embodiment of the present invention, the standard file is described as a series of triples O_i(begin_i, end_i, C_i), where 1 <= i <= n, C_i is the lyric word, begin_i is the start time of the i-th word, end_i is the end time of the i-th word, and n is the total number of lyric words in the voice segment.
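A minimal sketch of this triple structure in code; all times and words below are invented for illustration, and a real standard file would be parsed from disk:

```python
# Illustrative in-memory form of the standard file described above:
# a list of triples O_i = (begin_i, end_i, C_i).
standard_file = [
    (0.00, 0.42, "word1"),   # begin_i, end_i in seconds; C_i the lyric word
    (0.50, 0.95, "word2"),
    (1.10, 1.60, "word3"),
]
n = len(standard_file)       # n: total number of lyric words in the segment
```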
In an embodiment of the present invention, the calculation of the information entropy of each frame in step (2) is realized according to the following scheme: after the voice signal is divided into frames of length W, each frame is processed as follows: find the maximum value max in the frame, divide [0, max] into k equal-length intervals [0, x_1], [x_1, x_2], ..., [x_{k-1}, max], count the number of samples of the frame falling into each interval and convert the counts into probabilities p_1, p_2, ..., p_k, then compute the entropy of the frame according to the formula H = -Σ_{j=1}^{k} p_j log p_j. The information entropy sequence of the whole voice signal is denoted H.
In an embodiment of the present invention, dividing the whole voice signal into segments according to the information entropy in step (3) is realized according to the following scheme: determine the threshold flag = max(H)/3 from the maximum of H; a voice segment must be longer than 150 ms, which corresponds to L = (0.15 × fs)/WF entropy points. Starting from the first point of H, find a point h_i > flag; if the following L consecutive points h_{i+1}, h_{i+2}, ..., h_{i+L} are all greater than flag, and the run of points above the threshold ends after L' points (L' > L), then the speech signal segment corresponding to h_i through h_{i+L'} is one of the required independent voice segments. Continuing in this way, some number of independent voice segments is found in H, denoted n; the resulting n voice segments are the segmentation result.
The beneficial effect of the present invention is as follows: the present invention proposes a music humming detection method based on information-entropy segmentation, which segments the song sentence by sentence by an entropy-based speech segmentation method and effectively detects humming. The method is simple, flexible to implement, and highly practical.
Brief description of the drawings
Fig. 1 is the flow chart of the music humming detection method based on information entropy.
Embodiment
Referring to Fig. 1, the music humming detection method based on information entropy of the present invention exploits the similarity in pronunciation between consecutive words when a person hums, segments the song sentence by sentence by an information-entropy method, and then compares the segmentation result with a standard file to detect whether the person is humming, specifically as follows:
1. Frame the voice signal: first apply preprocessing such as filtering and normalization to the whole voice signal. Then divide the resulting signal into short frames of length W with frame shift WF, where W is the frame length, W = frame duration × sample frequency, and WF is the frame shift, WF = W/2. Each frame is processed as follows: find the maximum value max in the frame, divide [0, max] into k equal-length intervals [0, x_1], [x_1, x_2], ..., [x_{k-1}, max], count the number of samples of the frame falling into each interval and convert the counts into probabilities p_1, p_2, ..., p_k, then compute the entropy of the frame according to the formula H = -Σ_{j=1}^{k} p_j log p_j. The information entropy sequence of the whole voice signal is denoted H.
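The framing and per-frame entropy computation above can be sketched as follows. This is a sketch assuming NumPy; since the text does not say whether the k intervals are taken over raw sample values or their magnitudes, absolute values are used here as an assumption:

```python
import numpy as np

def frame_entropy(frame, k=10):
    """Information entropy of one frame, following the scheme above:
    split [0, max] into k equal-length bins, estimate each bin's
    probability from sample counts, then H = -sum(p_j * log p_j)."""
    frame = np.abs(frame)          # assumption: bins span magnitudes [0, max]
    mx = frame.max()
    if mx == 0:                    # silent frame: define its entropy as 0
        return 0.0
    counts, _ = np.histogram(frame, bins=k, range=(0.0, mx))
    p = counts / counts.sum()
    p = p[p > 0]                   # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log(p)).sum())

def entropy_sequence(signal, W, WF):
    """Entropy sequence H over frames of length W with frame shift WF."""
    return np.array([frame_entropy(signal[s:s + W])
                     for s in range(0, len(signal) - W + 1, WF)])
```

The number of bins k is not fixed by the text; k=10 here is only a placeholder.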
2. Segment the voice signal: determine the threshold flag = max(H)/3 from the maximum of H; a voice segment must be longer than 150 ms, which corresponds to L = (0.15 × fs)/WF entropy points. Starting from the first point of H, find a point h_i > flag; if the following L consecutive points h_{i+1}, h_{i+2}, ..., h_{i+L} are all greater than flag, and the run of points above the threshold ends after L' points (L' > L), then the speech signal segment corresponding to h_i through h_{i+L'} is one of the required independent voice segments. Continuing in this way, N independent voice segments are found in H; these N segments are the segmentation result.
3. Compare the segmentation result with the standard file to detect humming: the standard file contains triples O_i(begin_i, end_i, C_i), where 1 <= i <= n, C_i is the lyric word, begin_i is the start time of the i-th word, end_i is the end time of the i-th word, and n is the total number of lyric words in the voice segment. Suppose a phrase spans the interval from time T_m to T_{m+1}; searching the standard file yields the number of words M contained in this interval. If N < M/2 holds, the phrase contains humming, and points should be deducted in scoring.
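A sketch of this comparison step; the helper names `words_in_window` and `is_humming` are illustrative, not taken from the text:

```python
def words_in_window(triples, t_start, t_end):
    """Count lyric triples (begin_i, end_i, C_i) in the standard file
    whose time span overlaps the interval [t_start, t_end]."""
    return sum(1 for begin, end, _word in triples
               if begin < t_end and end > t_start)

def is_humming(n_segments, n_words):
    """Decision rule from the text: the phrase is flagged as humming when
    the number of detected voice segments N is less than half the lyric
    word count M for the same interval."""
    return n_segments < n_words / 2.0
```

For example, if the standard file lists four words in the interval but only one independent voice segment was detected, 1 < 4/2 holds and the phrase is flagged as humming.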
The foregoing is only a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present application shall fall within the scope of the present invention.
Claims (5)
1. A music humming detection method based on information entropy, characterized in that it exploits the similarity in pronunciation between consecutive words when a person hums, segments the song sentence by sentence by an information-entropy method, and then compares the segmentation result with a standard file to detect whether the person is humming, comprising the following steps:
(1) after obtaining the input digital music voice signal, apply filtering and normalization preprocessing to the whole signal;
(2) divide the signal into frames and compute the information entropy of each frame;
(3) divide the whole signal into segments according to the information entropy;
(4) read the standard file; if the number of detected segments is less than half the number of words read from the standard file, the passage is judged to be humming, otherwise the signal is judged normal; the information entropy measures the degree of disorder of a time series: the more disordered the distribution of the series, the larger the entropy, and vice versa.
2. The music humming detection method based on information entropy according to claim 1, characterized in that: the frame length W of each frame is the number of samples in 10-30 ms, W = frame duration × sample frequency; the frame shift WF is the non-overlapping part of two adjacent frames, WF = W/2.
3. The music humming detection method based on information entropy according to claim 1, characterized in that: the standard file is described as a series of triples O_i(begin_i, end_i, C_i), where 1 <= i <= n, C_i is the lyric word, begin_i is the start time of the i-th word, end_i is the end time of the i-th word, and n is the total number of lyric words in the voice segment.
4. The music humming detection method based on information entropy according to claim 1, characterized in that: the calculation of the information entropy of each frame in step (2) is realized according to the following scheme: after the voice signal is divided into frames of length W, each frame is processed as follows: find the maximum value max in the frame, divide [0, max] into k equal-length intervals [0, x_1], [x_1, x_2], ..., [x_{k-1}, max], count the number of samples of the frame falling into each interval and convert the counts into probabilities p_1, p_2, ..., p_k, then compute the entropy of the frame according to the formula H = -Σ_{j=1}^{k} p_j log p_j; the information entropy sequence of the whole voice signal is denoted H.
5. The music humming detection method based on information entropy according to claim 4, characterized in that: dividing the whole voice signal into segments according to the information entropy in step (3) is realized according to the following scheme: determine the threshold flag = max(H)/3 from the maximum of H; a voice segment must be longer than 150 ms, which corresponds to L = (0.15 × fs)/WF entropy points; starting from the first point of H, find a point h_i > flag; if the following L consecutive points h_{i+1}, h_{i+2}, ..., h_{i+L} are all greater than flag, and the run of points above the threshold ends after L' points (L' > L), then the speech signal segment corresponding to h_i through h_{i+L'} is one of the required independent voice segments; continuing in this way, some number of independent voice segments is found in H, denoted n, and the resulting n voice segments are the segmentation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210371373.0A CN102930873B (en) | 2012-09-29 | 2012-09-29 | Information entropy based music humming detecting method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102930873A CN102930873A (en) | 2013-02-13 |
CN102930873B true CN102930873B (en) | 2014-04-09 |
Family
ID=47645654
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210371373.0A Expired - Fee Related CN102930873B (en) | 2012-09-29 | 2012-09-29 | Information entropy based music humming detecting method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102930873B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1703734A (en) * | 2002-10-11 | 2005-11-30 | 松下电器产业株式会社 | Method and apparatus for determining musical notes from sounds |
CN1737798A (en) * | 2005-09-08 | 2006-02-22 | 上海交通大学 | Music rhythm sectionalized automatic marking method based on eigen-note |
CN101383149A (en) * | 2008-10-27 | 2009-03-11 | 哈尔滨工业大学 | Stringed music vibrato automatic detection method |
WO2010136722A1 (en) * | 2009-05-29 | 2010-12-02 | Voxler | Method for detecting words in a voice and use thereof in a karaoke game |
US7962530B1 (en) * | 2007-04-27 | 2011-06-14 | Michael Joseph Kolta | Method for locating information in a musical database using a fragment of a melody |
CN102568456A (en) * | 2011-12-23 | 2012-07-11 | 深圳市万兴软件有限公司 | Notation recording method and a notation recording device based on humming input |
Non-Patent Citations (2)
Title |
---|
Yang Jianfeng et al., "A New Method for Pitch Division of Humming Notes", Computer Knowledge and Technology, Oct. 2011, full text. *
Also Published As
Publication number | Publication date |
---|---|
CN102930873A (en) | 2013-02-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20140409 Termination date: 20170929 |