CN109308901A - Singer recognition method and apparatus - Google Patents
Singer recognition method and apparatus
- Publication number
- CN109308901A CN201811148198.2A CN201811148198A
- Authority
- CN
- China
- Prior art keywords
- voice
- data
- music data
- sample
- trained
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 60
- 238000012549 training Methods 0.000 claims description 60
- 238000001228 spectrum Methods 0.000 claims description 29
- 238000004590 computer program Methods 0.000 claims description 6
- 239000000284 extract Substances 0.000 claims description 6
- 230000006870 function Effects 0.000 description 10
- 238000010586 diagram Methods 0.000 description 8
- 238000012545 processing Methods 0.000 description 7
- 238000000926 separation method Methods 0.000 description 7
- 238000013528 artificial neural network Methods 0.000 description 6
- 230000006854 communication Effects 0.000 description 6
- 238000004891 communication Methods 0.000 description 5
- 238000010276 construction Methods 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 4
- 238000003066 decision tree Methods 0.000 description 3
- 238000010801 machine learning Methods 0.000 description 3
- 239000000203 mixture Substances 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 230000005291 magnetic effect Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 230000005236 sound signal Effects 0.000 description 2
- 230000003595 spectral effect Effects 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000005611 electricity Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 238000009434 installation Methods 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000012417 linear regression Methods 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 239000013307 optical fiber Substances 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
Embodiments of the present application disclose a singer recognition method and apparatus. One specific embodiment of the method includes: processing music data to be identified using a trained vocal separation model to obtain the vocal data in the music data to be identified; and inputting the vocal data in the music data to be identified into a trained singer identification model to obtain a singer recognition result for the music data to be identified. This embodiment improves the accuracy of singer identification.
Description
Technical field
Embodiments of the present application relate to the field of computer technology, in particular to the field of speech technology, and more particularly to a singer recognition method and apparatus.
Background
Singer identification is the identification of a singer's identity from a song, and belongs to the field of speaker identification. Existing singer recognition methods feed a song directly into a speech recognition engine used for speaker identification, and the engine identifies the singer according to the vocal characteristics in the song.
A song usually contains accompaniment music in addition to the singer's voice, so vocal features extracted from a song include the acoustic features of both the singer and the accompaniment; singer identification is therefore more difficult than ordinary speaker identification. Moreover, a singer's articulation while singing differs from that while speaking, which also makes singer identification harder.
Summary of the invention
Embodiments of the present application propose a singer recognition method and apparatus.
In a first aspect, an embodiment of the present application provides a singer recognition method, comprising: processing music data to be identified using a trained vocal separation model to obtain the vocal data in the music data to be identified; and inputting the vocal data in the music data to be identified into a trained singer identification model to obtain a singer recognition result for the music data to be identified.
In some embodiments, the method further includes: training on first sample music data to obtain the trained vocal separation model.
In some embodiments, training on the first sample music data to obtain the trained vocal separation model includes: extracting spectral features of the first sample music data, and separating sample vocal data from the first sample music data based on the spectral features; constructing a vocal separation model to be trained based on a Gaussian mixture model, using the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, and training to obtain the trained vocal separation model.
In some embodiments, training on the first sample music data to obtain the trained vocal separation model includes: extracting spectral features of the first sample music data, and decomposing the sample music data into sample vocal data and sample accompaniment data based on the spectral features of the first sample music data; constructing a vocal separation model to be trained based on a Gaussian mixture model, using the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, using the sample accompaniment data as the expected result of the accompaniment data separated from the first sample music data, and training to obtain the trained vocal separation model.
In some embodiments, the method further includes training on second sample music data with corresponding singer annotation information to obtain the trained singer identification model, comprising: inputting the second sample music data into the trained vocal separation model to obtain the vocal data in the second sample music data; constructing a singer identification model to be trained based on a Gaussian mixture model; using the vocal data in the second sample music data as input, and the singer annotation information of the second sample music data as the expected result of the singer identification performed on those vocal data by the singer identification model to be trained; and training the singer identification model to be trained to obtain the trained singer identification model.
In a second aspect, an embodiment of the present application provides a singer identification apparatus, comprising: a separation unit, configured to process music data to be identified using a trained vocal separation model to obtain the vocal data in the music data to be identified; and a recognition unit, configured to input the vocal data in the music data to be identified into a trained singer identification model to obtain a singer recognition result for the music data to be identified.
In some embodiments, the apparatus further includes: a first training unit, configured to train on first sample music data to obtain the trained vocal separation model.
In some embodiments, the first training unit is further configured to train on the first sample music data to obtain the trained vocal separation model as follows: extract spectral features of the first sample music data, and separate sample vocal data from the first sample music data based on the spectral features; construct a vocal separation model to be trained based on a Gaussian mixture model, use the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, and train to obtain the trained vocal separation model.
In some embodiments, the first training unit is further configured to train on the first sample music data to obtain the trained vocal separation model as follows: extract spectral features of the first sample music data, and decompose the sample music data into sample vocal data and sample accompaniment data based on those features; construct a vocal separation model to be trained based on a Gaussian mixture model, use the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, use the sample accompaniment data as the expected result of the accompaniment data separated from the first sample music data, and train to obtain the trained vocal separation model.
In some embodiments, the apparatus further includes: a second training unit, configured to train on second sample music data with corresponding singer annotation information to obtain the trained singer identification model as follows: input the second sample music data into the trained vocal separation model to obtain the vocal data in the second sample music data; construct a singer identification model to be trained based on a Gaussian mixture model; use the vocal data in the second sample music data as input, and the singer annotation information of the second sample music data as the expected result of the singer identification performed on those vocal data by the singer identification model to be trained; and train the singer identification model to be trained to obtain the trained singer identification model.
In a third aspect, an embodiment of the present application provides an electronic device, comprising: one or more processors; and a storage apparatus for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the singer recognition method provided in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the singer recognition method provided in the first aspect.
The singer recognition method and apparatus of the above embodiments of the present application process music data to be identified using a trained vocal separation model to obtain the vocal data in the music data to be identified, and input those vocal data into a trained singer identification model to obtain a singer recognition result for the music data to be identified, thereby improving the accuracy of singer identification.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent by reading the following detailed description of non-restrictive embodiments with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which embodiments of the present application may be applied;
Fig. 2 is a flow chart of one embodiment of the singer recognition method of the present application;
Fig. 3 is a flow chart of another embodiment of the singer recognition method of the present application;
Fig. 4 is a structural schematic diagram of one embodiment of the singer identification apparatus of the present application;
Fig. 5 is a structural schematic diagram of a computer system adapted to implement an electronic device of the embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention, and are not limitations on the invention. It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the singer recognition method or the singer identification apparatus of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user 110 may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various voice-interaction applications may be installed on the terminal devices 101, 102, 103, such as voice assistant applications, information search applications, map applications, social platform applications, audio and video playback applications, and so on.
The terminal devices 101, 102, 103 may be devices with an audio signal sampling function, and may be various electronic devices equipped with a microphone and supporting internet access, including but not limited to in-vehicle terminals, smart speakers, smartphones, tablet computers, smartwatches, notebook computers, laptop portable computers, e-book readers, and the like.
The server 105 may be a server providing audio signal processing, for example a speech recognition server. The server 105 may receive speech processing requests sent by the terminal devices 101, 102, 103, perform operations such as speech decoding and related information queries on the requests, and feed the processing results of the speech processing requests back to the terminal devices 101, 102, 103 through the network 104.
The terminal devices 101, 102, 103 may include components for performing computation (for example, processors such as GPUs), and may also process speech requests initiated by the user 110 locally. For example, for a singer identification request issued by the user 110, a terminal device may extract singing-related features from the music data of the song to be identified, match them against the stored singing feature templates of known singers, and obtain a singer recognition result.
The singer recognition method provided by the embodiments of the present application may be executed by the terminal devices 101, 102, 103 or by the server 105; correspondingly, the singer identification apparatus may be provided in the terminal devices 101, 102, 103 or in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided according to implementation needs. Moreover, in some embodiments of the present application, the system architecture may not include the network and the server.
With continued reference to Fig. 2, a process 200 of one embodiment of the singer recognition method of the present application is shown. The singer recognition method comprises the following steps:
Step 201: process the music data to be identified using the trained vocal separation model to obtain the vocal data in the music data to be identified.
In this embodiment, the executing body of the singer recognition method (for example, the server or a terminal device shown in Fig. 1) may acquire the music data to be identified. Here, the music data to be identified may be music data synthesized from singing data and accompaniment data. The music data to be identified may be the audio source file of a song, or may be audio data recorded by an electronic device with a microphone while the song is being played.
In a practical scenario, when a user wants to know the singer of a musical work, the audio file of that work can be retrieved as the music data to be identified; or, when the user hears a song being played, the recording function of a mobile electronic device can be started and the playing song recorded to form the music data to be identified. The music data to be identified may be audio data in an arbitrary format, such as REC, WMA, etc.
After the music data to be identified is acquired, it can be input into the trained vocal separation model. The vocal separation model may be a model for separating the vocal data and the accompaniment data in the input audio, and may be obtained in advance by training on sample audio data. The vocal separation model may adopt various machine learning model architectures, such as models based on decision trees, models based on logistic regression or linear regression, neural networks based on deep learning, and so on.
When training the vocal separation model, sample audio data and the vocal features of the singer corresponding to the sample audio data may be acquired. The vocal data in the sample audio data are then separated using the vocal separation model to be trained; vocal feature extraction is performed on the separated vocal data, and the extracted vocal features are compared for consistency with the acquired vocal features of the singer corresponding to the sample audio data. According to the consistency of the two, the parameters of the vocal separation model to be trained are repeatedly adjusted, so that the vocal features extracted from the model's separation result converge toward the vocal features of the singer corresponding to the sample audio data. When the number of parameter adjustments reaches a preset number, or the model's vocal separation result on the sample audio data satisfies a preset condition, training is completed and the trained vocal separation model is obtained.
The trained vocal separation model may be trained in advance and stored in the executing body. In this embodiment, after the music data to be identified is input into the trained vocal separation model, the vocal data can be separated from the music to be identified and used as the singing data of the singer to be identified.
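The two-stage flow of steps 201 and 202 (separate the vocals first, then identify the singer) can be sketched as a composition of two functions. The frame size, the energy threshold, and both stand-in models below are illustrative assumptions, not the trained models described in this application:

```python
import numpy as np

def separate_vocals(mixture, vocal_model):
    """Apply a frame-level vocal separation model to a mixture.

    `vocal_model` is any callable mapping a frame to True (vocal)
    or False (accompaniment); it stands in for the trained vocal
    separation model of step 201.
    """
    frames = mixture.reshape(-1, 256)               # fixed-size frames
    keep = np.array([vocal_model(f) for f in frames])
    return frames[keep].ravel()                      # concatenated vocal frames

def identify_singer(vocal_data, singer_model):
    """Feed the separated vocal data to a singer identification model."""
    return singer_model(vocal_data)

# Toy stand-ins: "vocal" frames are high-energy; the singer model only
# checks that some vocal material survived separation.
rng = np.random.default_rng(0)
mixture = np.concatenate([rng.normal(0, 1.0, 256),    # loud "vocal" frame
                          rng.normal(0, 0.1, 256)])   # quiet "accompaniment" frame
vocals = separate_vocals(mixture, lambda f: f.std() > 0.5)
result = identify_singer(vocals, lambda v: "singer_A" if v.size else "unknown")
```

The point of the sketch is the interface: the identification model never sees the accompaniment, which is the source of the accuracy gain claimed by this embodiment.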
Step 202: input the vocal data in the music data to be identified into the trained singer identification model to obtain a singer recognition result for the music data to be identified.
The vocal data obtained in step 201 can be input into the trained singer identification model for singer identification. The trained singer identification model may be a model for identifying the corresponding singer from singing data.
The trained singer identification model may be constructed based on models such as decision trees, logistic regression models, or deep neural networks. During training, the parameters of the singer identification model to be trained can be iteratively adjusted based on sample singing data labeled with the corresponding singers, so as to correct the model's singer recognition results.
Specifically, in the process of training the singer identification model, sample singing data may be acquired; the sample singing data may be music data produced when a singer sings a cappella (i.e., sings unaccompanied). The sample singing data can be input into the singer identification model to be trained for identification, yielding the model's singer recognition result for the sample singing data. That recognition result is then compared with the labeled singer corresponding to the sample singing data, and the parameters of the singer identification model to be trained are iteratively adjusted according to the difference between the two, so that after each adjustment the difference between the model's recognition result and the labeled singer corresponding to the sample singing data shrinks. This process of adjusting the parameters, comparing the adjusted model's recognition result for the sample singing data with the labeled singer, and continuing to adjust according to the difference is repeated until the model's recognition results and the labeled singers satisfy a preset convergence condition, or the number of iterations reaches a preset number; training can then be stopped, yielding the trained singer identification model.
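As a minimal sketch of GMM-based singer identification in the spirit of the training described above, one Gaussian mixture can be fitted per labeled singer on (separated) vocal features, with identification by maximum average log-likelihood; the EM fitting inside `GaussianMixture.fit` plays the role of the iterative parameter adjustment. The feature dimension, mixture sizes, and synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)

# Labeled sample singing data: per-frame feature vectors for two singers,
# standing in for features extracted from separated vocal data.
train = {
    "singer_A": rng.normal(loc=0.0, scale=1.0, size=(200, 4)),
    "singer_B": rng.normal(loc=5.0, scale=1.0, size=(200, 4)),
}

# One GMM per singer, fitted by EM.
models = {name: GaussianMixture(n_components=2, random_state=0).fit(x)
          for name, x in train.items()}

def identify(frames):
    """Return the singer whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda name: models[name].score(frames))

test_frames = rng.normal(loc=5.0, scale=1.0, size=(50, 4))  # drawn near singer_B
pred = identify(test_frames)
```

Here the "convergence condition" of the text corresponds to EM's own tolerance; a production system would instead compare predictions against held-out labels across iterations.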
In some optional implementations of this embodiment, the singer identification model may be constructed based on a speaker identification model. The speaker identification model may be a trained model for identifying a speaker's identity, and can be used as the initial singer identification model to be trained. Since the initial model already has the ability to distinguish speakers and singers with different timbres and articulation styles, this can speed up the training of the singer identification model.
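The warm start from a trained speaker identification model can be sketched by initializing the singer model's parameters from the speaker model's parameters rather than randomly. With scikit-learn GMMs this can use the `means_init` parameter of `GaussianMixture`; the synthetic "singing" data below are simply shifted "speech" data, an illustrative assumption standing in for the changed articulation while singing:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# A "speaker identification" GMM trained on speech-like features.
speech = rng.normal(loc=0.0, scale=1.0, size=(300, 3))
speaker_gmm = GaussianMixture(n_components=2, random_state=0).fit(speech)

# Singing features: same identity, shifted articulation (toy assumption).
singing = speech + 0.5

# Initialize the singer model from the speaker model's means, then continue
# EM on singing data -- a crude analogue of using the speaker model as the
# initial singer identification model to be trained.
singer_gmm = GaussianMixture(n_components=2,
                             means_init=speaker_gmm.means_,
                             random_state=0).fit(singing)
shift = singer_gmm.means_.mean() - speaker_gmm.means_.mean()
```

Because the initialization already sits near a good solution, EM has less work to do, which mirrors the claimed speed-up of training.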
Referring back to Fig. 1, an illustrative scenario of the above embodiments of the present application is as follows: after the user 110 hears a song, a singer identification request is initiated to the server 105 through the terminal devices 101, 102, 103. The server 105 may acquire the audio data of the song uploaded by the terminal devices 101, 102, 103, and then process the music data to be identified using the trained vocal separation model stored locally on the server or in the server's cluster, separating out the vocal data in the song. The separated vocal data are then input into the trained singer identification model stored locally on the server or in the server's cluster for identification, obtaining the recognition result for the song's singer. The server 105 can then feed the recognition result back to the user 110 through the terminal devices 101, 102, 103.
In the singer recognition method of the above embodiments of the present application, the vocal data and the accompaniment data in the music data are separated, and singer identification is performed on the separated vocal data, which improves the accuracy of singer identification.
With continued reference to Fig. 3, a schematic flow chart of another embodiment of the singer recognition method of the present application is shown. The process 300 of the singer recognition method comprises the following steps:
Step 301: train on first sample music data to obtain the trained vocal separation model.
In this embodiment, the executing body of the singer recognition method may acquire first sample music data, and train the vocal separation model to be trained based on the first sample music data.
Specifically, the first sample music data may be synthesized from corresponding singing data and accompaniment data, where the accompaniment data may include the audio data of musical instruments such as piano, guitar, drums, and bass. An accompaniment-separation software application can be used to process the first sample music data so as to extract the vocal data from it. For example, the accompaniment-separation software application may eliminate the vocal data in the first sample music data to obtain the accompaniment data, and the vocal data in the first sample music data are then obtained as the difference between the first sample music data and the corresponding accompaniment data. The vocal data separated by the application can serve as the annotation of the vocal separation result of the first sample music data; the vocal separation model to be trained is trained on it, and is optimized by iteratively adjusting its parameters. When the vocal separation model to be trained reaches a preset optimization target, training can be stopped, yielding the trained vocal separation model.
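This labeling procedure, eliminating the vocals to obtain the accompaniment and then taking the vocal annotation as the difference between the sample music data and that accompaniment, can be sketched directly on waveforms. The signals below are synthetic stand-ins, and the accompaniment estimate is assumed to be exact:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 8000)

vocal_true = np.sin(2 * np.pi * 220.0 * t)            # stand-in vocal track
accompaniment = 0.5 * np.sin(2 * np.pi * 110.0 * t)   # stand-in accompaniment
mixture = vocal_true + accompaniment                   # first sample music data

# An accompaniment-separation tool would output an accompaniment estimate;
# here it is assumed to recover the accompaniment exactly.
accompaniment_est = accompaniment

# Vocal annotation = sample music data minus estimated accompaniment.
sample_vocal = mixture - accompaniment_est
```

With a real separation tool the estimate is imperfect, so the resulting annotation is a noisy supervision signal rather than ground truth.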
In some optional implementations of this embodiment, the trained vocal separation model may be obtained by training on the first sample music data as follows: extract the spectral features of the first sample music data, and separate sample vocal data from the first sample music data based on the spectral features; construct the vocal separation model to be trained based on a Gaussian mixture model, use the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, and train to obtain the trained vocal separation model.
Since the sound-production mechanisms of the human body and of musical instruments differ, vocal data and accompaniment data usually have different spectral features; for example, the frequency-domain signals corresponding to vocal data and accompaniment data have different amplitude and energy characteristics. Separation can be performed according to statistically derived spectral features of vocal and accompaniment data, or a deep neural network can be used to learn the spectral features of vocal data and accompaniment data, so that the network can discriminate between the two. Specifically, the time-domain signal of the first sample music data can be converted to the frequency domain, and the spectral features of the first sample music data extracted there; then, according to the different spectral features of vocal and accompaniment data, the vocal data in the first sample music data are separated out, yielding sample vocal data. These sample vocal data can serve as the annotation result for the vocal data corresponding to the first sample music data.
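The time-domain-to-frequency-domain step described above can be sketched as a short-time Fourier transform: frame the signal, apply a window to each frame, and take magnitude spectra. The frame length and hop size below are illustrative choices:

```python
import numpy as np

def spectral_features(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one |rfft| row per Hann-windowed frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# Sanity check: a pure tone placed exactly on bin 32 of a 256-point frame.
fs, frame_len = 8000, 256
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * (32 * fs / frame_len) * t)  # 1000 Hz = bin 32
spec = spectral_features(tone, frame_len=frame_len)
peak_bin = int(spec[0].argmax())
```

Downstream, rows of such a spectrogram are the per-frame feature vectors on which vocal/accompaniment discrimination operates.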
The vocal separation model to be trained can be constructed based on a Gaussian mixture model (GMM). Then, with the sample vocal data as the expected result of separating the vocal data from the first sample music data, the first sample music data is input into the vocal separation model to be trained, and the model is trained in a supervised machine learning manner. Specifically, the vocal data that the vocal separation model to be trained separates from the first sample music data can be compared with the sample vocal data separated via spectral features; if the difference between the two does not satisfy a preset condition, the parameters of the Gaussian mixture model can be iteratively adjusted so that the difference between the vocal data in the model's separation result and the sample vocal data decreases. Parameter adjustment stops when the difference satisfies the preset condition, yielding the trained vocal separation model.
By using the sample vocal data extracted from the first sample music data via spectral features as the expected result of the vocal separation performed by the vocal separation model to be trained, the trained vocal separation model can accurately extract the vocal data in music data.
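In the spirit of the GMM-based separation described above, a minimal sketch is to fit one Gaussian mixture to frames annotated as vocal and one to frames annotated as accompaniment, then label each frame of a mixture by whichever model assigns the higher likelihood. The two-dimensional "spectral features" here are synthetic, and frame-wise classification is a simplification of the separation the application describes:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Synthetic spectral features: vocal and accompaniment frames occupy
# different regions of feature space (e.g. energy in different bands).
vocal_frames = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(300, 2))
accomp_frames = rng.normal(loc=[0.0, 3.0], scale=0.5, size=(300, 2))

vocal_gmm = GaussianMixture(n_components=2, random_state=0).fit(vocal_frames)
accomp_gmm = GaussianMixture(n_components=2, random_state=0).fit(accomp_frames)

def separate(frames):
    """Boolean mask over frames: True where a frame is classified as vocal."""
    return (vocal_gmm.score_samples(frames) >
            accomp_gmm.score_samples(frames))

# Mixture: 20 unseen vocal frames followed by 20 accompaniment frames.
test = np.vstack([rng.normal(loc=[3.0, 0.0], scale=0.5, size=(20, 2)),
                  rng.normal(loc=[0.0, 3.0], scale=0.5, size=(20, 2))])
mask = separate(test)
```

The vocal signal is then reconstructed from the frames selected by the mask; the supervised comparison against the annotated sample vocal data is what the iterative parameter adjustment in the text optimizes.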
In other optional implementations of this embodiment, the trained vocal separation model may be obtained by training on the first sample music data as follows: extract the spectral features of the first sample music data, and decompose the sample music data into sample vocal data and sample accompaniment data based on the spectral features of the first sample music data; construct the vocal separation model to be trained based on a Gaussian mixture model, use the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, use the sample accompaniment data as the expected result of the accompaniment data separated from the first sample music data, and train to obtain the trained vocal separation model.
Specifically, the vocal data and the accompaniment data in the first sample music data may be separated according to their different spectral features, yielding sample vocal data and sample accompaniment data. The sample vocal data can serve as the annotation result for the vocal data corresponding to the first sample music data, and the sample accompaniment data can serve as the annotation result for the corresponding accompaniment data. The spectral-feature-based separation of vocal data and accompaniment data here may, as in the foregoing implementation, use a statistical method or a deep-neural-network-based method, and is not described again.
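The spectral-feature-based split described above can be made concrete with a minimal sketch. The patent does not specify the separation algorithm, so the following assumes a simple repetition-based heuristic: the per-frequency median magnitude over time approximates the (repetitive) accompaniment, and the residual energy is assigned to the vocals via a soft mask. The function name `separate_vocals` and all parameters are our own illustrative choices, using NumPy and SciPy.

```python
import numpy as np
from scipy.signal import stft, istft

def separate_vocals(audio, sr=22050, nperseg=1024):
    """Illustrative spectral split (not the patent's exact method):
    the per-frequency median magnitude over time is taken as the
    accompaniment estimate; the residual goes to the vocals."""
    _, _, X = stft(audio, fs=sr, nperseg=nperseg)
    mag = np.abs(X)
    # Repetitive accompaniment: capped by the per-frequency median.
    acc_mag = np.minimum(mag, np.median(mag, axis=1, keepdims=True))
    vocal_mask = (mag - acc_mag) / (mag + 1e-10)  # soft mask in [0, 1)
    _, vocals = istft(vocal_mask * X, fs=sr, nperseg=nperseg)
    _, accomp = istft((1.0 - vocal_mask) * X, fs=sr, nperseg=nperseg)
    return vocals, accomp

# Toy mixture: a steady tone (repetitive, so "accompaniment")
# plus a brief noise burst standing in for a vocal event.
sr = 22050
tt = np.arange(sr) / sr
mix = np.sin(2 * np.pi * 440 * tt)
mix[sr // 2 : sr // 2 + 512] += 0.8 * np.random.randn(512)
vocals, accomp = separate_vocals(mix, sr=sr)
```

On this toy input, most of the steady tone's energy lands in the accompaniment track, while the vocals track keeps mainly the burst.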
The to-be-trained vocal separation model may be constructed based on a Gaussian mixture model (GMM). Then, using the sample vocal data as the expected result of separating the vocal data from the first sample music data, and the sample accompaniment data as the expected result of separating the accompaniment data from the corresponding sample music data, the first sample music data is input into the to-be-trained vocal separation model, which is trained by supervised machine learning. Specifically, the vocal data that the to-be-trained vocal separation model separates from the first sample music data may be compared with the sample vocal data obtained via spectral features to give a first comparison result, and the accompaniment data the model separates from the first sample music data may be compared with the sample accompaniment data obtained via spectral features to give a second comparison result. If the first and second comparison results do not both satisfy a preset convergence condition, or their sum does not satisfy the preset convergence condition, the parameters of the Gaussian mixture model may be adjusted iteratively so that the difference between the vocal data in the model's separation result and the sample vocal data decreases, and/or the difference between the accompaniment data in the model's separation result and the sample accompaniment data decreases. When the first and second comparison results both satisfy the preset convergence condition, or their sum satisfies it, parameter adjustment stops, yielding the trained vocal separation model.
The above implementation uses the sample vocal data and sample accompaniment data as the expected results of the vocal separation model's vocal and accompaniment separation of the first sample music data during training. A vocal separation model trained this way can better separate vocal data from accompaniment data, ensuring that both the separated vocal data and the separated accompaniment data have high fidelity.
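One way to read the GMM-based supervised training above is as fitting one mixture model per source to labelled feature frames, then comparing per-frame likelihoods at separation time. The sketch below is a simplification under stated assumptions: the feature arrays are synthetic stand-ins (not real spectral frames), and scikit-learn's `GaussianMixture` stands in for the patent's model; `assign_frames` is our own illustrative helper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic placeholders for spectral feature frames extracted from the
# first sample music data, labelled as vocal or accompaniment.
vocal_frames = rng.normal(loc=2.0, scale=1.0, size=(500, 8))
accomp_frames = rng.normal(loc=-2.0, scale=1.0, size=(500, 8))

# One GMM per source plays the role of the separation model; the labelled
# sample vocal/accompaniment frames act as the expected results.
gmm_vocal = GaussianMixture(n_components=4, random_state=0).fit(vocal_frames)
gmm_accomp = GaussianMixture(n_components=4, random_state=0).fit(accomp_frames)

def assign_frames(frames):
    """Label each feature frame as vocal (1) or accompaniment (0) by
    comparing its log-likelihood under the two fitted GMMs."""
    lv = gmm_vocal.score_samples(frames)
    la = gmm_accomp.score_samples(frames)
    return (lv > la).astype(int)

# A mixed batch: first 10 frames are vocal-like, last 10 accompaniment-like.
mixed = np.vstack([vocal_frames[:10], accomp_frames[:10]])
labels = assign_frames(mixed)
```

In a real system the iterative comparison-and-adjustment the patent describes corresponds to the EM fitting inside `fit`, plus validation of the resulting frame assignments against the spectral-feature labels.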
Step 302, the music data to be identified is processed using the trained vocal separation model to obtain the vocal data in the music data to be identified.
In this embodiment, music data to be identified may be acquired. Here, the music data to be identified can be music data synthesized from singing data and accompaniment data. It may be the source audio file of a song, or audio data recorded by an electronic device with a microphone while a song is playing.
After the music data to be identified is acquired, it can be input into the trained vocal separation model obtained in step 301, and the vocal data separated from the music to be identified serves as the singing data of the singer to be identified.
Step 303, the vocal data in the music data to be identified is input into the trained singer identification model to obtain the singer recognition result for the music data to be identified.
The vocal data obtained in step 302 can be input into the trained singer identification model for singer identification. The trained singer identification model may be a model that identifies the corresponding singer from singing data.
The trained singer identification model may be constructed from models such as a decision tree, a logistic regression model, or a deep neural network. During training, the parameters of the to-be-trained singer identification model can be adjusted iteratively based on sample singing data annotated with the corresponding singer, so as to correct the model's singer recognition results.
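As a hedged illustration of such a classifier (the patent names decision trees, logistic regression, and deep networks as candidate constructions), the sketch below trains a scikit-learn logistic regression on synthetic stand-ins for singer-annotated vocal features. The arrays, singer labels, and feature dimensions are invented for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic stand-ins for vocal feature vectors of two singers; the
# constant offsets mimic per-singer voice characteristics.
singer_a = rng.normal(0.0, 1.0, size=(200, 12)) + np.linspace(0, 1, 12)
singer_b = rng.normal(0.0, 1.0, size=(200, 12)) - np.linspace(0, 1, 12)
X = np.vstack([singer_a, singer_b])
y = np.array([0] * 200 + [1] * 200)  # singer annotation information

# Iterative parameter adjustment against the annotated samples is what
# the solver inside fit() performs.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Classify a new, singer_a-like batch of vocal features.
pred = clf.predict(rng.normal(0.0, 1.0, size=(5, 12)) + np.linspace(0, 1, 12))
```

The same pattern applies to the other candidate model families; only the estimator class changes.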
Steps 302 and 303 of the singer recognition method of this embodiment correspond to steps 201 and 202 of the previous embodiment, respectively; the descriptions of steps 201 and 202 above also apply to steps 302 and 303 and are not repeated here.
By adding the step of obtaining the trained vocal separation model through training on the first sample music data, the singer recognition method flow 300 of this embodiment yields a vocal separation model better suited to separating the vocal data in music data, thereby improving the accuracy of singer identification.
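The overall flow 300 (separate first, then identify from vocals alone) can be sketched as a simple composition. The stand-in "models" below are deliberately trivial placeholders of our own invention; only the control flow reflects the method.

```python
def recognize_singer(music, separation_model, singer_model):
    """Flow 300: separate the vocals first, then identify the singer
    from the vocals alone; the accompaniment is discarded."""
    vocals = separation_model(music)
    return singer_model(vocals)

# Toy stand-ins: "vocals" are the odd-indexed entries of the input list,
# and the identification model checks the sign of their sum.
sep = lambda m: [x for i, x in enumerate(m) if i % 2]
sid = lambda v: "singer_a" if sum(v) > 0 else "singer_b"

result = recognize_singer([0, 3, 0, 4], sep, sid)
```

Keeping the two models behind plain callables mirrors the apparatus embodiment below, where a separation unit and a recognition unit are independent components.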
In some optional implementations of the embodiments described above in connection with Fig. 2 and Fig. 3, the singer recognition method further includes a step of obtaining a trained singer identification model by training on second sample music data having corresponding singer annotation information. This step may be executed before step 202 and before step 303. It specifically includes: inputting the second sample music data into the trained vocal separation model to obtain the vocal data in the second sample music data; constructing a to-be-trained singer identification model based on a Gaussian mixture model; and, using the vocal data in the second sample music data and taking the singer annotation information of the second sample music data as the expected result of the to-be-trained model's singer identification of the vocal data in the second sample music data, training the to-be-trained singer model to obtain the trained singer identification model.
Specifically, second sample music data may be acquired. The second sample music data can be music data with corresponding singer annotation information; in practice, some songs can be collected and their artist information obtained to form the second sample music data.
The to-be-trained singer identification model may be constructed based on a Gaussian mixture model. It can be a classification model, with the singer annotation information corresponding to the second sample music data serving as the expected result of the model's classification of the second sample music data. During training, the to-be-trained singer identification model can learn the vocal and singing characteristics of different singers; its parameters are adjusted iteratively so that the difference between its classification results on the second sample music data and the corresponding singer annotation information gradually decreases. When that difference satisfies a preset difference condition, parameter adjustment can stop and training is complete.
By training the singer identification model on second sample music data annotated with singers, the resulting singer identification model can better learn the differences in melody and singing habits among different singers, thereby improving the accuracy of the singer identification model.
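The GMM-based identification model described here is commonly realized as one mixture model per annotated singer, with classification by maximum likelihood. The sketch below follows that reading under our own assumptions: synthetic feature arrays replace real separated vocals, scikit-learn's `GaussianMixture` stands in for the model, and `identify` is an illustrative helper name.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Placeholder vocal feature frames per annotated singer, standing in for
# the vocals the separation model extracts from the second sample set.
train = {
    "singer_a": rng.normal(1.5, 1.0, size=(300, 6)),
    "singer_b": rng.normal(-1.5, 1.0, size=(300, 6)),
}

# One GMM per singer; the iterative parameter adjustment the patent
# describes corresponds to EM fitting on that singer's vocal frames.
models = {name: GaussianMixture(n_components=3, random_state=0).fit(feats)
          for name, feats in train.items()}

def identify(vocal_frames):
    """Return the annotated singer whose GMM gives the highest average
    log-likelihood for the separated vocal frames."""
    return max(models, key=lambda n: models[n].score(vocal_frames))

query = rng.normal(1.5, 1.0, size=(50, 6))  # resembles singer_a's voice
```

Classification over whole songs then reduces to scoring all frames of the separated vocal track against each singer's model.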
With further reference to Fig. 4, as an implementation of the methods shown in the figures above, this application provides an embodiment of a singer recognition apparatus. The apparatus embodiment corresponds to the method embodiments shown in Fig. 2 and Fig. 3, and the apparatus can be applied in various electronic devices.
As shown in Fig. 4, the singer recognition apparatus 400 of this embodiment includes a separation unit 401 and a recognition unit 402. The separation unit 401 may be configured to process music data to be identified using the trained vocal separation model to obtain the vocal data in the music data to be identified; the recognition unit 402 may be configured to input the vocal data in the music data to be identified into the trained singer identification model to obtain the singer recognition result for the music data to be identified.
In some embodiments, the apparatus 400 may further include a first training unit configured to obtain the trained vocal separation model by training on first sample music data.
In some embodiments, the first training unit may be further configured to obtain the trained vocal separation model by training on the first sample music data as follows: extract the spectral features of the first sample music data and, based on those spectral features, separate sample vocal data from the first sample music data; construct a to-be-trained vocal separation model based on a Gaussian mixture model; use the sample vocal data as the expected result of the to-be-trained model's separation of the vocal data in the first sample music data; and train to obtain the trained vocal separation model.
In some embodiments, the first training unit may be further configured to obtain the trained vocal separation model by training on the first sample music data as follows: extract the spectral features of the first sample music data and, based on those spectral features, decompose the sample music data into sample vocal data and sample accompaniment data; construct a to-be-trained vocal separation model based on a Gaussian mixture model; use the sample vocal data as the expected result of the to-be-trained model's separation of the vocal data in the first sample music data, and the sample accompaniment data as the expected result of its separation of the accompaniment data in the first sample music data; and train to obtain the trained vocal separation model.
In some embodiments, the apparatus 400 may further include a second training unit configured to obtain the trained singer identification model, based on second sample music data having corresponding singer annotation information, by training as follows: input the second sample music data into the trained vocal separation model to obtain the vocal data in the second sample music data; construct a to-be-trained singer identification model based on a Gaussian mixture model; and, using the vocal data in the second sample music data and taking the singer annotation information of the second sample music data as the expected result of the to-be-trained model's singer identification of the vocal data in the second sample music data, train the to-be-trained singer model to obtain the trained singer identification model.
It should be understood that the units recorded in the apparatus 400 correspond to the steps of the methods described with reference to Fig. 2 and Fig. 3. Accordingly, the operations and features described above for the methods also apply to the apparatus 400 and the units it includes, and are not repeated here.
The singer recognition apparatus 400 of the above embodiments of this application separates out the vocal data in the music data to be identified using the trained vocal separation model, and inputs only the vocal data into the trained singer identification model to obtain the singer recognition result. This reduces the influence of accompaniment data on singer identification and can improve the accuracy of singer identification.
Referring now to Fig. 5, it shows a schematic structural diagram of a computer system 500 of an electronic device suitable for implementing the embodiments of this application. The electronic device shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of this application.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores the various programs and data required for the operation of the system 500. The CPU 501, ROM 502, and RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above-described functions defined in the method of this application are performed. It should be noted that the computer-readable medium of this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program usable by or in connection with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for carrying out the operations of this application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to the various embodiments of this application. In this regard, each box in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the boxes may occur out of the order noted in the figures. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of this application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including a separation unit and a recognition unit. The names of these units do not, under certain circumstances, limit the units themselves; for example, the separation unit may also be described as "a unit that processes music data to be identified using the trained vocal separation model to obtain the vocal data in the music data to be identified".
As another aspect, this application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: process music data to be identified using the trained vocal separation model to obtain the vocal data in the music data to be identified; and input the vocal data in the music data to be identified into the trained singer identification model to obtain the singer recognition result for the music data to be identified.
The above description is only a preferred embodiment of this application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, but also covers, without departing from the above inventive concept, other technical solutions formed by any combination of the above technical features or their equivalents, for example, solutions formed by mutually replacing the above features with (but not limited to) technical features with similar functions disclosed in this application.
Claims (12)
1. A singer recognition method, comprising:
processing music data to be identified using a trained vocal separation model to obtain vocal data in the music data to be identified; and
inputting the vocal data in the music data to be identified into a trained singer identification model to obtain a singer recognition result for the music data to be identified.
2. The method according to claim 1, wherein the method further comprises:
obtaining the trained vocal separation model by training on first sample music data.
3. The method according to claim 2, wherein obtaining the trained vocal separation model by training on the first sample music data comprises:
extracting spectral features of the first sample music data, and separating sample vocal data from the first sample music data based on the spectral features of the first sample music data; and
constructing a to-be-trained vocal separation model based on a Gaussian mixture model, using the sample vocal data as an expected result of the to-be-trained vocal separation model's separation of the vocal data in the first sample music data, and training to obtain the trained vocal separation model.
4. The method according to claim 2, wherein obtaining the trained vocal separation model by training on the first sample music data comprises:
extracting spectral features of the first sample music data, and decomposing the sample music data into sample vocal data and sample accompaniment data based on the spectral features of the first sample music data; and
constructing a to-be-trained vocal separation model based on a Gaussian mixture model, using the sample vocal data as an expected result of the to-be-trained vocal separation model's separation of the vocal data in the first sample music data, using the sample accompaniment data as an expected result of the to-be-trained vocal separation model's separation of the accompaniment data in the first sample music data, and training to obtain the trained vocal separation model.
5. The method according to any one of claims 1-4, wherein the method further comprises:
obtaining the trained singer identification model by training on second sample music data having corresponding singer annotation information, comprising:
inputting the second sample music data into the trained vocal separation model to obtain vocal data in the second sample music data; and
constructing a to-be-trained singer identification model based on a Gaussian mixture model, using the vocal data in the second sample music data, taking the singer annotation information of the second sample music data as an expected result of the to-be-trained singer identification model's singer identification of the vocal data in the second sample music data, and training the to-be-trained singer model to obtain the trained singer identification model.
6. A singer recognition apparatus, comprising:
a separation unit configured to process music data to be identified using a trained vocal separation model to obtain vocal data in the music data to be identified; and
a recognition unit configured to input the vocal data in the music data to be identified into a trained singer identification model to obtain a singer recognition result for the music data to be identified.
7. The apparatus according to claim 6, wherein the apparatus further comprises:
a first training unit configured to obtain the trained vocal separation model by training on first sample music data.
8. The apparatus according to claim 7, wherein the first training unit is further configured to obtain the trained vocal separation model by training on the first sample music data as follows:
extracting spectral features of the first sample music data, and separating sample vocal data from the first sample music data based on the spectral features of the first sample music data; and
constructing a to-be-trained vocal separation model based on a Gaussian mixture model, using the sample vocal data as an expected result of the to-be-trained vocal separation model's separation of the vocal data in the first sample music data, and training to obtain the trained vocal separation model.
9. The apparatus according to claim 7, wherein the first training unit is further configured to obtain the trained vocal separation model by training on the first sample music data as follows:
extracting spectral features of the first sample music data, and decomposing the sample music data into sample vocal data and sample accompaniment data based on the spectral features of the first sample music data; and
constructing a to-be-trained vocal separation model based on a Gaussian mixture model, using the sample vocal data as an expected result of the to-be-trained vocal separation model's separation of the vocal data in the first sample music data, using the sample accompaniment data as an expected result of the to-be-trained vocal separation model's separation of the accompaniment data in the first sample music data, and training to obtain the trained vocal separation model.
10. The apparatus according to any one of claims 6-9, wherein the apparatus further comprises:
a second training unit configured to obtain the trained singer identification model, based on second sample music data having corresponding singer annotation information, by training as follows:
inputting the second sample music data into the trained vocal separation model to obtain vocal data in the second sample music data; and
constructing a to-be-trained singer identification model based on a Gaussian mixture model, using the vocal data in the second sample music data, taking the singer annotation information of the second sample music data as an expected result of the to-be-trained singer identification model's singer identification of the vocal data in the second sample music data, and training the to-be-trained singer model to obtain the trained singer identification model.
11. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-5.
12. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811148198.2A CN109308901A (en) | 2018-09-29 | 2018-09-29 | Singer recognition method and apparatus
Publications (1)
Publication Number | Publication Date |
---|---|
CN109308901A true CN109308901A (en) | 2019-02-05 |
Family
ID=65225167
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811148198.2A Pending CN109308901A (en) | 2018-09-29 | 2018-09-29 | Chanteur's recognition methods and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109308901A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110085251A (en) * | 2019-04-26 | 2019-08-02 | 腾讯音乐娱乐科技(深圳)有限公司 | Voice extracting method, voice extraction element and Related product |
CN110853618A (en) * | 2019-11-19 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Language identification method, model training method, device and equipment |
WO2020228226A1 (en) * | 2019-05-14 | 2020-11-19 | 腾讯音乐娱乐科技(深圳)有限公司 | Instrumental music detection method and apparatus, and storage medium |
CN112201226A (en) * | 2020-09-28 | 2021-01-08 | 复旦大学 | Sound production mode judging method and system |
CN112270929A (en) * | 2020-11-18 | 2021-01-26 | 上海依图网络科技有限公司 | Song identification method and device |
CN112466334A (en) * | 2020-12-14 | 2021-03-09 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio identification method, equipment and medium |
CN113284501A (en) * | 2021-05-18 | 2021-08-20 | 平安科技(深圳)有限公司 | Singer identification method, singer identification device, singer identification equipment and storage medium |
WO2024103302A1 (en) * | 2022-11-16 | 2024-05-23 | 广州酷狗计算机科技有限公司 | Human voice note recognition model training method, human voice note recognition method, and device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103310788A (en) * | 2013-05-23 | 2013-09-18 | 北京云知声信息技术有限公司 | Voice information identification method and system |
CN103943113A (en) * | 2014-04-15 | 2014-07-23 | 福建星网视易信息系统有限公司 | Method and device for removing accompaniment from song |
CN104183245A (en) * | 2014-09-04 | 2014-12-03 | 福建星网视易信息系统有限公司 | Method and device for recommending music stars with tones similar to those of singers |
CN104464727A (en) * | 2014-12-11 | 2015-03-25 | 福州大学 | Single-channel music singing separation method based on deep belief network |
CN105575393A (en) * | 2015-12-02 | 2016-05-11 | 中国传媒大学 | Personalized song recommendation method based on voice timbre |
CN106024005A (en) * | 2016-07-01 | 2016-10-12 | 腾讯科技(深圳)有限公司 | Processing method and apparatus for audio data |
CN106683680A (en) * | 2017-03-10 | 2017-05-17 | 百度在线网络技术(北京)有限公司 | Speaker recognition method and device and computer equipment and computer readable media |
Non-Patent Citations (3)
Title |
---|
JIALIE SHEN et al.: "Towards Efficient Automated Singer Identification in Large Music Databases", International ACM Conference on Research and Development in Information Retrieval * |
WEI-HO TSAI, HSIN-CHIEH LEE: "Singer Identification Based on Spoken Data in Voice Characterization", IEEE Transactions on Audio, Speech, and Language Processing * |
LI Wei, et al.: "Understanding Digital Music: A Survey of Music Information Retrieval Technology", Journal of Fudan University (Natural Science Edition) * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110085251B (en) * | 2019-04-26 | 2021-06-25 | 腾讯音乐娱乐科技(深圳)有限公司 | Human voice extraction method, human voice extraction device and related products |
CN110085251A (en) * | 2019-04-26 | 2019-08-02 | 腾讯音乐娱乐科技(深圳)有限公司 | Human voice extraction method, human voice extraction device and related products |
WO2020228226A1 (en) * | 2019-05-14 | 2020-11-19 | 腾讯音乐娱乐科技(深圳)有限公司 | Instrumental music detection method and apparatus, and storage medium |
CN110853618A (en) * | 2019-11-19 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Language identification method, model training method, device and equipment |
CN110853618B (en) * | 2019-11-19 | 2022-08-19 | 腾讯科技(深圳)有限公司 | Language identification method, model training method, device and equipment |
CN112201226A (en) * | 2020-09-28 | 2021-01-08 | 复旦大学 | Sound production mode judging method and system |
CN112201226B (en) * | 2020-09-28 | 2022-09-16 | 复旦大学 | Sound production mode judging method and system |
CN112270929A (en) * | 2020-11-18 | 2021-01-26 | 上海依图网络科技有限公司 | Song identification method and device |
CN112270929B (en) * | 2020-11-18 | 2024-03-22 | 上海依图网络科技有限公司 | Song identification method and device |
CN112466334A (en) * | 2020-12-14 | 2021-03-09 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio identification method, equipment and medium |
CN113284501A (en) * | 2021-05-18 | 2021-08-20 | 平安科技(深圳)有限公司 | Singer identification method, singer identification device, singer identification equipment and storage medium |
CN113284501B (en) * | 2021-05-18 | 2024-03-08 | 平安科技(深圳)有限公司 | Singer identification method, singer identification device, singer identification equipment and storage medium |
WO2024103302A1 (en) * | 2022-11-16 | 2024-05-23 | 广州酷狗计算机科技有限公司 | Human voice note recognition model training method, human voice note recognition method, and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109308901A (en) | Singer recognition method and device | |
US11017788B2 (en) | System and method for creating timbres | |
CN106898340B (en) | Song synthesis method and terminal | |
CN108737872A (en) | Method and apparatus for outputting information | |
CN111091800B (en) | Song generation method and device | |
WO2019109787A1 (en) | Audio classification method and apparatus, intelligent device, and storage medium | |
CN107623614A (en) | Method and apparatus for pushing information | |
CN111445892B (en) | Song generation method and device, readable medium and electronic equipment | |
CN108806665A (en) | Speech synthesis method and device | |
CN107767869A (en) | Method and apparatus for providing voice service | |
CN109272984A (en) | Method and apparatus for voice interaction | |
CN111899720A (en) | Method, apparatus, device and medium for generating audio | |
CN109147800A (en) | Answer method and device | |
CN111798821B (en) | Sound conversion method, device, readable storage medium and electronic equipment | |
JP2015040903A (en) | Voice processor, voice processing method and program | |
CN111161695B (en) | Song generation method and device | |
CN110675886A (en) | Audio signal processing method, audio signal processing device, electronic equipment and storage medium | |
JP7497523B2 (en) | Method, device, electronic device and storage medium for synthesizing custom timbre singing voice | |
CN113691909A (en) | Digital audio workstation with audio processing recommendations | |
CN107680584A (en) | Method and apparatus for cutting audio | |
CN108804667A (en) | Method and apparatus for presenting information | |
CN108829739A (en) | Information pushing method and device | |
CN116798405B (en) | Speech synthesis method, device, storage medium and electronic equipment | |
Tachibana et al. | A real-time audio-to-audio karaoke generation system for monaural recordings based on singing voice suppression and key conversion techniques | |
CN113744759B (en) | Timbre template customization method and device, equipment, medium and product thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 2021-10-13. Address after: Room 101, 1st Floor, Building 1, Yard 7, Ruihe West 2nd Road, Economic and Technological Development Zone, Daxing District, Beijing 100176. Applicant after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Address before: Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing 100085. Applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co., Ltd. |
RJ01 | Rejection of invention patent application after publication |
Application publication date: 2019-02-05 |