CN109308901A - Singer recognition method and apparatus - Google Patents
Singer recognition method and apparatus
- Publication number
- CN109308901A CN201811148198.2A CN201811148198A
- Authority
- CN
- China
- Prior art keywords
- voice
- data
- music data
- sample
- trained
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 60
- 238000012549 training Methods 0.000 claims description 60
- 238000001228 spectrum Methods 0.000 claims description 29
- 238000004590 computer program Methods 0.000 claims description 6
- 239000000284 extract Substances 0.000 claims description 6
- 230000006870 function Effects 0.000 description 10
- 238000010586 diagram Methods 0.000 description 8
- 238000012545 processing Methods 0.000 description 7
- 238000000926 separation method Methods 0.000 description 7
- 238000013528 artificial neural network Methods 0.000 description 6
- 230000006854 communication Effects 0.000 description 6
- 238000004891 communication Methods 0.000 description 5
- 238000010276 construction Methods 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 4
- 238000003066 decision tree Methods 0.000 description 3
- 238000010801 machine learning Methods 0.000 description 3
- 239000000203 mixture Substances 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 230000005291 magnetic effect Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 230000005236 sound signal Effects 0.000 description 2
- 230000003595 spectral effect Effects 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000005611 electricity Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 238000009434 installation Methods 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000012417 linear regression Methods 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 239000013307 optical fiber Substances 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
Embodiments of the present application disclose a singer recognition method and apparatus. One specific embodiment of the method includes: processing music data to be identified using a trained vocal separation model to obtain the vocal data in the music data to be identified; and inputting the vocal data in the music data to be identified into a trained singer identification model to obtain a singer recognition result for the music data to be identified. This embodiment improves the accuracy of singer identification.
Description
Technical field
Embodiments of the present application relate to the field of computer technology, in particular to the field of speech technology, and more particularly to a singer recognition method and apparatus.
Background
Singer identification is the identification of a singer's identity from a song, and belongs to the field of speaker identification. Existing singer recognition methods feed a song directly into a speech recognition engine used for speaker identification, and the engine identifies the singer according to the vocal characteristics in the song.
A song usually contains accompaniment music in addition to the singer's voice, so vocal features extracted from a song include the acoustic features of both the singer and the accompaniment; singer identification is therefore more difficult than ordinary speaker identification. Moreover, a singer's articulation while singing differs from that while speaking, which also makes singer identification harder.
Summary of the invention
Embodiments of the present application propose a singer recognition method and apparatus.
In a first aspect, an embodiment of the present application provides a singer recognition method, comprising: processing music data to be identified using a trained vocal separation model to obtain the vocal data in the music data to be identified; and inputting the vocal data in the music data to be identified into a trained singer identification model to obtain a singer recognition result for the music data to be identified.
In some embodiments, the method further includes: training on first sample music data to obtain the trained vocal separation model.
In some embodiments, training on the first sample music data to obtain the trained vocal separation model includes: extracting spectral features of the first sample music data, and separating sample vocal data from the first sample music data based on the spectral features; constructing a vocal separation model to be trained based on a Gaussian mixture model, using the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, and training to obtain the trained vocal separation model.
In some embodiments, training on the first sample music data to obtain the trained vocal separation model includes: extracting spectral features of the first sample music data, and decomposing the sample music data into sample vocal data and sample accompaniment data based on the spectral features of the first sample music data; constructing a vocal separation model to be trained based on a Gaussian mixture model, using the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, using the sample accompaniment data as the expected result of the accompaniment data separated from the first sample music data, and training to obtain the trained vocal separation model.
In some embodiments, the method further includes training on second sample music data with corresponding singer annotation information to obtain the trained singer identification model, comprising: inputting the second sample music data into the trained vocal separation model to obtain the vocal data in the second sample music data; constructing a singer identification model to be trained based on a Gaussian mixture model; using the vocal data in the second sample music data as input, and the singer annotation information of the second sample music data as the expected result of the singer identification performed on those vocal data by the singer identification model to be trained; and training the singer identification model to be trained to obtain the trained singer identification model.
In a second aspect, an embodiment of the present application provides a singer identification apparatus, comprising: a separation unit, configured to process music data to be identified using a trained vocal separation model to obtain the vocal data in the music data to be identified; and a recognition unit, configured to input the vocal data in the music data to be identified into a trained singer identification model to obtain a singer recognition result for the music data to be identified.
In some embodiments, the apparatus further includes: a first training unit, configured to train on first sample music data to obtain the trained vocal separation model.
In some embodiments, the first training unit is further configured to train on the first sample music data to obtain the trained vocal separation model as follows: extract spectral features of the first sample music data, and separate sample vocal data from the first sample music data based on the spectral features; construct a vocal separation model to be trained based on a Gaussian mixture model, use the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, and train to obtain the trained vocal separation model.
In some embodiments, the first training unit is further configured to train on the first sample music data to obtain the trained vocal separation model as follows: extract spectral features of the first sample music data, and decompose the sample music data into sample vocal data and sample accompaniment data based on those features; construct a vocal separation model to be trained based on a Gaussian mixture model, use the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, use the sample accompaniment data as the expected result of the accompaniment data separated from the first sample music data, and train to obtain the trained vocal separation model.
In some embodiments, the apparatus further includes: a second training unit, configured to train on second sample music data with corresponding singer annotation information to obtain the trained singer identification model as follows: input the second sample music data into the trained vocal separation model to obtain the vocal data in the second sample music data; construct a singer identification model to be trained based on a Gaussian mixture model; use the vocal data in the second sample music data as input, and the singer annotation information of the second sample music data as the expected result of the singer identification performed on those vocal data by the singer identification model to be trained; and train the singer identification model to be trained to obtain the trained singer identification model.
In a third aspect, an embodiment of the present application provides an electronic device, comprising: one or more processors; and a storage apparatus for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the singer recognition method provided in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the singer recognition method provided in the first aspect.
The singer recognition method and apparatus of the above embodiments of the present application process music data to be identified using a trained vocal separation model to obtain the vocal data in the music data to be identified, and input those vocal data into a trained singer identification model to obtain a singer recognition result for the music data to be identified, thereby improving the accuracy of singer identification.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent by reading the following detailed description of non-restrictive embodiments with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which embodiments of the present application may be applied;
Fig. 2 is a flow chart of one embodiment of the singer recognition method of the present application;
Fig. 3 is a flow chart of another embodiment of the singer recognition method of the present application;
Fig. 4 is a structural schematic diagram of one embodiment of the singer identification apparatus of the present application;
Fig. 5 is a structural schematic diagram of a computer system adapted to implement an electronic device of the embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention, and are not limitations on the invention. It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the singer recognition method or the singer identification apparatus of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user 110 may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various voice-interaction applications may be installed on the terminal devices 101, 102, 103, such as voice assistant applications, information search applications, map applications, social platform applications, audio and video playback applications, and so on.
The terminal devices 101, 102, 103 may be devices with an audio signal sampling function, and may be various electronic devices equipped with a microphone and supporting internet access, including but not limited to in-vehicle terminals, smart speakers, smartphones, tablet computers, smartwatches, notebook computers, laptop portable computers, e-book readers, and the like.
The server 105 may be a server providing audio signal processing, for example a speech recognition server. The server 105 may receive speech processing requests sent by the terminal devices 101, 102, 103, perform operations such as speech decoding and related information queries on the requests, and feed the processing results of the speech processing requests back to the terminal devices 101, 102, 103 through the network 104.
The terminal devices 101, 102, 103 may include components for performing computation (for example, processors such as GPUs), and may also process speech requests initiated by the user 110 locally. For example, for a singer identification request issued by the user 110, a terminal device may extract singing-related features from the music data of the song to be identified, match them against the stored singing feature templates of known singers, and obtain a singer recognition result.
The singer recognition method provided by the embodiments of the present application may be executed by the terminal devices 101, 102, 103 or by the server 105; correspondingly, the singer identification apparatus may be provided in the terminal devices 101, 102, 103 or in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided according to implementation needs. Moreover, in some embodiments of the present application, the system architecture may not include the network and the server.
With continued reference to Fig. 2, a process 200 of one embodiment of the singer recognition method of the present application is shown. The singer recognition method comprises the following steps:
Step 201: process the music data to be identified using the trained vocal separation model to obtain the vocal data in the music data to be identified.
In this embodiment, the executing body of the singer recognition method (for example, the server or a terminal device shown in Fig. 1) may acquire the music data to be identified. Here, the music data to be identified may be music data synthesized from singing data and accompaniment data. The music data to be identified may be the audio source file of a song, or may be audio data recorded by an electronic device with a microphone while the song is being played.
In a practical scenario, when a user wants to know the singer of a musical work, the audio file of that work can be retrieved as the music data to be identified; or, when the user hears a song being played, the recording function of a mobile electronic device can be started and the playing song recorded to form the music data to be identified. The music data to be identified may be audio data in an arbitrary format, such as REC, WMA, etc.
After the music data to be identified is acquired, it can be input into the trained vocal separation model. The vocal separation model may be a model for separating the vocal data and the accompaniment data in the input audio, and may be obtained in advance by training on sample audio data. The vocal separation model may adopt various machine learning model architectures, such as models based on decision trees, models based on logistic regression or linear regression, neural networks based on deep learning, and so on.
When training the vocal separation model, sample audio data and the vocal features of the singer corresponding to the sample audio data may be acquired. The vocal data in the sample audio data are then separated using the vocal separation model to be trained; vocal feature extraction is performed on the separated vocal data, and the extracted vocal features are compared for consistency with the acquired vocal features of the singer corresponding to the sample audio data. According to the consistency of the two, the parameters of the vocal separation model to be trained are repeatedly adjusted, so that the vocal features extracted from the model's separation result converge toward the vocal features of the singer corresponding to the sample audio data. When the number of parameter adjustments reaches a preset number, or the model's vocal separation result on the sample audio data satisfies a preset condition, training is completed and the trained vocal separation model is obtained.
The trained vocal separation model may be trained in advance and stored in the executing body. In this embodiment, after the music data to be identified is input into the trained vocal separation model, the vocal data can be separated from the music to be identified and used as the singing data of the singer to be identified.
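The two-stage flow of steps 201 and 202 (separate the vocals first, then identify the singer) can be sketched as a composition of two functions. The frame size, the energy threshold, and both stand-in models below are illustrative assumptions, not the trained models described in this application:

```python
import numpy as np

def separate_vocals(mixture, vocal_model):
    """Apply a frame-level vocal separation model to a mixture.

    `vocal_model` is any callable mapping a frame to True (vocal)
    or False (accompaniment); it stands in for the trained vocal
    separation model of step 201.
    """
    frames = mixture.reshape(-1, 256)               # fixed-size frames
    keep = np.array([vocal_model(f) for f in frames])
    return frames[keep].ravel()                      # concatenated vocal frames

def identify_singer(vocal_data, singer_model):
    """Feed the separated vocal data to a singer identification model."""
    return singer_model(vocal_data)

# Toy stand-ins: "vocal" frames are high-energy; the singer model only
# checks that some vocal material survived separation.
rng = np.random.default_rng(0)
mixture = np.concatenate([rng.normal(0, 1.0, 256),    # loud "vocal" frame
                          rng.normal(0, 0.1, 256)])   # quiet "accompaniment" frame
vocals = separate_vocals(mixture, lambda f: f.std() > 0.5)
result = identify_singer(vocals, lambda v: "singer_A" if v.size else "unknown")
```

The point of the sketch is the interface: the identification model never sees the accompaniment, which is the source of the accuracy gain claimed by this embodiment.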
Step 202: input the vocal data in the music data to be identified into the trained singer identification model to obtain a singer recognition result for the music data to be identified.
The vocal data obtained in step 201 can be input into the trained singer identification model for singer identification. The trained singer identification model may be a model for identifying the corresponding singer from singing data.
The trained singer identification model may be constructed based on models such as decision trees, logistic regression models, or deep neural networks. During training, the parameters of the singer identification model to be trained can be iteratively adjusted based on sample singing data labeled with the corresponding singers, so as to correct the model's singer recognition results.
Specifically, in the process of training the singer identification model, sample singing data may be acquired; the sample singing data may be music data produced when a singer sings a cappella (i.e., sings unaccompanied). The sample singing data can be input into the singer identification model to be trained for identification, yielding the model's singer recognition result for the sample singing data. That recognition result is then compared with the labeled singer corresponding to the sample singing data, and the parameters of the singer identification model to be trained are iteratively adjusted according to the difference between the two, so that after each adjustment the difference between the model's recognition result and the labeled singer corresponding to the sample singing data shrinks. This process of adjusting the parameters, comparing the adjusted model's recognition result for the sample singing data with the labeled singer, and continuing to adjust according to the difference is repeated until the model's recognition results and the labeled singers satisfy a preset convergence condition, or the number of iterations reaches a preset number; training can then be stopped, yielding the trained singer identification model.
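As a minimal sketch of GMM-based singer identification in the spirit of the training described above, one Gaussian mixture can be fitted per labeled singer on (separated) vocal features, with identification by maximum average log-likelihood; the EM fitting inside `GaussianMixture.fit` plays the role of the iterative parameter adjustment. The feature dimension, mixture sizes, and synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)

# Labeled sample singing data: per-frame feature vectors for two singers,
# standing in for features extracted from separated vocal data.
train = {
    "singer_A": rng.normal(loc=0.0, scale=1.0, size=(200, 4)),
    "singer_B": rng.normal(loc=5.0, scale=1.0, size=(200, 4)),
}

# One GMM per singer, fitted by EM.
models = {name: GaussianMixture(n_components=2, random_state=0).fit(x)
          for name, x in train.items()}

def identify(frames):
    """Return the singer whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda name: models[name].score(frames))

test_frames = rng.normal(loc=5.0, scale=1.0, size=(50, 4))  # drawn near singer_B
pred = identify(test_frames)
```

Here the "convergence condition" of the text corresponds to EM's own tolerance; a production system would instead compare predictions against held-out labels across iterations.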
In some optional implementations of this embodiment, the singer identification model may be constructed based on a speaker identification model. The speaker identification model may be a trained model for identifying a speaker's identity, and can be used as the initial singer identification model to be trained. Since the initial model already has the ability to distinguish speakers and singers with different timbres and articulation styles, this can speed up the training of the singer identification model.
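The warm start from a trained speaker identification model can be sketched by initializing the singer model's parameters from the speaker model's parameters rather than randomly. With scikit-learn GMMs this can use the `means_init` parameter of `GaussianMixture`; the synthetic "singing" data below are simply shifted "speech" data, an illustrative assumption standing in for the changed articulation while singing:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# A "speaker identification" GMM trained on speech-like features.
speech = rng.normal(loc=0.0, scale=1.0, size=(300, 3))
speaker_gmm = GaussianMixture(n_components=2, random_state=0).fit(speech)

# Singing features: same identity, shifted articulation (toy assumption).
singing = speech + 0.5

# Initialize the singer model from the speaker model's means, then continue
# EM on singing data -- a crude analogue of using the speaker model as the
# initial singer identification model to be trained.
singer_gmm = GaussianMixture(n_components=2,
                             means_init=speaker_gmm.means_,
                             random_state=0).fit(singing)
shift = singer_gmm.means_.mean() - speaker_gmm.means_.mean()
```

Because the initialization already sits near a good solution, EM has less work to do, which mirrors the claimed speed-up of training.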
Referring back to Fig. 1, an illustrative scenario of the above embodiments of the present application is as follows: after the user 110 hears a song, a singer identification request is initiated to the server 105 through the terminal devices 101, 102, 103. The server 105 may acquire the audio data of the song uploaded by the terminal devices 101, 102, 103, and then process the music data to be identified using the trained vocal separation model stored locally on the server or in the server's cluster, separating out the vocal data in the song. The separated vocal data are then input into the trained singer identification model stored locally on the server or in the server's cluster for identification, obtaining the recognition result for the song's singer. The server 105 can then feed the recognition result back to the user 110 through the terminal devices 101, 102, 103.
In the singer recognition method of the above embodiments of the present application, the vocal data and the accompaniment data in the music data are separated, and singer identification is performed on the separated vocal data, which improves the accuracy of singer identification.
With continued reference to Fig. 3, a schematic flow chart of another embodiment of the singer recognition method of the present application is shown. The process 300 of the singer recognition method comprises the following steps:
Step 301: train on first sample music data to obtain the trained vocal separation model.
In this embodiment, the executing body of the singer recognition method may acquire first sample music data, and train the vocal separation model to be trained based on the first sample music data.
Specifically, the first sample music data may be synthesized from corresponding singing data and accompaniment data, where the accompaniment data may include the audio data of musical instruments such as piano, guitar, drums, and bass. An accompaniment-separation software application can be used to process the first sample music data so as to extract the vocal data from it. For example, the accompaniment-separation software application may eliminate the vocal data in the first sample music data to obtain the accompaniment data, and the vocal data in the first sample music data are then obtained as the difference between the first sample music data and the corresponding accompaniment data. The vocal data separated by the application can serve as the annotation of the vocal separation result of the first sample music data; the vocal separation model to be trained is trained on it, and is optimized by iteratively adjusting its parameters. When the vocal separation model to be trained reaches a preset optimization target, training can be stopped, yielding the trained vocal separation model.
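This labeling procedure, eliminating the vocals to obtain the accompaniment and then taking the vocal annotation as the difference between the sample music data and that accompaniment, can be sketched directly on waveforms. The signals below are synthetic stand-ins, and the accompaniment estimate is assumed to be exact:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 8000)

vocal_true = np.sin(2 * np.pi * 220.0 * t)            # stand-in vocal track
accompaniment = 0.5 * np.sin(2 * np.pi * 110.0 * t)   # stand-in accompaniment
mixture = vocal_true + accompaniment                   # first sample music data

# An accompaniment-separation tool would output an accompaniment estimate;
# here it is assumed to recover the accompaniment exactly.
accompaniment_est = accompaniment

# Vocal annotation = sample music data minus estimated accompaniment.
sample_vocal = mixture - accompaniment_est
```

With a real separation tool the estimate is imperfect, so the resulting annotation is a noisy supervision signal rather than ground truth.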
In some optional implementations of this embodiment, the trained vocal separation model may be obtained by training on the first sample music data as follows: extract the spectral features of the first sample music data, and separate sample vocal data from the first sample music data based on the spectral features; construct the vocal separation model to be trained based on a Gaussian mixture model, use the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, and train to obtain the trained vocal separation model.
Since the sound-production mechanisms of the human body and of musical instruments differ, vocal data and accompaniment data usually have different spectral features; for example, the frequency-domain signals corresponding to vocal data and accompaniment data have different amplitude and energy characteristics. Separation can be performed according to statistically derived spectral features of vocal and accompaniment data, or a deep neural network can be used to learn the spectral features of vocal data and accompaniment data, so that the network can discriminate between the two. Specifically, the time-domain signal of the first sample music data can be converted to the frequency domain, and the spectral features of the first sample music data extracted there; then, according to the different spectral features of vocal and accompaniment data, the vocal data in the first sample music data are separated out, yielding sample vocal data. These sample vocal data can serve as the annotation result for the vocal data corresponding to the first sample music data.
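The time-domain-to-frequency-domain step described above can be sketched as a short-time Fourier transform: frame the signal, apply a window to each frame, and take magnitude spectra. The frame length and hop size below are illustrative choices:

```python
import numpy as np

def spectral_features(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one |rfft| row per Hann-windowed frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# Sanity check: a pure tone placed exactly on bin 32 of a 256-point frame.
fs, frame_len = 8000, 256
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * (32 * fs / frame_len) * t)  # 1000 Hz = bin 32
spec = spectral_features(tone, frame_len=frame_len)
peak_bin = int(spec[0].argmax())
```

Downstream, rows of such a spectrogram are the per-frame feature vectors on which vocal/accompaniment discrimination operates.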
The vocal separation model to be trained can be constructed based on a Gaussian mixture model (GMM). Then, with the sample vocal data as the expected result of separating the vocal data from the first sample music data, the first sample music data is input into the vocal separation model to be trained, and the model is trained in a supervised machine learning manner. Specifically, the vocal data that the vocal separation model to be trained separates from the first sample music data can be compared with the sample vocal data separated via spectral features; if the difference between the two does not satisfy a preset condition, the parameters of the Gaussian mixture model can be iteratively adjusted so that the difference between the vocal data in the model's separation result and the sample vocal data decreases. Parameter adjustment stops when the difference satisfies the preset condition, yielding the trained vocal separation model.
By using the sample vocal data extracted from the first sample music data via spectral features as the expected result of the vocal separation performed by the vocal separation model to be trained, the trained vocal separation model can accurately extract the vocal data in music data.
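In the spirit of the GMM-based separation described above, a minimal sketch is to fit one Gaussian mixture to frames annotated as vocal and one to frames annotated as accompaniment, then label each frame of a mixture by whichever model assigns the higher likelihood. The two-dimensional "spectral features" here are synthetic, and frame-wise classification is a simplification of the separation the application describes:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Synthetic spectral features: vocal and accompaniment frames occupy
# different regions of feature space (e.g. energy in different bands).
vocal_frames = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(300, 2))
accomp_frames = rng.normal(loc=[0.0, 3.0], scale=0.5, size=(300, 2))

vocal_gmm = GaussianMixture(n_components=2, random_state=0).fit(vocal_frames)
accomp_gmm = GaussianMixture(n_components=2, random_state=0).fit(accomp_frames)

def separate(frames):
    """Boolean mask over frames: True where a frame is classified as vocal."""
    return (vocal_gmm.score_samples(frames) >
            accomp_gmm.score_samples(frames))

# Mixture: 20 unseen vocal frames followed by 20 accompaniment frames.
test = np.vstack([rng.normal(loc=[3.0, 0.0], scale=0.5, size=(20, 2)),
                  rng.normal(loc=[0.0, 3.0], scale=0.5, size=(20, 2))])
mask = separate(test)
```

The vocal signal is then reconstructed from the frames selected by the mask; the supervised comparison against the annotated sample vocal data is what the iterative parameter adjustment in the text optimizes.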
In other optional implementations of this embodiment, the trained vocal separation model may be obtained by training on the first sample music data as follows: extract the spectral features of the first sample music data, and decompose the sample music data into sample vocal data and sample accompaniment data based on the spectral features of the first sample music data; construct the vocal separation model to be trained based on a Gaussian mixture model, use the sample vocal data as the expected result of the vocal data separated from the first sample music data by the vocal separation model to be trained, use the sample accompaniment data as the expected result of the accompaniment data separated from the first sample music data, and train to obtain the trained vocal separation model.
Specifically, the vocal data and the accompaniment data in the first sample music data may be separated according to their different spectral features, yielding sample vocal data and sample accompaniment data. The sample vocal data can serve as the annotation result for the vocal data corresponding to the first sample music data, and the sample accompaniment data can serve as the annotation result for the corresponding accompaniment data. The spectral-feature-based separation of vocal data and accompaniment data here may, as in the foregoing implementation, use a statistical method or a deep-neural-network-based method, and is not described again.
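The spectral-feature-based split described above can be made concrete with a minimal sketch. The patent does not specify the separation algorithm, so the following assumes a simple repetition-based heuristic: the per-frequency median magnitude over time approximates the (repetitive) accompaniment, and the residual energy is assigned to the vocals via a soft mask. The function name `separate_vocals` and all parameters are our own illustrative choices, using NumPy and SciPy.

```python
import numpy as np
from scipy.signal import stft, istft

def separate_vocals(audio, sr=22050, nperseg=1024):
    """Illustrative spectral split (not the patent's exact method):
    the per-frequency median magnitude over time is taken as the
    accompaniment estimate; the residual goes to the vocals."""
    _, _, X = stft(audio, fs=sr, nperseg=nperseg)
    mag = np.abs(X)
    # Repetitive accompaniment: capped by the per-frequency median.
    acc_mag = np.minimum(mag, np.median(mag, axis=1, keepdims=True))
    vocal_mask = (mag - acc_mag) / (mag + 1e-10)  # soft mask in [0, 1)
    _, vocals = istft(vocal_mask * X, fs=sr, nperseg=nperseg)
    _, accomp = istft((1.0 - vocal_mask) * X, fs=sr, nperseg=nperseg)
    return vocals, accomp

# Toy mixture: a steady tone (repetitive, so "accompaniment")
# plus a brief noise burst standing in for a vocal event.
sr = 22050
tt = np.arange(sr) / sr
mix = np.sin(2 * np.pi * 440 * tt)
mix[sr // 2 : sr // 2 + 512] += 0.8 * np.random.randn(512)
vocals, accomp = separate_vocals(mix, sr=sr)
```

On this toy input, most of the steady tone's energy lands in the accompaniment track, while the vocals track keeps mainly the burst.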
The to-be-trained vocal separation model may be constructed based on a Gaussian mixture model (GMM). Then, using the sample vocal data as the expected result of separating the vocal data from the first sample music data, and the sample accompaniment data as the expected result of separating the accompaniment data from the corresponding sample music data, the first sample music data is input into the to-be-trained vocal separation model, which is trained by supervised machine learning. Specifically, the vocal data that the to-be-trained vocal separation model separates from the first sample music data may be compared with the sample vocal data obtained via spectral features to give a first comparison result, and the accompaniment data the model separates from the first sample music data may be compared with the sample accompaniment data obtained via spectral features to give a second comparison result. If the first and second comparison results do not both satisfy a preset convergence condition, or their sum does not satisfy the preset convergence condition, the parameters of the Gaussian mixture model may be adjusted iteratively so that the difference between the vocal data in the model's separation result and the sample vocal data decreases, and/or the difference between the accompaniment data in the model's separation result and the sample accompaniment data decreases. When the first and second comparison results both satisfy the preset convergence condition, or their sum satisfies it, parameter adjustment stops, yielding the trained vocal separation model.
The above implementation uses the sample vocal data and sample accompaniment data as the expected results of the vocal separation model's vocal and accompaniment separation of the first sample music data during training. A vocal separation model trained this way can better separate vocal data from accompaniment data, ensuring that both the separated vocal data and the separated accompaniment data have high fidelity.
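One way to read the GMM-based supervised training above is as fitting one mixture model per source to labelled feature frames, then comparing per-frame likelihoods at separation time. The sketch below is a simplification under stated assumptions: the feature arrays are synthetic stand-ins (not real spectral frames), and scikit-learn's `GaussianMixture` stands in for the patent's model; `assign_frames` is our own illustrative helper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic placeholders for spectral feature frames extracted from the
# first sample music data, labelled as vocal or accompaniment.
vocal_frames = rng.normal(loc=2.0, scale=1.0, size=(500, 8))
accomp_frames = rng.normal(loc=-2.0, scale=1.0, size=(500, 8))

# One GMM per source plays the role of the separation model; the labelled
# sample vocal/accompaniment frames act as the expected results.
gmm_vocal = GaussianMixture(n_components=4, random_state=0).fit(vocal_frames)
gmm_accomp = GaussianMixture(n_components=4, random_state=0).fit(accomp_frames)

def assign_frames(frames):
    """Label each feature frame as vocal (1) or accompaniment (0) by
    comparing its log-likelihood under the two fitted GMMs."""
    lv = gmm_vocal.score_samples(frames)
    la = gmm_accomp.score_samples(frames)
    return (lv > la).astype(int)

# A mixed batch: first 10 frames are vocal-like, last 10 accompaniment-like.
mixed = np.vstack([vocal_frames[:10], accomp_frames[:10]])
labels = assign_frames(mixed)
```

In a real system the iterative comparison-and-adjustment the patent describes corresponds to the EM fitting inside `fit`, plus validation of the resulting frame assignments against the spectral-feature labels.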
Step 302, the music data to be identified is processed using the trained vocal separation model to obtain the vocal data in the music data to be identified.
In this embodiment, music data to be identified may be acquired. Here, the music data to be identified can be music data synthesized from singing data and accompaniment data. It may be the source audio file of a song, or audio data recorded by an electronic device with a microphone while a song is playing.
After the music data to be identified is acquired, it can be input into the trained vocal separation model obtained in step 301, and the vocal data separated from the music to be identified serves as the singing data of the singer to be identified.
Step 303, the vocal data in the music data to be identified is input into the trained singer identification model to obtain the singer recognition result for the music data to be identified.
The vocal data obtained in step 302 can be input into the trained singer identification model for singer identification. The trained singer identification model may be a model that identifies the corresponding singer from singing data.
The trained singer identification model may be constructed from models such as a decision tree, a logistic regression model, or a deep neural network. During training, the parameters of the to-be-trained singer identification model can be adjusted iteratively based on sample singing data annotated with the corresponding singer, so as to correct the model's singer recognition results.
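As a hedged illustration of such a classifier (the patent names decision trees, logistic regression, and deep networks as candidate constructions), the sketch below trains a scikit-learn logistic regression on synthetic stand-ins for singer-annotated vocal features. The arrays, singer labels, and feature dimensions are invented for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic stand-ins for vocal feature vectors of two singers; the
# constant offsets mimic per-singer voice characteristics.
singer_a = rng.normal(0.0, 1.0, size=(200, 12)) + np.linspace(0, 1, 12)
singer_b = rng.normal(0.0, 1.0, size=(200, 12)) - np.linspace(0, 1, 12)
X = np.vstack([singer_a, singer_b])
y = np.array([0] * 200 + [1] * 200)  # singer annotation information

# Iterative parameter adjustment against the annotated samples is what
# the solver inside fit() performs.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Classify a new, singer_a-like batch of vocal features.
pred = clf.predict(rng.normal(0.0, 1.0, size=(5, 12)) + np.linspace(0, 1, 12))
```

The same pattern applies to the other candidate model families; only the estimator class changes.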
Steps 302 and 303 of the singer recognition method of this embodiment correspond to steps 201 and 202 of the previous embodiment, respectively; the descriptions of steps 201 and 202 above also apply to steps 302 and 303 and are not repeated here.
By adding the step of obtaining the trained vocal separation model through training on the first sample music data, the singer recognition method flow 300 of this embodiment yields a vocal separation model better suited to separating the vocal data in music data, thereby improving the accuracy of singer identification.
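The overall flow 300 (separate first, then identify from vocals alone) can be sketched as a simple composition. The stand-in "models" below are deliberately trivial placeholders of our own invention; only the control flow reflects the method.

```python
def recognize_singer(music, separation_model, singer_model):
    """Flow 300: separate the vocals first, then identify the singer
    from the vocals alone; the accompaniment is discarded."""
    vocals = separation_model(music)
    return singer_model(vocals)

# Toy stand-ins: "vocals" are the odd-indexed entries of the input list,
# and the identification model checks the sign of their sum.
sep = lambda m: [x for i, x in enumerate(m) if i % 2]
sid = lambda v: "singer_a" if sum(v) > 0 else "singer_b"

result = recognize_singer([0, 3, 0, 4], sep, sid)
```

Keeping the two models behind plain callables mirrors the apparatus embodiment below, where a separation unit and a recognition unit are independent components.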
In some optional implementations of the embodiments described above in connection with Fig. 2 and Fig. 3, the singer recognition method further includes a step of obtaining a trained singer identification model by training on second sample music data having corresponding singer annotation information. This step may be executed before step 202 and before step 303. It specifically includes: inputting the second sample music data into the trained vocal separation model to obtain the vocal data in the second sample music data; constructing a to-be-trained singer identification model based on a Gaussian mixture model; and, using the vocal data in the second sample music data and taking the singer annotation information of the second sample music data as the expected result of the to-be-trained model's singer identification of the vocal data in the second sample music data, training the to-be-trained singer model to obtain the trained singer identification model.
Specifically, second sample music data may be acquired. The second sample music data can be music data with corresponding singer annotation information; in practice, some songs can be collected and their artist information obtained to form the second sample music data.
The to-be-trained singer identification model may be constructed based on a Gaussian mixture model. It can be a classification model, with the singer annotation information corresponding to the second sample music data serving as the expected result of the model's classification of the second sample music data. During training, the to-be-trained singer identification model can learn the vocal and singing characteristics of different singers; its parameters are adjusted iteratively so that the difference between its classification results on the second sample music data and the corresponding singer annotation information gradually decreases. When that difference satisfies a preset difference condition, parameter adjustment can stop and training is complete.
By training the singer identification model on second sample music data annotated with singers, the resulting singer identification model can better learn the differences in melody and singing habits among different singers, thereby improving the accuracy of the singer identification model.
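The GMM-based identification model described here is commonly realized as one mixture model per annotated singer, with classification by maximum likelihood. The sketch below follows that reading under our own assumptions: synthetic feature arrays replace real separated vocals, scikit-learn's `GaussianMixture` stands in for the model, and `identify` is an illustrative helper name.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Placeholder vocal feature frames per annotated singer, standing in for
# the vocals the separation model extracts from the second sample set.
train = {
    "singer_a": rng.normal(1.5, 1.0, size=(300, 6)),
    "singer_b": rng.normal(-1.5, 1.0, size=(300, 6)),
}

# One GMM per singer; the iterative parameter adjustment the patent
# describes corresponds to EM fitting on that singer's vocal frames.
models = {name: GaussianMixture(n_components=3, random_state=0).fit(feats)
          for name, feats in train.items()}

def identify(vocal_frames):
    """Return the annotated singer whose GMM gives the highest average
    log-likelihood for the separated vocal frames."""
    return max(models, key=lambda n: models[n].score(vocal_frames))

query = rng.normal(1.5, 1.0, size=(50, 6))  # resembles singer_a's voice
```

Classification over whole songs then reduces to scoring all frames of the separated vocal track against each singer's model.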
With further reference to Fig. 4, as an implementation of the methods shown in the figures above, this application provides an embodiment of a singer recognition apparatus. The apparatus embodiment corresponds to the method embodiments shown in Fig. 2 and Fig. 3, and the apparatus can be applied in various electronic devices.
As shown in Fig. 4, the singer recognition apparatus 400 of this embodiment includes a separation unit 401 and a recognition unit 402. The separation unit 401 may be configured to process music data to be identified using the trained vocal separation model to obtain the vocal data in the music data to be identified; the recognition unit 402 may be configured to input the vocal data in the music data to be identified into the trained singer identification model to obtain the singer recognition result for the music data to be identified.
In some embodiments, the apparatus 400 may further include a first training unit configured to obtain the trained vocal separation model by training on first sample music data.
In some embodiments, the first training unit may be further configured to obtain the trained vocal separation model by training on the first sample music data as follows: extract the spectral features of the first sample music data and, based on those spectral features, separate sample vocal data from the first sample music data; construct a to-be-trained vocal separation model based on a Gaussian mixture model; use the sample vocal data as the expected result of the to-be-trained model's separation of the vocal data in the first sample music data; and train to obtain the trained vocal separation model.
In some embodiments, the first training unit may be further configured to obtain the trained vocal separation model by training on the first sample music data as follows: extract the spectral features of the first sample music data and, based on those spectral features, decompose the sample music data into sample vocal data and sample accompaniment data; construct a to-be-trained vocal separation model based on a Gaussian mixture model; use the sample vocal data as the expected result of the to-be-trained model's separation of the vocal data in the first sample music data, and the sample accompaniment data as the expected result of its separation of the accompaniment data in the first sample music data; and train to obtain the trained vocal separation model.
In some embodiments, the apparatus 400 may further include a second training unit configured to obtain the trained singer identification model, based on second sample music data having corresponding singer annotation information, by training as follows: input the second sample music data into the trained vocal separation model to obtain the vocal data in the second sample music data; construct a to-be-trained singer identification model based on a Gaussian mixture model; and, using the vocal data in the second sample music data and taking the singer annotation information of the second sample music data as the expected result of the to-be-trained model's singer identification of the vocal data in the second sample music data, train the to-be-trained singer model to obtain the trained singer identification model.
It should be understood that the units recorded in the apparatus 400 correspond to the steps of the methods described with reference to Fig. 2 and Fig. 3. Accordingly, the operations and features described above for the methods also apply to the apparatus 400 and the units it includes, and are not repeated here.
The singer recognition apparatus 400 of the above embodiments of this application separates out the vocal data in the music data to be identified using the trained vocal separation model, and inputs only the vocal data into the trained singer identification model to obtain the singer recognition result. This reduces the influence of accompaniment data on singer identification and can improve the accuracy of singer identification.
Referring now to Fig. 5, it shows a schematic structural diagram of a computer system 500 of an electronic device suitable for implementing the embodiments of this application. The electronic device shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of this application.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores the various programs and data required for the operation of the system 500. The CPU 501, ROM 502, and RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above-described functions defined in the method of this application are performed. It should be noted that the computer-readable medium of this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program usable by or in connection with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for carrying out the operations of this application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to the various embodiments of this application. In this regard, each box in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the boxes may occur out of the order noted in the figures. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of this application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including a separation unit and a recognition unit. The names of these units do not, under certain circumstances, limit the units themselves; for example, the separation unit may also be described as "a unit that processes music data to be identified using the trained vocal separation model to obtain the vocal data in the music data to be identified".
As another aspect, this application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: process music data to be identified using the trained vocal separation model to obtain the vocal data in the music data to be identified; and input the vocal data in the music data to be identified into the trained singer identification model to obtain the singer recognition result for the music data to be identified.
The above description is only a preferred embodiment of this application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, but also covers, without departing from the above inventive concept, other technical solutions formed by any combination of the above technical features or their equivalents, for example, solutions formed by mutually replacing the above features with (but not limited to) technical features with similar functions disclosed in this application.
Claims (12)
1. A singer recognition method, comprising:
processing music data to be identified using a trained vocal separation model to obtain vocal data in the music data to be identified; and
inputting the vocal data in the music data to be identified into a trained singer identification model to obtain a singer recognition result for the music data to be identified.
2. The method according to claim 1, wherein the method further comprises:
obtaining the trained vocal separation model by training on first sample music data.
3. The method according to claim 2, wherein obtaining the trained vocal separation model by training on the first sample music data comprises:
extracting spectral features of the first sample music data, and separating sample vocal data from the first sample music data based on the spectral features of the first sample music data; and
constructing a to-be-trained vocal separation model based on a Gaussian mixture model, using the sample vocal data as an expected result of the to-be-trained vocal separation model's separation of the vocal data in the first sample music data, and training to obtain the trained vocal separation model.
4. The method according to claim 2, wherein obtaining the trained vocal separation model by training on the first sample music data comprises:
extracting spectral features of the first sample music data, and decomposing the sample music data into sample vocal data and sample accompaniment data based on the spectral features of the first sample music data; and
constructing a to-be-trained vocal separation model based on a Gaussian mixture model, using the sample vocal data as an expected result of the to-be-trained vocal separation model's separation of the vocal data in the first sample music data, using the sample accompaniment data as an expected result of the to-be-trained vocal separation model's separation of the accompaniment data in the first sample music data, and training to obtain the trained vocal separation model.
5. The method according to any one of claims 1-4, wherein the method further comprises:
obtaining the trained singer identification model by training on second sample music data having corresponding singer annotation information, comprising:
inputting the second sample music data into the trained vocal separation model to obtain vocal data in the second sample music data; and
constructing a to-be-trained singer identification model based on a Gaussian mixture model, using the vocal data in the second sample music data, taking the singer annotation information of the second sample music data as an expected result of the to-be-trained singer identification model's singer identification of the vocal data in the second sample music data, and training the to-be-trained singer model to obtain the trained singer identification model.
6. A singer recognition apparatus, comprising:
a separation unit configured to process music data to be identified using a trained vocal separation model to obtain vocal data in the music data to be identified; and
a recognition unit configured to input the vocal data in the music data to be identified into a trained singer identification model to obtain a singer recognition result for the music data to be identified.
7. The apparatus according to claim 6, wherein the apparatus further comprises:
a first training unit configured to obtain the trained vocal separation model by training on first sample music data.
8. The apparatus according to claim 7, wherein the first training unit is further configured to obtain the trained vocal separation model by training on the first sample music data as follows:
extracting spectral features of the first sample music data, and separating sample vocal data from the first sample music data based on the spectral features of the first sample music data; and
constructing a to-be-trained vocal separation model based on a Gaussian mixture model, using the sample vocal data as an expected result of the to-be-trained vocal separation model's separation of the vocal data in the first sample music data, and training to obtain the trained vocal separation model.
9. The apparatus according to claim 7, wherein the first training unit is further configured to obtain the trained vocal separation model by training on the first sample music data as follows:
extracting spectral features of the first sample music data, and decomposing the sample music data into sample vocal data and sample accompaniment data based on the spectral features of the first sample music data; and
constructing a to-be-trained vocal separation model based on a Gaussian mixture model, using the sample vocal data as an expected result of the to-be-trained vocal separation model's separation of the vocal data in the first sample music data, using the sample accompaniment data as an expected result of the to-be-trained vocal separation model's separation of the accompaniment data in the first sample music data, and training to obtain the trained vocal separation model.
10. The apparatus according to any one of claims 6-9, wherein the apparatus further comprises:
a second training unit configured to obtain the trained singer identification model, based on second sample music data having corresponding singer annotation information, by training as follows:
inputting the second sample music data into the trained vocal separation model to obtain vocal data in the second sample music data; and
constructing a to-be-trained singer identification model based on a Gaussian mixture model, using the vocal data in the second sample music data, taking the singer annotation information of the second sample music data as an expected result of the to-be-trained singer identification model's singer identification of the vocal data in the second sample music data, and training the to-be-trained singer model to obtain the trained singer identification model.
11. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-5.
12. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811148198.2A CN109308901A (en) | 2018-09-29 | 2018-09-29 | Singer recognition method and apparatus
Publications (1)
Publication Number | Publication Date |
---|---|
CN109308901A true CN109308901A (en) | 2019-02-05 |
Family
ID=65225167
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811148198.2A Pending CN109308901A (en) | 2018-09-29 | 2018-09-29 | Chanteur's recognition methods and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109308901A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110085251A (en) * | 2019-04-26 | 2019-08-02 | 腾讯音乐娱乐科技(深圳)有限公司 | Voice extracting method, voice extraction element and Related product |
CN110853618A (en) * | 2019-11-19 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Language identification method, model training method, device and equipment |
WO2020228226A1 (en) * | 2019-05-14 | 2020-11-19 | 腾讯音乐娱乐科技(深圳)有限公司 | Instrumental music detection method and apparatus, and storage medium |
CN112201226A (en) * | 2020-09-28 | 2021-01-08 | 复旦大学 | Sound production mode judging method and system |
CN112270929A (en) * | 2020-11-18 | 2021-01-26 | 上海依图网络科技有限公司 | Song identification method and device |
CN112466334A (en) * | 2020-12-14 | 2021-03-09 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio identification method, equipment and medium |
CN113284501A (en) * | 2021-05-18 | 2021-08-20 | 平安科技(深圳)有限公司 | Singer identification method, singer identification device, singer identification equipment and storage medium |
WO2024103302A1 (en) * | 2022-11-16 | 2024-05-23 | 广州酷狗计算机科技有限公司 | Human voice note recognition model training method, human voice note recognition method, and device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103310788A (en) * | 2013-05-23 | 2013-09-18 | 北京云知声信息技术有限公司 | Voice information identification method and system |
CN103943113A (en) * | 2014-04-15 | 2014-07-23 | 福建星网视易信息系统有限公司 | Method and device for removing accompaniment from song |
CN104183245A (en) * | 2014-09-04 | 2014-12-03 | 福建星网视易信息系统有限公司 | Method and device for recommending music stars with tones similar to those of singers |
CN104464727A (en) * | 2014-12-11 | 2015-03-25 | 福州大学 | Single-channel music singing separation method based on deep belief network |
CN105575393A (en) * | 2015-12-02 | 2016-05-11 | 中国传媒大学 | Personalized song recommendation method based on voice timbre |
CN106024005A (en) * | 2016-07-01 | 2016-10-12 | 腾讯科技(深圳)有限公司 | Processing method and apparatus for audio data |
CN106683680A (en) * | 2017-03-10 | 2017-05-17 | 百度在线网络技术(北京)有限公司 | Speaker recognition method and device and computer equipment and computer readable media |
Non-Patent Citations (3)
Title |
---|
JIALIE SHEN et al.: "Towards Efficient Automated Singer Identification in Large Music Databases", International ACM Conference on Research and Development in Information Retrieval * |
WEI-HO TSAI, HSIN-CHIEH LEE: "Singer Identification Based on Spoken Data in Voice Characterization", IEEE Transactions on Audio, Speech, and Language Processing * |
LI Wei, et al.: "Understanding Digital Music: A Survey of Music Information Retrieval Technology", Journal of Fudan University (Natural Science Edition) * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110085251B (en) * | 2019-04-26 | 2021-06-25 | 腾讯音乐娱乐科技(深圳)有限公司 | Human voice extraction method, human voice extraction device and related products |
CN110085251A (en) * | 2019-04-26 | 2019-08-02 | 腾讯音乐娱乐科技(深圳)有限公司 | Human voice extraction method, human voice extraction device and related products |
WO2020228226A1 (en) * | 2019-05-14 | 2020-11-19 | 腾讯音乐娱乐科技(深圳)有限公司 | Instrumental music detection method and apparatus, and storage medium |
CN110853618A (en) * | 2019-11-19 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Language identification method, model training method, device and equipment |
CN110853618B (en) * | 2019-11-19 | 2022-08-19 | 腾讯科技(深圳)有限公司 | Language identification method, model training method, device and equipment |
CN112201226A (en) * | 2020-09-28 | 2021-01-08 | 复旦大学 | Sound production mode judging method and system |
CN112201226B (en) * | 2020-09-28 | 2022-09-16 | 复旦大学 | Sound production mode judging method and system |
CN112270929A (en) * | 2020-11-18 | 2021-01-26 | 上海依图网络科技有限公司 | Song identification method and device |
CN112270929B (en) * | 2020-11-18 | 2024-03-22 | 上海依图网络科技有限公司 | Song identification method and device |
CN112466334A (en) * | 2020-12-14 | 2021-03-09 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio identification method, equipment and medium |
CN113284501A (en) * | 2021-05-18 | 2021-08-20 | 平安科技(深圳)有限公司 | Singer identification method, singer identification device, singer identification equipment and storage medium |
CN113284501B (en) * | 2021-05-18 | 2024-03-08 | 平安科技(深圳)有限公司 | Singer identification method, singer identification device, singer identification equipment and storage medium |
WO2024103302A1 (en) * | 2022-11-16 | 2024-05-23 | 广州酷狗计算机科技有限公司 | Human voice note recognition model training method, human voice note recognition method, and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109308901A (en) | Singer recognition method and device | |
US11017788B2 (en) | System and method for creating timbres | |
CN106898340B (en) | Song synthesis method and terminal | |
CN108737872A (en) | Method and apparatus for outputting information | |
CN111091800B (en) | Song generation method and device | |
WO2019109787A1 (en) | Audio classification method and apparatus, intelligent device, and storage medium | |
CN107623614A (en) | Method and apparatus for pushing information | |
CN111445892B (en) | Song generation method and device, readable medium and electronic equipment | |
CN108806665A (en) | Speech synthesis method and device | |
CN107767869A (en) | Method and apparatus for providing voice service | |
CN109272984A (en) | Method and apparatus for voice interaction | |
CN111899720A (en) | Method, apparatus, device and medium for generating audio | |
CN109147800A (en) | Answer method and device | |
CN111798821B (en) | Sound conversion method, device, readable storage medium and electronic equipment | |
JP2015040903A (en) | Voice processor, voice processing method and program | |
CN111161695B (en) | Song generation method and device | |
CN110675886A (en) | Audio signal processing method, audio signal processing device, electronic equipment and storage medium | |
JP7497523B2 (en) | Method, device, electronic device and storage medium for synthesizing custom timbre singing voice | |
CN113691909A (en) | Digital audio workstation with audio processing recommendations | |
CN107680584A (en) | Method and apparatus for cutting audio | |
CN108804667A (en) | Method and apparatus for presenting information | |
CN108829739A (en) | Information pushing method and device | |
CN116798405B (en) | Speech synthesis method, device, storage medium and electronic equipment | |
Tachibana et al. | A real-time audio-to-audio karaoke generation system for monaural recordings based on singing voice suppression and key conversion techniques | |
CN113744759B (en) | Timbre template customization method and device, equipment, medium and product thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 2021-10-13. Address after: Room 101, 1st Floor, Building 1, Yard 7, Ruihe West 2nd Road, Economic and Technological Development Zone, Daxing District, Beijing 100176. Applicant after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Address before: Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing 100085. Applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co., Ltd. |
RJ01 | Rejection of invention patent application after publication |
Application publication date: 2019-02-05 |