CN110211594A - Speaker identification method based on twin network model and KNN algorithm - Google Patents
Speaker identification method based on twin network model and KNN algorithm
- Publication number
- CN110211594A CN110211594A CN201910494606.8A CN201910494606A CN110211594A CN 110211594 A CN110211594 A CN 110211594A CN 201910494606 A CN201910494606 A CN 201910494606A CN 110211594 A CN110211594 A CN 110211594A
- Authority
- CN
- China
- Prior art keywords
- speaker
- network model
- voice
- voice signal
- KNN algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/04—Training, enrolment or model building
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Abstract
The invention discloses a speaker identification method based on a twin (Siamese) network model and the KNN algorithm. Step S1: use speakers' voice collected by a microphone as a data set to train an RNN model. Step S2: build a twin network from the trained RNN and identify speakers in combination with the KNN algorithm. With this technical solution, the speaker data set in the database is used for training, and the output produced for each voice signal fed into the twin network represents that speaker's features; the distance between different output feature vectors is measured by cosine distance, and the KNN algorithm judges whether they belong to the same speaker. As a result, a small number of samples suffices to identify a speaker, the network need not be retrained as the number of speakers grows, the neural network's demand for data samples is reduced, and both the real-time performance and the accuracy of speaker identification are effectively improved.
Description
Technical field
The invention belongs to the field of human-computer interaction, in particular to speaker recognition technology; specifically, it designs a speaker identification method based on a twin network model and the KNN algorithm.
Background art
In the field of human-computer interaction, with the rapid development of technologies such as artificial intelligence and pattern recognition, the interaction between people and computers has become ever closer. Traditional contact-based interaction no longer satisfies people's needs, and the study of novel interaction modes that match people's communication habits has become a research hotspot in recent years. Speaker recognition, as one of the main channels of human-computer interaction, has gradually become an important research topic in the interaction field.
Existing speaker recognition methods mainly comprise speech feature extraction with template matching, statistical speech models, and deep learning. Conventional approaches focus on speech feature extraction and template matching: a voiceprint template is trained in advance and the voiceprint to be identified is matched against it. This method is easy to operate, but its recognition accuracy is low and it needs a large number of data samples. Methods based on statistical speech models define the recognition task as computing the probability of a variable; their accuracy is high, but they require a large amount of data for verification. Methods based on deep learning use a neural network to capture the speaker's hidden internal features, which can represent the speaker well, but they not only need massive data but also require retraining the neural network every time the data set is updated, which is unfavorable for adding new data.
Summary of the invention
To address the prior art's need for a large number of speech samples, the object of the present invention is to provide a speaker identification method based on a twin network model and the KNN algorithm. A speaker's voice is collected by a microphone, and an update strategy for speaker information is designed by combining a twin RNN network with the KNN algorithm, so that a small amount of data suffices for speaker identification and recognition becomes faster and more efficient. The specific technical solution is as follows:
A speaker identification method based on a twin network model and KNN classification, comprising the following steps:
Step S1: use a speaker's voice collected by a microphone as a data set to train an RNN model;
Step S2: build a twin network from the trained RNN and identify the speaker in combination with the KNN algorithm;
Wherein, step S1 further comprises:
Step S11: collect a large voice data set and perform data preprocessing;
Step S12: store the preprocessed voice data set in a speech database;
Step S13: obtain the voice signal data set from the speech database and, based on the way a voice signal changes over time, extract the feature vectors of the voice signal with a feature extraction method;
Step S14: train the RNN model with the speech feature vectors v extracted in step S13 using back-propagation through time (BPTT) to obtain the optimal parameters Θ and the initial model;
Step S13 further comprises:
Step S131: let x be the set of one speech segment over a period t; split it into frames with a frame length of 25 ms to obtain the discrete voice signals x_1, x_2, …, x_t over time t;
Step S132: take the input X = {x_1, x_2, …, x_t} from step S131 and extract features from the discrete signals with MFCC, yielding the 40-dimensional speech feature vectors V = {v_1, v_2, …, v_t};
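The framing and feature extraction of steps S131-S132 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 16 kHz sample rate, the 10 ms hop, and the truncated log-magnitude spectrum used as a stand-in for the 40-dimensional MFCC features are all assumptions (a real MFCC pipeline would additionally apply a mel filterbank and a DCT).

```python
import numpy as np

def frame_signal(x, sr=16000, frame_ms=25, hop_ms=10):
    """Split a waveform into overlapping frames (25 ms frames, 10 ms hop assumed)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def spectral_features(frames, n_dims=40):
    """Hypothetical 40-dim per-frame feature: log magnitude spectrum, truncated.
    Stands in for the MFCC extraction named in step S132."""
    spec = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1))
    return np.log(spec[:, :n_dims] + 1e-8)

x = np.random.randn(16000)        # 1 s of audio at an assumed 16 kHz
frames = frame_signal(x)          # shape (98, 400): 400 samples per 25 ms frame
V = spectral_features(frames)     # shape (98, 40): one 40-dim vector per frame
```

Each row of `V` then plays the role of one feature vector v_t in the sequence fed to the RNN.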
Step S14 further comprises:
Step S141: each voice signal is correlated over time, so the input at time t of a segment contains v_t; the RNN model remembers the voice state s_{t-1} of the previous moment, and the hidden layer h_t at each moment depends on the current input and the previous state. The formula is as follows:
h_t = U v_t + W s_{t-1}
Step S142: for the current time t, the state s_t depends on the hidden layer at that moment, i.e. s_t = f(h_t). The activation function f here is tanh, which fits voice signals well; substituting the hidden-layer value of that moment gives:
s_t = tanh(U v_t + W s_{t-1})
Step S143: the output vector f_t at the current time t is f_t = g(V s_t); the outputs over the whole segment form the output vector F;
Step S144: let the output vector be F = {f_1, f_2, …, f_t} and the shared parameters of the RNN model be Θ = {W, U, V}; the loss function L(Θ) is obtained by summing, over the whole time span, the difference between the output values and the true values;
Step S145: differentiate the resulting loss function L(Θ) with respect to W, U and V respectively using the back-propagation algorithm to obtain the optimal parameters Θ and the initial model;
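The forward recurrence of steps S141-S143 can be sketched numerically as below. The layer sizes and the random parameter initialisation are illustrative assumptions (the patent does not state dimensions), g is taken as the identity, and the output matrix is named `V_out` to avoid clashing with the feature sequence V:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out, T = 40, 64, 32, 10   # assumed dimensions and sequence length

# Shared parameters Theta = {W, U, V} of the recurrence (V here called V_out)
U = rng.normal(scale=0.1, size=(d_h, d_in))
W = rng.normal(scale=0.1, size=(d_h, d_h))
V_out = rng.normal(scale=0.1, size=(d_out, d_h))

def rnn_forward(feats):
    """Steps S141-S143: h_t = U v_t + W s_{t-1}; s_t = tanh(h_t); f_t = V_out s_t."""
    s = np.zeros(d_h)                  # initial state s_0
    outputs = []
    for v_t in feats:
        h_t = U @ v_t + W @ s          # hidden pre-activation (S141)
        s = np.tanh(h_t)               # state update with tanh (S142)
        outputs.append(V_out @ s)      # per-step output f_t (S143, g = identity)
    return np.stack(outputs)           # F = {f_1, ..., f_t}

feats = rng.normal(size=(T, d_in))     # one sequence of 40-dim feature vectors
F = rnn_forward(feats)                 # shape (10, 32)
```

Training (steps S144-S145) would then differentiate a loss over F with respect to the shared {W, U, V} by back-propagation through time, which this sketch omits.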
Step S2 further comprises:
Step S21: build the twin network from the trained RNN model with shared network parameters; input multiple different voice signals X_0, …, X_n respectively and predict the output vector result set FS = {F_0, …, F_n} of the voice signals;
Step S22: from the output vector set FS obtained in the previous step, calculate the cosine distance between the different output feature vectors and apply the KNN algorithm to determine whether the voices belong to the same person;
Step S22 further comprises:
Step S221: pass the voice signals through the twin network model to obtain the output vectors F_0, F_1, …, F_n, expressed with the classified vectors, where F_1 = {f_11, f_12, …, f_1t} and F_n = {f_n1, f_n2, …, f_nt} denote voice signals in the speaker sample set and F_0 = {f_01, f_02, …, f_0t} denotes the voice signal of the speaker to be identified;
Step S222: score the similarity with cosine distance to judge whether two signals belong to the same speaker; the similarity of two speakers is reflected not in the lengths of the two vectors but only in the angle between them. The formula is as follows:
cos(F_0, F_i) = (F_0 · F_i) / (‖F_0‖ ‖F_i‖)
Step S223: with the cosine distances between the different voice signals calculated in step S222, use the KNN algorithm to find the point nearest to F_0, which is taken as the same speaker.
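Steps S222-S223 can be sketched as follows, using toy 2-D embeddings and the 1-nearest-neighbour special case of KNN; the speaker labels are hypothetical placeholders:

```python
import numpy as np

def cosine_similarity(a, b):
    """Angle-only similarity of step S222; independent of vector length."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_speaker(F0, refs, labels):
    """Step S223 with k=1: assign F0 the label of the closest reference vector."""
    sims = [cosine_similarity(F0, Fi) for Fi in refs]
    return labels[int(np.argmax(sims))]

refs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # toy reference embeddings
labels = ["speaker_A", "speaker_B"]                    # hypothetical labels
print(nearest_speaker(np.array([0.9, 0.1]), refs, labels))  # speaker_A
```

Because cosine similarity ignores vector length, embeddings of voice segments of different durations remain directly comparable, which matches the patent's claim of arbitrary-length input.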
Compared with the prior art, the invention has the following beneficial effects:
1. The invention designs a speaker identification method in which each speaker class needs only one or a few training samples, and samples can be added or removed freely. Instead of directly training an output classification model, it trains a similarity function, so a small number of samples is enough to identify a speaker accurately and quickly.
2. The invention decomposes a continuous voice signal into discrete speech signal vectors. Traditional speaker recognition requires input voice signals of equal length, whereas the present invention accepts voice signals of arbitrary length, which simplifies its use.
3. The network proposed in the invention extends the two-channel input of the traditional twin network to multi-channel input, so speakers can be identified more quickly.
4. The invention judges the similarity between different speakers with the KNN algorithm, based on the similarity within the same speaker and between different speakers, to decide whether two signals belong to the same speaker.
Description of the drawings
Fig. 1 is a framework flow chart of the speaker identification method based on a twin network model and KNN classification provided by the invention;
Fig. 2 is a detailed flow chart of speech feature extraction in the method;
Fig. 3 is the structure of the deep recurrent neural network used in the method;
Fig. 4 is the structure of the twin network constructed in the method;
Fig. 5 is a detailed flow chart of the twin network and the KNN algorithm in the method;
Specific embodiments
The technical solution provided by the invention is further described below with reference to the drawings.
In real life, as people join and leave, we want to identify speakers from their voice alone; whenever a person is added, their voice signal must be added too, and each new voice would require retraining the existing model, which is unfavorable for updating. The twin-network-based model proposed by the invention only requires the newly added voice signal to be fed into the network; the similarity between it and the voice to be identified, obtained by the twin network, is then used for discrimination.
The present invention provides a system for speaker identification based on a twin network model and KNN classification, as shown in Fig. 1. Overall, the invention comprises two major steps. Step S1: use a speaker's voice collected by a microphone as a data set to train an RNN model. Step S2: build a twin network from the trained RNN and identify the speaker in combination with the KNN algorithm.
As shown in Fig. 2, the collected large voice data set is preprocessed: the obtained voice signals undergo pre-emphasis, framing and Fourier transformation to obtain the 40-dimensional speech feature vectors v;
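The pre-emphasis mentioned above is a standard first-order high-pass filter; a minimal sketch follows, where the coefficient 0.97 is a common default assumed here (the patent does not state a value):

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1], boosting high
    frequencies before framing; alpha = 0.97 is an assumed typical value."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

x = np.array([1.0, 1.0, 1.0])
print(pre_emphasis(x))   # [1.0, 0.03, 0.03]
```

A constant (low-frequency) signal is strongly attenuated while rapid changes pass through, which flattens the spectral tilt of speech before the Fourier transform.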
As shown in Fig. 3, the obtained 40-dimensional speech feature vectors are input into the RNN model for training to obtain the initial model.
Step S141: each voice signal is correlated over time, so the input at time t of a segment contains v_t; the RNN model remembers the voice state s_{t-1} of the previous moment, and the hidden layer h_t at each moment depends on the current input and the previous state. The formula is as follows:
h_t = U v_t + W s_{t-1}
Step S142: for the current time t, the state s_t depends on the hidden layer at that moment, i.e. s_t = f(h_t). The activation function f here is tanh, which fits voice signals well; substituting the hidden-layer value of that moment gives:
s_t = tanh(U v_t + W s_{t-1})
Step S143: the output vector f_t at the current time t is f_t = g(V s_t); the outputs over the whole segment form the output vector F;
Step S144: let the output vector be F = {f_1, f_2, …, f_t} and the shared parameters of the RNN model be Θ = {W, U, V}; the loss function L(Θ) is obtained by summing, over the whole time span, the difference between the output values and the true values;
Step S145: differentiate the resulting loss function L(Θ) with respect to W, U and V respectively using the back-propagation algorithm to obtain the optimal parameters Θ and the initial model;
With the twin network structure shown in Fig. 4, the dual input of the twin network is extended to multi-input: at each test, n voice signal segments are input, of which one is the segment to be identified and n-1 are reference voice samples, as shown in Fig. 5. The recurrent neural network (RNN) of Fig. 3 extracts the feature vector of the voice signal to be identified; the spatial distances to the feature vectors of the reference voice signals are then measured, and the KNN nearest-neighbour algorithm assigns the signal to be identified the label of the nearest class in that space, thereby realizing speaker identification.
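The key property of the multi-input twin network described above is that every branch applies the same parameters, so all n inputs map into one comparable embedding space. A minimal sketch, in which a single shared linear layer with a tanh stands in for the shared RNN branch (an assumption; the patent's branch is the full RNN of Fig. 3):

```python
import numpy as np

rng = np.random.default_rng(1)
W_enc = rng.normal(scale=0.1, size=(32, 40))   # ONE shared weight matrix used
                                               # by every branch of the network

def encode(x):
    """Shared-branch encoder: mean over frames, then a shared projection.
    Because all branches reuse W_enc, embeddings are directly comparable."""
    return np.tanh(W_enc @ x.mean(axis=0))

# X_0 (probe) plus three reference segments, each a (frames, 40) feature matrix
signals = [rng.normal(size=(50, 40)) for _ in range(4)]
FS = [encode(x) for x in signals]               # result set {F_0, ..., F_n}
```

Adding a new speaker then only means encoding one more reference segment with the existing shared weights; no retraining is required, which is the update property the patent emphasises.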
Step S221: pass the voice signals through the twin network model to obtain the output vectors F_0, F_1, …, F_n, expressed with the classified vectors, where F_1 = {f_11, f_12, …, f_1t} and F_n = {f_n1, f_n2, …, f_nt} denote voice signals in the speaker sample set and F_0 = {f_01, f_02, …, f_0t} denotes the voice signal of the speaker to be identified;
Step S222: score the similarity with cosine distance to judge whether two signals belong to the same speaker; the similarity of two speakers is reflected not in the lengths of the two vectors but only in the angle between them. The formula is as follows:
cos(F_0, F_i) = (F_0 · F_i) / (‖F_0‖ ‖F_i‖)
Step S223: with the cosine distances between the different voice signals calculated in step S222, use the KNN algorithm to find the point nearest to F_0, which is taken as the same speaker.
Claims (4)
1. A speaker identification method based on a twin network model and the KNN algorithm, characterized by comprising the following steps:
Step S1: use a speaker's voice collected by a microphone as a data set to train an RNN model;
Step S2: build a twin network from the trained RNN and identify the speaker in combination with the KNN algorithm;
Wherein, step S1 is specifically as follows:
Step S11: collect a large voice data set and perform data preprocessing;
Step S12: store the preprocessed voice data set in a speech database;
Step S13: obtain the voice signal data set from the speech database and, based on the way a voice signal changes over time, extract the feature vectors v of the voice signal with a feature extraction method;
Step S14: train the RNN model with the speech feature vectors v extracted in step S13 using back-propagation through time (BPTT) to obtain the optimal parameters Θ and the initial model;
Step S2 is specifically as follows:
Step S21: build the twin network model from the trained RNN with shared parameters; input multiple different voice signals X_0, …, X_n respectively and predict the output vector result set FS = {F_0, …, F_n} of the voice signals;
Step S22: from the output vector set FS obtained in the previous step, calculate the cosine distance between the different output feature vectors and apply the KNN algorithm to determine whether the voices belong to the same person.
2. The speaker identification method based on a twin network model and the KNN algorithm according to claim 1, characterized in that step S13 is specifically as follows:
Step S131: let x be the set of one speech segment over a period t; split it into frames with a frame length of 25 ms to obtain the discrete voice signals X = x_1, x_2, …, x_t over time t;
Step S132: take the input X = {x_1, x_2, …, x_t} from step S131 and extract features from the discrete signals with MFCC, yielding the 40-dimensional speech feature vectors V = {v_1, v_2, …, v_t}.
3. The speaker identification method based on a twin network model and the KNN algorithm according to claim 1, characterized in that step S14 is specifically as follows:
Step S141: each voice signal is correlated over time, so the input at time t of a segment contains the feature vector v_t; the RNN model remembers the voice state s_{t-1} of the previous moment, and the hidden layer h_t at each moment depends on the current input and the previous state. The formula is as follows:
h_t = U v_t + W s_{t-1};
Step S142: for the current time t, the state s_t depends on the hidden layer at that moment, i.e. s_t = f(h_t). The activation function f here is tanh, which fits voice signals well; substituting the hidden-layer value of that moment gives:
s_t = tanh(U v_t + W s_{t-1});
Step S143: the output vector f_t at the current time t is f_t = g(V s_t); the outputs over the whole segment form the output vector F;
Step S144: let the output vector be F = {f_1, f_2, …, f_t} and the shared parameters of the RNN model be Θ = {W, U, V}; the loss function L(Θ) is obtained by summing, over the whole time span, the difference between the output values and the true values;
Step S145: differentiate the resulting loss function L(Θ) with respect to W, U and V respectively using the back-propagation algorithm to obtain the optimal parameters Θ and the initial model.
4. The speaker identification method based on a twin network model and the KNN algorithm according to claim 1, characterized in that step S22 is specifically as follows:
Step S221: pass the voice signals through the twin RNN network model to obtain the output vectors F_0, F_1, …, F_n, expressed with the classified vectors, where F_1 = {f_11, f_12, …, f_1t} and F_n = {f_n1, f_n2, …, f_nt} denote voice signals in the speaker sample set and F_0 = {f_01, f_02, …, f_0t} denotes the voice signal of the speaker to be identified;
Step S222: score the similarity with cosine distance to judge whether two signals belong to the same speaker; the similarity of two speakers is reflected not in the lengths of the two vectors but only in the angle between them. The formula is as follows:
cos(F_0, F_i) = (F_0 · F_i) / (‖F_0‖ ‖F_i‖);
Step S223: with the cosine distances between the different voice signals calculated in step S222, use the KNN algorithm to find the point nearest to F_0, which is taken as the same speaker.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910494606.8A CN110211594B (en) | 2019-06-06 | 2019-06-06 | Speaker identification method based on twin network model and KNN algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910494606.8A CN110211594B (en) | 2019-06-06 | 2019-06-06 | Speaker identification method based on twin network model and KNN algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110211594A true CN110211594A (en) | 2019-09-06 |
CN110211594B CN110211594B (en) | 2021-05-04 |
Family
ID=67791537
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910494606.8A Active CN110211594B (en) | 2019-06-06 | 2019-06-06 | Speaker identification method based on twin network model and KNN algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110211594B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110569908A (en) * | 2019-09-10 | 2019-12-13 | 苏州思必驰信息科技有限公司 | Speaker counting method and system |
CN110767239A (en) * | 2019-09-20 | 2020-02-07 | 平安科技(深圳)有限公司 | Voiceprint recognition method, device and equipment based on deep learning |
CN111048097A (en) * | 2019-12-19 | 2020-04-21 | 中国人民解放军空军研究院通信与导航研究所 | Twin network voiceprint recognition method based on 3D convolution |
CN111126563A (en) * | 2019-11-25 | 2020-05-08 | 中国科学院计算技术研究所 | Twin network-based space-time data target identification method and system |
CN111785287A (en) * | 2020-07-06 | 2020-10-16 | 北京世纪好未来教育科技有限公司 | Speaker recognition method, speaker recognition device, electronic equipment and storage medium |
CN112270931A (en) * | 2020-10-22 | 2021-01-26 | 江西师范大学 | Method for carrying out deceptive voice detection based on twin convolutional neural network |
CN113903043A (en) * | 2021-12-11 | 2022-01-07 | 绵阳职业技术学院 | Method for identifying printed Chinese character font based on twin metric model |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107170445A (en) * | 2017-05-10 | 2017-09-15 | 重庆大学 | The parkinsonism detection means preferably differentiated is cooperateed with based on voice mixing information characteristics |
CN108492294A (en) * | 2018-03-23 | 2018-09-04 | 北京邮电大学 | A kind of appraisal procedure and device of image color harmony degree |
CN109065032A (en) * | 2018-07-16 | 2018-12-21 | 杭州电子科技大学 | A kind of external corpus audio recognition method based on depth convolutional neural networks |
CN109243467A (en) * | 2018-11-14 | 2019-01-18 | 龙马智声(珠海)科技有限公司 | Sound-groove model construction method, method for recognizing sound-groove and system |
US20190035431A1 (en) * | 2017-07-28 | 2019-01-31 | Adobe Systems Incorporated | Apparatus, systems, and methods for integrating digital media content |
CN109543009A (en) * | 2018-10-17 | 2019-03-29 | 龙马智芯(珠海横琴)科技有限公司 | Text similarity assessment system and text similarity appraisal procedure |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107170445A (en) * | 2017-05-10 | 2017-09-15 | 重庆大学 | The parkinsonism detection means preferably differentiated is cooperateed with based on voice mixing information characteristics |
US20190035431A1 (en) * | 2017-07-28 | 2019-01-31 | Adobe Systems Incorporated | Apparatus, systems, and methods for integrating digital media content |
CN108492294A (en) * | 2018-03-23 | 2018-09-04 | 北京邮电大学 | A kind of appraisal procedure and device of image color harmony degree |
CN109065032A (en) * | 2018-07-16 | 2018-12-21 | 杭州电子科技大学 | A kind of external corpus audio recognition method based on depth convolutional neural networks |
CN109543009A (en) * | 2018-10-17 | 2019-03-29 | 龙马智芯(珠海横琴)科技有限公司 | Text similarity assessment system and text similarity appraisal procedure |
CN109243467A (en) * | 2018-11-14 | 2019-01-18 | 龙马智声(珠海)科技有限公司 | Sound-groove model construction method, method for recognizing sound-groove and system |
Non-Patent Citations (4)
Title |
---|
CHENG ZHANG: "Siamese neural network based gait recognition for human identification", ICASSP *
YU Q: "Sketch-a-net that beats humans", British Machine Vision Conference *
DING Meiyu (丁美玉): "Sketch recognition method based on temporal features", Computer Science (计算机科学) *
MA Yuejie (马月洁): "Research on football player tracking algorithm based on deep learning", Journal of Communication University of China (Natural Science Edition) *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110569908A (en) * | 2019-09-10 | 2019-12-13 | 苏州思必驰信息科技有限公司 | Speaker counting method and system |
CN110569908B (en) * | 2019-09-10 | 2022-05-13 | 思必驰科技股份有限公司 | Speaker counting method and system |
CN110767239A (en) * | 2019-09-20 | 2020-02-07 | 平安科技(深圳)有限公司 | Voiceprint recognition method, device and equipment based on deep learning |
CN111126563A (en) * | 2019-11-25 | 2020-05-08 | 中国科学院计算技术研究所 | Twin network-based space-time data target identification method and system |
CN111126563B (en) * | 2019-11-25 | 2023-09-29 | 中国科学院计算技术研究所 | Target identification method and system based on space-time data of twin network |
CN111048097A (en) * | 2019-12-19 | 2020-04-21 | 中国人民解放军空军研究院通信与导航研究所 | Twin network voiceprint recognition method based on 3D convolution |
CN111785287A (en) * | 2020-07-06 | 2020-10-16 | 北京世纪好未来教育科技有限公司 | Speaker recognition method, speaker recognition device, electronic equipment and storage medium |
WO2022007766A1 (en) * | 2020-07-06 | 2022-01-13 | 北京世纪好未来教育科技有限公司 | Speaker recognition method and apparatus, electronic device, and storage medium |
CN111785287B (en) * | 2020-07-06 | 2022-06-07 | 北京世纪好未来教育科技有限公司 | Speaker recognition method, speaker recognition device, electronic equipment and storage medium |
US11676609B2 (en) | 2020-07-06 | 2023-06-13 | Beijing Century Tal Education Technology Co. Ltd. | Speaker recognition method, electronic device, and storage medium |
CN112270931A (en) * | 2020-10-22 | 2021-01-26 | 江西师范大学 | Method for carrying out deceptive voice detection based on twin convolutional neural network |
CN113903043A (en) * | 2021-12-11 | 2022-01-07 | 绵阳职业技术学院 | Method for identifying printed Chinese character font based on twin metric model |
CN113903043B (en) * | 2021-12-11 | 2022-05-06 | 绵阳职业技术学院 | Method for identifying printed Chinese character font based on twin metric model |
Also Published As
Publication number | Publication date |
---|---|
CN110211594B (en) | 2021-05-04 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |