CN110211594A - Speaker recognition method based on twin network model and KNN algorithm - Google Patents

Speaker recognition method based on twin network model and KNN algorithm

Info

Publication number
CN110211594A
Authority
CN
China
Prior art keywords
speaker
network model
voice
voice signal
knn algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910494606.8A
Other languages
Chinese (zh)
Other versions
CN110211594B (en)
Inventor
张莉
李文钧
李竹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Hangzhou Electronic Science and Technology University
Original Assignee
Hangzhou Electronic Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Electronic Science and Technology University filed Critical Hangzhou Electronic Science and Technology University
Priority to CN201910494606.8A priority Critical patent/CN110211594B/en
Publication of CN110211594A publication Critical patent/CN110211594A/en
Application granted granted Critical
Publication of CN110211594B publication Critical patent/CN110211594B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a speaker recognition method based on a twin (Siamese) network model and the KNN algorithm. Step S1: voice information of speakers is collected with a microphone and used as a data set to train an RNN network model. Step S2: a twin network model is constructed from the trained RNN network and combined with the KNN algorithm to identify the speaker. With the technical solution of the present invention, the data set of speakers in the database is used for training, and each voice signal input into the twin network yields an output that represents the features of that speaker. The distances between different output feature vectors are computed with the cosine distance, and the KNN algorithm judges whether they belong to the same speaker. As a result, a small number of samples suffices to identify a speaker, and the network does not need to be retrained as the number of speakers grows, which reduces the neural network's demand for data samples while effectively improving the real-time performance and accuracy of speaker recognition.

Description

Speaker recognition method based on twin network model and KNN algorithm
Technical field
The invention belongs to the technical field of human-computer interaction, in particular to the field of speaker recognition technology, and specifically relates to a speaker recognition method based on a twin network model and the KNN algorithm.
Background technique
In the field of human-computer interaction, with the rapid development of technologies such as artificial intelligence and pattern recognition, the interaction between people and computers has become ever closer. Traditional contact-based interaction no longer satisfies people's demands, and studying novel interaction modes that fit people's communication habits has become a research hotspot in recent years. Speaker recognition, as one of the main channels of human-computer interaction, has gradually become an important research topic in the field of interaction.
Existing speaker recognition methods mainly include methods based on speech feature extraction and template matching, methods based on statistical models of speech, and methods based on deep learning. The research on conventional models focuses mainly on speech feature extraction and template matching. Template-matching methods train voiceprint samples for recognition in advance and match the voiceprint to be identified against them; they are easy to operate, but their recognition accuracy is not high and they require a large number of data samples. Methods based on statistical models of speech define the recognition task as computing the probability of a variable; their recognition accuracy is high, but they require large amounts of data for verification. Methods based on deep learning use neural networks to capture the hidden internal features of a speaker, which can represent the speaker better; however, they not only require massive data but also require retraining the neural network every time the data set is updated, which is unfavorable for adding new data.
Summary of the invention
In view of the prior art's requirement for large numbers of speech samples, the object of the present invention is to provide a speaker recognition method based on a twin network model and the KNN algorithm. The voice information of speakers is collected by a microphone device, and an update and adjustment strategy for speaker information is designed by combining a twin RNN network with the KNN algorithm, so that speaker recognition can be achieved with a small amount of data and identification is faster and more efficient. The specific technical solution is as follows:
A speaker recognition method based on a twin network model and KNN classification, comprising the following steps:
Step S1: use the voice information of speakers collected by a microphone as a data set to train an RNN network model;
Step S2: construct a twin network model from the trained RNN network and combine it with the KNN algorithm to identify the speaker;
Wherein, the step S1 further comprises:
Step S11: acquire a large voice data set and perform data preprocessing;
Step S12: store the preprocessed voice data set in a speech database;
Step S13: obtain the voice signal data set from the speech database and, based on the regularity with which the voice signal changes over time, obtain the feature vector of the voice signal using a feature extraction method;
Step S14: according to the voice signal feature vector v extracted in step S13, train the RNN model with the time-based back-propagation algorithm BPTT to obtain the optimal parameters Θ and the initial model;
The step S13 further comprises:
Step S131: let x be the set of one segment of voice over a period of time t; perform framing on it with a frame length of 25 ms to obtain the discrete voice signal about time t, X = {x_1, x_2, …, x_t};
Step S132: take the input X = {x_1, x_2, …, x_t} set in step S131 and perform feature extraction on the discrete signal in combination with MFCC, extracting the 40-dimensional speech feature vector V = {v_1, v_2, …, v_t};
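The following minimal sketch illustrates steps S131 and S132: a waveform is framed with a 25 ms window and a 40-dimensional MFCC feature vector is extracted per frame. The librosa library, the file name, the 16 kHz sampling rate, and the 10 ms hop are assumptions made for illustration; the patent fixes only the frame length and the 40-dimensional feature.

```python
# Illustrative sketch of steps S131-S132 (assumptions: librosa, a 16 kHz mono
# file named "speaker.wav", a 10 ms hop; none of these is fixed by the patent).
import librosa

y, sr = librosa.load("speaker.wav", sr=16000)   # one segment of voice, x

# S131: frame the signal with a 25 ms window; S132: extract the
# 40-dimensional MFCC feature vector per frame, V = {v_1, v_2, ..., v_t}.
V = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=40,
    n_fft=int(0.025 * sr),       # 25 ms frame length
    hop_length=int(0.010 * sr),  # 10 ms hop (assumed, conventional value)
).T                              # shape: (t frames, 40 dimensions)
```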
The step S14 further comprises:
Step S141: every voice signal is correlated across time, so the input at moment t of that segment of voice signal contains v_t. The RNN network model can remember the voice state s_{t-1} of the previous moment, and the hidden layer h_t at each moment is related to both the input at the current moment and the state of the previous moment. The formula is as follows:
h_t = U·v_t + W·s_{t-1}
Step S142: for the current moment t, the state s_t is related to the hidden layer at that moment, so s_t = f(h_t). The activation function f here is chosen as the tanh function, which fits the voice signal better. Substituting the hidden-layer value at that moment gives:
s_t = tanh(U·v_t + W·s_{t-1})
Step S143: for the output vector f_t at the current moment t, f_t = g(V·s_t); finally the output vector F of the whole voice segment is obtained;
Step S144: let the output vector be F = {f_1, f_2, …, f_t} and let the shared parameters of the RNN network model be Θ = {W, U, V}; the loss function L(Θ) is obtained by accumulating, over the entire time span, the difference between the output values and the true values;
Step S145: differentiate the resulting loss function L(Θ) with respect to {W, U, V} using the back-propagation algorithm to obtain the optimal parameters Θ and the initial model;
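The recurrence and the BPTT update of steps S141 to S145 can be sketched as follows. The layer sizes, the choice of tanh for the output nonlinearity g, and the squared-error form of the loss L(Θ) are assumptions made for illustration, not details fixed by the patent.

```python
# Illustrative sketch of steps S141-S145; sizes, g = tanh, and the
# squared-error loss are assumptions, not details fixed by the patent.
import torch
import torch.nn as nn

class PlainRNN(nn.Module):
    def __init__(self, in_dim=40, hid_dim=128, out_dim=128):
        super().__init__()
        self.U = nn.Linear(in_dim, hid_dim, bias=False)   # input weights U
        self.W = nn.Linear(hid_dim, hid_dim, bias=False)  # recurrent weights W
        self.V = nn.Linear(hid_dim, out_dim, bias=False)  # output weights V

    def forward(self, v_seq):                 # v_seq: (t, in_dim) MFCC frames
        s = torch.zeros(self.W.in_features)   # initial voice state s_0
        outputs = []
        for v_t in v_seq:
            h_t = self.U(v_t) + self.W(s)     # h_t = U·v_t + W·s_{t-1}
            s = torch.tanh(h_t)               # s_t = tanh(h_t)
            outputs.append(torch.tanh(self.V(s)))  # f_t = g(V·s_t), g assumed tanh
        return torch.stack(outputs)           # F = {f_1, ..., f_t}

# One BPTT step (S144-S145): loss accumulated over the whole time span, then
# differentiated with respect to the shared parameters {W, U, V}.
model = PlainRNN()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
v_seq, y_seq = torch.randn(50, 40), torch.randn(50, 128)  # placeholder data
opt.zero_grad()
loss = ((model(v_seq) - y_seq) ** 2).sum()   # assumed squared-error form of L(Θ)
loss.backward()                              # backpropagation through time
opt.step()
```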
The step S2 further comprises:
Step S21: construct a twin network model from the trained RNN network model, with the network parameters shared across branches; input multiple different voice signals X_0 … X_n respectively and predict the output vector result set FS = {F_0, …, F_n} of the voice signals;
Step S22: according to the output vector set FS obtained in the previous step, calculate the cosine distances between the different output feature vectors and use the KNN algorithm to determine whether the voices belong to the same person;
The step S22 further comprises:
Step S221: the voice signals passed through the twin RNN network model yield the output vectors F_0, F_1, …, F_n, which are represented with the classified vectors, where F_1 = {f_11, f_12, …, f_1t} and F_n = {f_n1, f_n2, …, f_nt} denote the voice signals in the speaker sample set, and F_0 = {f_01, f_02, …, f_0t} denotes the voice signal of the speaker to be identified;
Step S222: judge whether they belong to the same speaker by cosine-distance scoring; the similarity of two speakers is not reflected in the lengths of the two vectors and is related only to the angle between them, so the formula is as follows:
cos(F_0, F_i) = (F_0 · F_i) / (‖F_0‖ · ‖F_i‖)
Step S223: having calculated the cosine distances between the different voice signals in step S222, use the KNN algorithm to find the point nearest to F_0, which is taken as the same speaker.
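A minimal sketch of steps S221 to S223: the probe embedding F_0 is scored against the enrolled embeddings F_1 … F_n by cosine similarity, and the majority label among the k nearest references is returned. The value of k, the speaker labels, and the random embeddings standing in for RNN outputs are illustrative assumptions.

```python
# Illustrative sketch of steps S221-S223 with NumPy; k, the labels, and the
# random embeddings are assumptions standing in for real RNN outputs.
import numpy as np
from collections import Counter

def cosine_similarity(a, b):
    # Depends only on the angle between the vectors, not their lengths (S222).
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def knn_identify(F0, enrolled, labels, k=3):
    # enrolled: reference vectors F_1..F_n with speaker labels (S223).
    scores = [cosine_similarity(F0, Fi) for Fi in enrolled]
    top_k = np.argsort(scores)[::-1][:k]          # k most similar references
    return Counter(labels[i] for i in top_k).most_common(1)[0][0]

# Illustrative usage: a probe close to speaker "alice"'s references.
rng = np.random.default_rng(0)
enrolled = [rng.standard_normal(128) for _ in range(6)]
labels = ["alice", "alice", "alice", "bob", "bob", "bob"]
probe = enrolled[0] + 0.1 * rng.standard_normal(128)
print(knn_identify(probe, enrolled, labels))      # -> "alice"
```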
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention designs a speaker recognition method in which only one or a small number of training samples is provided for each speaker class, and the samples remain transferable. Instead of directly training an output classification model, it trains a similarity function of the model, so that even a small number of samples can identify a speaker accurately and quickly.
2. The present invention decomposes a continuous voice signal into discrete voice signal vectors. Traditional speaker recognition requires input voice signals of equal length, whereas the present invention accepts voice signals of arbitrary length, improving the convenience of use.
3. On the basis of the traditional twin network, the network proposed in the present invention expands the two-channel input of the twin network to a multi-channel input, so that speakers can be identified more quickly.
4. The present invention judges the similarity between different speakers according to the KNN algorithm, based on the similarities among the same speaker and among different speakers, and thereby judges whether they belong to the same speaker.
Brief description of the drawings
Fig. 1 is an overall flowchart of the speaker recognition method based on a twin network model and KNN classification provided by the present invention;
Fig. 2 is a detailed flowchart of speech feature extraction in the speaker recognition method based on a twin network model and the KNN algorithm provided by the present invention;
Fig. 3 shows the deep recurrent neural network structure of the speaker recognition method based on a twin network model and the KNN algorithm provided by the present invention;
Fig. 4 shows the twin network structure constructed in the speaker recognition method based on a twin network model and the KNN algorithm provided by the present invention;
Fig. 5 is a detailed flowchart of the twin network and the KNN algorithm in the speaker recognition method based on a twin network model and the KNN algorithm provided by the present invention.
Specific embodiments
The technical solution provided by the present invention is further described below with reference to the accompanying drawings.
In real life, as personnel join and leave, it is desirable to identify speakers from their voices; when a person is added, his or her voice signal must be added as well. In the traditional scheme, every newly added voice requires retraining the existing model, which is unfavorable for updating. The present invention proposes a twin-network-based model so that a newly added voice signal only needs to be added to the network, and the similarity between it and the voice to be tested, obtained by the twin network, is used for discrimination.
The present invention provides a system for speaker recognition based on a twin network model and KNN classification, as shown in Fig. 1. Overall, the present invention comprises two major steps. Step S1: use the voice information of speakers collected by a microphone as a data set to train an RNN network model. Step S2: construct a twin network model from the trained RNN network and combine it with the KNN algorithm to identify the speaker.
As shown in Fig. 2, the acquired large voice data set is preprocessed: the obtained voice signals undergo pre-emphasis, framing, and Fourier transformation to obtain the 40-dimensional speech feature vectors v.
As shown in Fig. 3, the obtained 40-dimensional speech feature vectors are input into the RNN network model for training, and the initial model is obtained.
Step S141: every voice signal is correlated across time, so the input at moment t of that segment of voice signal contains v_t. The RNN network model can remember the voice state s_{t-1} of the previous moment, and the hidden layer h_t at each moment is related to both the input at the current moment and the state of the previous moment. The formula is as follows:
h_t = U·v_t + W·s_{t-1}
Step S142: for the current moment t, the state s_t is related to the hidden layer at that moment, so s_t = f(h_t). The activation function f here is chosen as the tanh function, which fits the voice signal better. Substituting the hidden-layer value at that moment gives:
s_t = tanh(U·v_t + W·s_{t-1})
Step S143: for the output vector f_t at the current moment t, f_t = g(V·s_t); finally the output vector F of the whole voice segment is obtained;
Step S144: let the output vector be F = {f_1, f_2, …, f_t} and let the shared parameters of the RNN network model be Θ = {W, U, V}; the loss function L(Θ) is obtained by accumulating, over the entire time span, the difference between the output values and the true values;
Step S145: differentiate the resulting loss function L(Θ) with respect to {W, U, V} using the back-propagation algorithm to obtain the optimal parameters Θ and the initial model;
For the twin network structure shown in Fig. 4, the dual input of the twin network is extended to a multi-input: at each test, n segments of voice signals are input, among which one segment is the voice signal to be tested and n-1 segments are reference voice sample signals, as shown in Fig. 5. The recurrent neural network (RNN) shown in Fig. 3 extracts the feature vector of the voice signal to be tested, measures its spatial distance to the feature vectors of the reference voice signals, and then, by the KNN nearest-neighbor algorithm, assigns to the voice to be tested the label of the class nearest in the feature space, thereby realizing speaker recognition. A sketch of this multi-input structure is given below, followed by the detailed steps S221 to S223.
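In the sketch below, a single shared-weight encoder embeds the voice to be tested and every reference signal, so enrolling a new speaker only adds a reference embedding and requires no retraining. The GRU encoder is a stand-in assumption for the trained RNN of step S1; the placeholder tensors stand in for MFCC sequences.

```python
# Illustrative sketch of the multi-input twin structure (Fig. 4 / Fig. 5):
# one shared-weight encoder embeds the probe and all reference signals.
# The GRU encoder and all sizes are assumptions, not fixed by the patent.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, in_dim=40, emb_dim=128):
        super().__init__()
        self.rnn = nn.GRU(in_dim, emb_dim, batch_first=True)

    def forward(self, x):                    # x: (batch, frames, 40)
        _, h = self.rnn(x)                   # final hidden state as embedding
        return h.squeeze(0)                  # (batch, emb_dim)

encoder = SharedEncoder()                    # the single shared branch

# n-way input: one probe X_0 plus n-1 references X_1..X_{n-1}; all pass
# through the *same* encoder, yielding FS = {F_0, ..., F_{n-1}} (step S21).
probe = torch.randn(1, 50, 40)               # placeholder MFCC sequence
refs = torch.randn(4, 50, 40)
F0 = encoder(probe)                          # (1, 128)
FS = encoder(refs)                           # (4, 128)

# Cosine scores against every reference (step S22); the most similar
# reference's speaker is taken as the identity of the probe.
scores = torch.cosine_similarity(F0, FS)     # (4,)
print("nearest reference index:", scores.argmax().item())
```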
Step S221: the voice signals passed through the twin RNN network model yield the output vectors F_0, F_1, …, F_n, which are represented with the classified vectors, where F_1 = {f_11, f_12, …, f_1t} and F_n = {f_n1, f_n2, …, f_nt} denote the voice signals in the speaker sample set, and F_0 = {f_01, f_02, …, f_0t} denotes the voice signal of the speaker to be identified;
Step S222: judge whether they belong to the same speaker by cosine-distance scoring; the similarity of two speakers is not reflected in the lengths of the two vectors and is related only to the angle between them, so the formula is as follows:
cos(F_0, F_i) = (F_0 · F_i) / (‖F_0‖ · ‖F_i‖)
Step S223: having calculated the cosine distances between the different voice signals in step S222, use the KNN algorithm to find the point nearest to F_0, which is taken as the same speaker.

Claims (4)

1. A speaker recognition method based on a twin network model and the KNN algorithm, characterized by comprising the following steps:
Step S1: use the voice information of speakers collected by a microphone as a data set to train an RNN network model;
Step S2: construct a twin network model from the trained RNN network and combine it with the KNN algorithm to identify the speaker;
Wherein, the step S1 is specifically as follows:
Step S11: acquire a large voice data set and perform data preprocessing;
Step S12: store the preprocessed voice data set in a speech database;
Step S13: obtain the voice signal data set from the speech database and, based on the regularity with which the voice signal changes over time, obtain the feature vector v of the voice signal using a feature extraction method;
Step S14: according to the voice signal feature vector v extracted in step S13, train the RNN model with the time-based back-propagation algorithm BPTT to obtain the optimal parameters Θ and the initial model;
The step S2 is specifically as follows:
Step S21: construct a twin network model from the trained RNN network model and input multiple different voice signals X_0 … X_n respectively, predicting the output vector result set FS = {F_0, …, F_n} of the voice signals;
Step S22: according to the output vector set FS obtained in the previous step, calculate the cosine distances between the different output feature vectors and use the KNN algorithm to determine whether the voices belong to the same person.
2. The speaker recognition method based on a twin network model and the KNN algorithm according to claim 1, characterized in that
the step S13 is specifically as follows:
Step S131: let x be the set of one segment of voice over a period of time t; perform framing on it with a frame length of 25 ms to obtain the discrete voice signal about time t, X = {x_1, x_2, …, x_t};
Step S132: take the input X = {x_1, x_2, …, x_t} set in step S131 and perform feature extraction on the discrete signal in combination with MFCC, extracting the 40-dimensional speech feature vector V = {v_1, v_2, …, v_t}.
3. The speaker recognition method based on a twin network model and the KNN algorithm according to claim 1, characterized in that
the step S14 is specifically as follows:
Step S141: every voice signal is correlated across time, so the input at moment t of that segment of voice signal contains the feature vector v_t. The RNN network model can remember the voice state s_{t-1} of the previous moment, and the hidden layer h_t at each moment is related to both the input at the current moment and the state of the previous moment. The formula is as follows:
h_t = U·v_t + W·s_{t-1}
Step S142: for the current moment t, the state s_t is related to the hidden layer at that moment, so s_t = f(h_t). The activation function f here is chosen as the tanh function, which fits the voice signal better. Substituting the hidden-layer value at that moment gives:
s_t = tanh(U·v_t + W·s_{t-1})
Step S143: for the output vector f_t at the current moment t, f_t = g(V·s_t); finally the output vector F of the whole voice segment is obtained;
Step S144: let the output vector be F = {f_1, f_2, …, f_t} and let the shared parameters of the RNN network model be Θ = {W, U, V}; the loss function L(Θ) is obtained by accumulating, over the entire time span, the difference between the output values and the true values;
Step S145: differentiate the resulting loss function L(Θ) with respect to {W, U, V} using the back-propagation algorithm to obtain the optimal parameters Θ and the initial model.
4. The speaker recognition method based on a twin network model and the KNN algorithm according to claim 1, characterized in that
the step S22 is specifically as follows:
Step S221: the voice signals passed through the twin RNN network model yield the output vectors F_0, F_1, …, F_n, which are represented with the classified vectors, where F_1 = {f_11, f_12, …, f_1t} and F_n = {f_n1, f_n2, …, f_nt} denote the voice signals in the speaker sample set, and F_0 = {f_01, f_02, …, f_0t} denotes the voice signal of the speaker to be identified;
Step S222: judge whether they belong to the same speaker by cosine-distance scoring; the similarity of two speakers is not reflected in the lengths of the two vectors and is related only to the angle between them, so the formula is as follows:
cos(F_0, F_i) = (F_0 · F_i) / (‖F_0‖ · ‖F_i‖)
Step S223: having calculated the cosine distances between the different voice signals in step S222, use the KNN algorithm to find the point nearest to F_0, which is taken as the same speaker.
CN201910494606.8A 2019-06-06 2019-06-06 Speaker identification method based on twin network model and KNN algorithm Active CN110211594B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910494606.8A CN110211594B (en) 2019-06-06 2019-06-06 Speaker identification method based on twin network model and KNN algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910494606.8A CN110211594B (en) 2019-06-06 2019-06-06 Speaker identification method based on twin network model and KNN algorithm

Publications (2)

Publication Number Publication Date
CN110211594A true CN110211594A (en) 2019-09-06
CN110211594B CN110211594B (en) 2021-05-04

Family

ID=67791537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910494606.8A Active CN110211594B (en) 2019-06-06 2019-06-06 Speaker identification method based on twin network model and KNN algorithm

Country Status (1)

Country Link
CN (1) CN110211594B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569908A (en) * 2019-09-10 2019-12-13 苏州思必驰信息科技有限公司 Speaker counting method and system
CN110767239A (en) * 2019-09-20 2020-02-07 平安科技(深圳)有限公司 Voiceprint recognition method, device and equipment based on deep learning
CN111048097A (en) * 2019-12-19 2020-04-21 中国人民解放军空军研究院通信与导航研究所 Twin network voiceprint recognition method based on 3D convolution
CN111126563A (en) * 2019-11-25 2020-05-08 中国科学院计算技术研究所 Twin network-based space-time data target identification method and system
CN111785287A (en) * 2020-07-06 2020-10-16 北京世纪好未来教育科技有限公司 Speaker recognition method, speaker recognition device, electronic equipment and storage medium
CN112270931A (en) * 2020-10-22 2021-01-26 江西师范大学 Method for carrying out deceptive voice detection based on twin convolutional neural network
CN113903043A (en) * 2021-12-11 2022-01-07 绵阳职业技术学院 Method for identifying printed Chinese character font based on twin metric model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107170445A (en) * 2017-05-10 2017-09-15 重庆大学 The parkinsonism detection means preferably differentiated is cooperateed with based on voice mixing information characteristics
CN108492294A (en) * 2018-03-23 2018-09-04 北京邮电大学 A kind of appraisal procedure and device of image color harmony degree
CN109065032A (en) * 2018-07-16 2018-12-21 杭州电子科技大学 A kind of external corpus audio recognition method based on depth convolutional neural networks
CN109243467A (en) * 2018-11-14 2019-01-18 龙马智声(珠海)科技有限公司 Sound-groove model construction method, method for recognizing sound-groove and system
US20190035431A1 (en) * 2017-07-28 2019-01-31 Adobe Systems Incorporated Apparatus, systems, and methods for integrating digital media content
CN109543009A (en) * 2018-10-17 2019-03-29 龙马智芯(珠海横琴)科技有限公司 Text similarity assessment system and text similarity appraisal procedure

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107170445A (en) * 2017-05-10 2017-09-15 重庆大学 The parkinsonism detection means preferably differentiated is cooperateed with based on voice mixing information characteristics
US20190035431A1 (en) * 2017-07-28 2019-01-31 Adobe Systems Incorporated Apparatus, systems, and methods for integrating digital media content
CN108492294A (en) * 2018-03-23 2018-09-04 北京邮电大学 A kind of appraisal procedure and device of image color harmony degree
CN109065032A (en) * 2018-07-16 2018-12-21 杭州电子科技大学 A kind of external corpus audio recognition method based on depth convolutional neural networks
CN109543009A (en) * 2018-10-17 2019-03-29 龙马智芯(珠海横琴)科技有限公司 Text similarity assessment system and text similarity appraisal procedure
CN109243467A (en) * 2018-11-14 2019-01-18 龙马智声(珠海)科技有限公司 Sound-groove model construction method, method for recognizing sound-groove and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHENG ZHANG: "Siamese neural network based gait recognition for human identification", ICASSP *
YU Q: "Sketch-a-net that beats humans", British Machine Vision Conference *
丁美玉: "Sketch recognition method based on time-series features", Computer Science *
马月洁: "Research on football player tracking algorithms based on deep learning", Journal of Communication University of China (Natural Science Edition) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569908A (en) * 2019-09-10 2019-12-13 苏州思必驰信息科技有限公司 Speaker counting method and system
CN110569908B (en) * 2019-09-10 2022-05-13 思必驰科技股份有限公司 Speaker counting method and system
CN110767239A (en) * 2019-09-20 2020-02-07 平安科技(深圳)有限公司 Voiceprint recognition method, device and equipment based on deep learning
CN111126563A (en) * 2019-11-25 2020-05-08 中国科学院计算技术研究所 Twin network-based space-time data target identification method and system
CN111126563B (en) * 2019-11-25 2023-09-29 中国科学院计算技术研究所 Target identification method and system based on space-time data of twin network
CN111048097A (en) * 2019-12-19 2020-04-21 中国人民解放军空军研究院通信与导航研究所 Twin network voiceprint recognition method based on 3D convolution
CN111785287A (en) * 2020-07-06 2020-10-16 北京世纪好未来教育科技有限公司 Speaker recognition method, speaker recognition device, electronic equipment and storage medium
WO2022007766A1 (en) * 2020-07-06 2022-01-13 北京世纪好未来教育科技有限公司 Speaker recognition method and apparatus, electronic device, and storage medium
CN111785287B (en) * 2020-07-06 2022-06-07 北京世纪好未来教育科技有限公司 Speaker recognition method, speaker recognition device, electronic equipment and storage medium
US11676609B2 (en) 2020-07-06 2023-06-13 Beijing Century Tal Education Technology Co. Ltd. Speaker recognition method, electronic device, and storage medium
CN112270931A (en) * 2020-10-22 2021-01-26 江西师范大学 Method for carrying out deceptive voice detection based on twin convolutional neural network
CN113903043A (en) * 2021-12-11 2022-01-07 绵阳职业技术学院 Method for identifying printed Chinese character font based on twin metric model
CN113903043B (en) * 2021-12-11 2022-05-06 绵阳职业技术学院 Method for identifying printed Chinese character font based on twin metric model

Also Published As

Publication number Publication date
CN110211594B (en) 2021-05-04

Similar Documents

Publication Publication Date Title
CN110211594A (en) A kind of method for distinguishing speek person based on twin network model and KNN algorithm
CN109409296B (en) Video emotion recognition method integrating facial expression recognition and voice emotion recognition
CN106228977B (en) Multi-mode fusion song emotion recognition method based on deep learning
CN112784798B (en) Multi-modal emotion recognition method based on feature-time attention mechanism
US11862145B2 (en) Deep hierarchical fusion for machine intelligence applications
CN108777140A (en) Phonetics transfer method based on VAE under a kind of training of non-parallel corpus
CN110634491A (en) Series connection feature extraction system and method for general voice task in voice signal
CN108269133A (en) A kind of combination human bioequivalence and the intelligent advertisement push method and terminal of speech recognition
CN107492382A (en) Voiceprint extracting method and device based on neutral net
CN103198833B (en) A kind of high precision method for identifying speaker
CN112581979A (en) Speech emotion recognition method based on spectrogram
CN110992988B (en) Speech emotion recognition method and device based on domain confrontation
CN111178157A (en) Chinese lip language identification method from cascade sequence to sequence model based on tone
CN110289002A (en) A kind of speaker clustering method and system end to end
Sarkar et al. Time-contrastive learning based deep bottleneck features for text-dependent speaker verification
CN110211595A (en) A kind of speaker clustering system based on deep learning
CN111128178A (en) Voice recognition method based on facial expression analysis
CN104464738B (en) A kind of method for recognizing sound-groove towards Intelligent mobile equipment
CN114898779A (en) Multi-mode fused speech emotion recognition method and system
CN103258536B (en) A kind of extensive speaker's identification method
CN110348482A (en) A kind of speech emotion recognition system based on depth model integrated architecture
CN114360584A (en) Phoneme-level-based speech emotion layered recognition method and system
CN116434786A (en) Text-semantic-assisted teacher voice emotion recognition method
Trabelsi et al. A multi level data fusion approach for speaker identification on telephone speech
Shi et al. Construction of english pronunciation judgment and detection model based on deep learning neural networks data stream fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant