CN113077794A - Human voice recognition system - Google Patents
- Publication number
- CN113077794A (application CN202110367218.0A)
- Authority
- CN
- China
- Prior art keywords
- module
- voiceprint
- voice
- recognition
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/04—Training, enrolment or model building
- G10L17/18—Artificial neural networks; Connectionist approaches
Abstract
The invention discloses a human voice recognition system, applied in the technical field of voice recognition, which comprises: a human voice acquisition module, a preprocessing module, a voiceprint feature extraction module, a function switching module, a model training module and a voiceprint recognition module. The invention makes the input features more complete, with less noise and higher algorithm precision. A deep convolutional neural network is used to extract and classify high-dimensional voice features, so the speaker's vocal characteristics are extracted and recognized directly; this avoids the drawback of identifying a speaker through the content of their speech and improves recognition accuracy.
Description
Technical Field
The invention relates to the technical field of voice recognition, in particular to a human voice recognition system.
Background
Speech recognition technology is an information technology by which a machine, through a process of recognition and understanding, converts a voice, byte, or phrase uttered by a person into corresponding words or symbols, or gives a response. A voiceprint is the sound-wave spectrum, displayed by an electro-acoustic instrument, that carries speech information. The production of human language is a complex physiological and physical process between the human language center and the vocal organs, and the size and shape of the vocal organs used in speaking (tongue, teeth, larynx, lungs and nasal cavity) differ greatly from person to person, so the voiceprint spectra of any two people differ. Because the sound-wave spectra of different users differ when they speak, a unique user can be identified by voiceprint.
In the prior art, voiceprint recognition suffers from inaccurate recognition, and because of this defect it is not yet as widely applied as identity recognition methods such as face recognition and fingerprint recognition.
Providing a human voice recognition system that can accurately recognize the human voice is therefore an urgent problem for those skilled in the art.
Disclosure of Invention
In view of this, the present invention provides a human voice recognition system that can accurately recognize a user's identity from a voiceprint.
In order to achieve the purpose, the invention adopts the following technical scheme:
a human voice recognition system comprises a human voice acquisition module, a preprocessing module, a voiceprint feature extraction module, a function switching module, a voiceprint recognition module and a model training module;
the human voice acquisition module is connected with the input end of the preprocessing module and is used for acquiring voiceprint information after collecting the human voice;
the preprocessing module is connected with the input end of the voiceprint feature extraction module and is used for carrying out noise reduction processing on the voiceprint information;
the voiceprint feature extraction module is connected with the input end of the function switching module and used for extracting voiceprint features;
the function switching module is used for selecting the voiceprint recognition function and the model training function;
the model training module is connected with the first output end of the function switching module and used for performing model training on the voiceprint features to obtain a voiceprint template;
the voiceprint template library is connected with the output end of the model training module and is used for acquiring and storing the voiceprint templates;
the input end of the voiceprint recognition module is connected with the second output end of the function switching module, and the first input/output end of the voiceprint recognition module is connected with the input/output end of the voiceprint template library and used for recognizing the identity of the user according to the voiceprint template.
Preferably, the human voice acquisition module includes: a sound collection unit and a volume adaptive unit;
the sound collection unit is connected with the input end of the volume adaptive unit and is used for collecting the user's voice for human voice recognition; the volume adaptive unit is used for adaptively processing the volume of the user's voice, normalizing it overall to the same maximum value for both recognition-model training and recognition.
The technical effect realized by this scheme is as follows: processing the user's volume to the same maximum value balances the strength of the sound signal, which facilitates voiceprint feature extraction.
Preferably, the preprocessing module comprises: a noise reduction unit and a signal enhancement unit;
the noise reduction unit is used for performing noise reduction on the voiceprint information to obtain denoised voiceprint information; noise suppression is performed by at least one of a spectral elimination method, a learning-based identification method, or a denoising autoencoder. The signal enhancement unit is connected with the input end of the noise reduction unit and is used for enhancing the voiceprint information from the human voice acquisition module.
The technical effect realized by this scheme is as follows: noise in the user's voiceprint information is reduced, improving the final recognition accuracy.
Preferably, the voiceprint feature extraction module includes: a voiceprint feature extraction unit and a spectrogram conversion unit. The voiceprint feature extraction unit is used for extracting the voiceprint features of the user's voice through a trained neural network algorithm model; the spectrogram conversion unit is connected with the output end of the voiceprint feature extraction unit and is used for converting the obtained voiceprint features into a spectrogram.
Preferably, the system further comprises: a feedback voice module and a sound output module;
the feedback voice module is connected with the input/output end of the voiceprint recognition module, acquires the recognition result of the voiceprint recognition module and outputs a corresponding voice feedback signal to the voiceprint recognition module; and the sound output module is connected with the third input end of the voiceprint recognition module and used for receiving and outputting the voice feedback signal.
The technical effect realized by this scheme is as follows: the result of human voice (identity) recognition is obtained directly through speech.
Through the above technical scheme, and compared with the prior art, the invention provides a human voice recognition system that makes the input features more complete, with less noise and higher algorithm precision. A deep convolutional neural network extracts and classifies high-dimensional voice features, so the speaker's vocal characteristics are extracted and recognized directly; this avoids the drawback of identifying a speaker through the content of their speech and improves recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the technical solutions in the prior art, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of a human voice recognition system according to the present invention;
FIG. 2 is a block diagram of a human voice acquisition module according to the present invention;
FIG. 3 is a block diagram of a preprocessing module according to the present invention;
FIG. 4 is a block diagram of the voiceprint feature extraction module of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to fig. 1, the present embodiment discloses a human voice recognition system,
the human voice recognition system comprises a human voice acquisition module, a preprocessing module, a voiceprint feature extraction module, a function switching module, a voiceprint recognition module and a model training module;
the human voice acquisition module is connected with the input end of the preprocessing module and is used for acquiring voiceprint information after collecting the human voice;
the preprocessing module is connected with the input end of the voiceprint feature extraction module and is used for carrying out noise reduction processing on the voiceprint information;
the voiceprint feature extraction module is connected with the input end of the function switching module and used for extracting voiceprint features;
the function switching module is used for selecting the voiceprint recognition function and the model training function;
the model training module is connected with the first output end of the function switching module and used for carrying out model training on the voiceprint characteristics to obtain a voiceprint template;
the voiceprint template library is connected with the output end of the model training module and is used for acquiring and storing the voiceprint templates;
the input end of the voiceprint recognition module is connected with the second output end of the function switching module, and the first input/output end of the voiceprint recognition module is connected with the input/output end of the voiceprint template library and used for recognizing the identity of the user according to the voiceprint template.
In one embodiment, the function switching module provides two functions: voiceprint template acquisition and voiceprint recognition. In the voiceprint template acquisition state, voice information with fixed content is collected, and a voiceprint template is obtained through model training. In the voiceprint recognition state, the voice uttered by the current speaker is collected and, after voice preprocessing and voiceprint feature extraction, compared with the voiceprints in the voiceprint template library for direct recognition.
In a specific embodiment, the model trained after extracting the voiceprint features in the model training module is any neural network model capable of realizing the technical effect in the prior art.
In one embodiment, the human voice collecting module comprises: a sound collection unit and a volume adaptive unit;
the sound collection unit is connected with the input end of the volume self-adaptive unit and is used for collecting user sound for human voice recognition;
and the volume adaptive unit is used for adaptively processing the volume of the user's voice, normalizing it overall to the same maximum value for both recognition-model training and recognition.
In one embodiment, adaptive volume processing treats voiced regions and silent regions separately. For voiced regions:
1) find the maximum value of the current sound data;
2) compute a scaling coefficient from a preset constant and that maximum value;
3) multiply the current sound data by the coefficient to obtain the volume-adapted data.
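A minimal sketch of these three steps in Python; the normalization constant TARGET_PEAK is an assumption, since the patent says only that the coefficient is computed with respect to "a constant":

```python
import numpy as np

# Assumed normalization constant; the patent does not give a value.
TARGET_PEAK = 0.9

def adapt_volume(samples: np.ndarray) -> np.ndarray:
    """Scale a voiced segment so its peak amplitude equals TARGET_PEAK."""
    peak = np.max(np.abs(samples))   # step 1: maximum of the current sound data
    if peak == 0.0:
        return samples               # all-zero input: nothing to scale
    coeff = TARGET_PEAK / peak       # step 2: coefficient w.r.t. the constant
    return samples * coeff           # step 3: apply the coefficient
```

Applying this to every voiced region drives all users' recordings to the same peak level, which is the normalization described for the volume adaptive unit.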
Specifically, adaptive processing of the silent regions uses a self-truncation algorithm on the input sound features, comprising the following steps:
1) average the absolute values over 1600 sample values (about 0.1 s);
2) judge by a threshold whether the current data is a silent region;
3) truncate (discard) the data judged to be silent.
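A sketch of this self-truncation, assuming a 16 kHz sampling rate (so 1600 samples span about 0.1 s) and an illustrative threshold value the patent does not specify:

```python
import numpy as np

WINDOW = 1600       # ~0.1 s of samples, per the description above
THRESHOLD = 0.01    # assumed silence threshold (not given in the patent)

def truncate_silence(samples: np.ndarray) -> np.ndarray:
    """Remove windows whose mean absolute amplitude is below THRESHOLD."""
    kept = []
    for start in range(0, len(samples), WINDOW):
        win = samples[start:start + WINDOW]
        if np.mean(np.abs(win)) >= THRESHOLD:   # step 2: threshold decision
            kept.append(win)                    # voiced window: keep it
    return np.concatenate(kept) if kept else samples[:0]   # step 3: truncate
```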
In one particular embodiment, the pre-processing module includes: a noise reduction unit and a signal enhancement unit;
the noise reduction unit is used for performing noise reduction on the voiceprint information to obtain denoised voiceprint information; noise suppression is performed by at least one of a spectral elimination method, a learning-based identification method, or a denoising autoencoder;
the signal enhancement unit is connected with the input end of the noise reduction unit and used for enhancing the voiceprint information of the human voice acquisition module.
In a specific embodiment, the voiceprint feature extraction module comprises: a voiceprint feature extraction unit and a spectrogram conversion unit. The voiceprint feature extraction unit is used for extracting the voiceprint features of the user's voice through the trained neural network algorithm model; the spectrogram conversion unit is connected with the output end of the voiceprint feature extraction unit and is used for converting the obtained voiceprint features into a spectrogram.
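The patent does not fix how the spectrogram is produced; one common realization, assumed here, is a short-time Fourier magnitude spectrogram:

```python
import numpy as np

def spectrogram(samples: np.ndarray, frame: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram: Hann-windowed frames -> FFT magnitudes per frame.
    Frame and hop sizes are illustrative; the patent specifies neither."""
    window = np.hanning(frame)
    frames = [samples[i:i + frame] * window
              for i in range(0, len(samples) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))
```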
In a specific embodiment, a convolutional neural network combined with an attention mechanism extracts features from the user's voice to obtain a frame-level feature vector sequence; the frame-level sequence is then down-sampled, again using attention, into an intermediate feature vector of a preset dimension; finally, a fully connected operation on the intermediate feature vector yields a sentence-level voiceprint feature vector.
In a specific embodiment, extracting features from the input target speech data using a convolutional neural network combined with an attention mechanism to obtain the frame-level feature vector sequence includes:
performing convolution and rectification on the target voice data at least once in sequence to obtain a first feature vector sequence; calculating a channel attention vector from the first feature vector sequence and weighting the first feature vector sequence with it to obtain a second feature vector sequence; calculating a time attention vector from the second feature vector sequence and weighting the second feature vector sequence with it to obtain a third feature vector sequence; and rectifying the third feature vector sequence to obtain the frame-level feature vector sequence.
In one embodiment, calculating the channel attention vector from the first feature vector sequence and weighting the first feature vector sequence with it to obtain the second feature vector sequence includes: aggregating the time information of each channel of the first feature vector sequence using an average pooling operation and a maximum pooling operation, respectively; feeding the results of both pooling operations into a multilayer perceptron; computing the channel attention vector from the perceptron's outputs with a Sigmoid function; and multiplying the first feature vector sequence by the channel attention vector element by element to obtain the second feature vector sequence.
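This channel-attention step can be sketched in NumPy as follows; the shared-MLP weights w1 and w2 (and the omission of bias terms) are illustrative assumptions:

```python
import numpy as np

def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """x: (channels, time) first feature sequence; w1, w2: shared-MLP weights.
    Returns the second feature sequence, reweighted per channel."""
    avg = x.mean(axis=1)                             # aggregate time info: average pooling
    mx = x.max(axis=1)                               # aggregate time info: max pooling
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)     # shared two-layer perceptron (ReLU)
    attn = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # Sigmoid -> channel attention
    return x * attn[:, None]                         # element-by-element weighting
```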
In a specific embodiment, calculating the time attention vector from the second feature vector sequence and weighting the second feature vector sequence with it to obtain the third feature vector sequence includes: aggregating the channel information at each time point of the second feature vector sequence using an average pooling operation and a maximum pooling operation, respectively; merging the results of both pooling operations into a multi-dimensional vector; convolving the multi-dimensional vector with a preset convolution kernel to obtain the time attention vector; and multiplying the second feature vector sequence by the time attention vector element by element to obtain the third feature vector sequence.
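Likewise, a sketch of the time-attention step; the size of the "preset convolution kernel" and the same-padding are assumptions:

```python
import numpy as np

def time_attention(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """x: (channels, time) second feature sequence; kernel: (2, k) preset kernel
    (k odd assumed). Returns the third feature sequence, reweighted per time step."""
    avg = x.mean(axis=0)                      # aggregate channel info: average pooling
    mx = x.max(axis=0)                        # aggregate channel info: max pooling
    stacked = np.stack([avg, mx])             # merge the two poolings
    k = kernel.shape[1]
    padded = np.pad(stacked, ((0, 0), (k // 2, k // 2)))   # same-padding in time
    scores = np.array([np.sum(padded[:, t:t + k] * kernel)  # slide the preset kernel
                       for t in range(x.shape[1])])
    attn = 1.0 / (1.0 + np.exp(-scores))      # Sigmoid -> time attention vector
    return x * attn[None, :]                  # element-by-element weighting
```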
In a specific embodiment, after the time attention vector is calculated and the second feature vector sequence is weighted to obtain the third feature vector sequence, the method further includes: adding a residual connection on top of the third feature vector sequence; the rectification operation is then applied to the sum of the third feature vector sequence and the residual to obtain the frame-level feature vector sequence.
In a specific embodiment, the neural network algorithm model comprises an input layer, an SVM layer, a convolution layer, a pooling layer and a fully connected layer. The input layer receives spectrum information obtained by a Laplace transform of the voiceprint information output by the preprocessing module, and the SVM layer receives, via the fully connected layer, the feature vector obtained by the voiceprint feature extraction module. The convolution layer uses 5 × 5 convolution kernels with 8 filters; the pooling layer uses a 3 × 3 pooling window with 16 channels; the fully connected layer uses 16 filters with 3 × 3 convolution kernels and takes its input from the output of the pooling layer;
the pooling method of the pooling layer is as follows:

x_e = f(u_e + φ(u_e)),  with  u_e = w_e · x_{e-1} + b_e + δ

where x_e represents the output of the current layer, u_e the input of the activation function, f(·) the activation function, w_e the weight of the current layer, φ the loss function, x_{e-1} the output of the preceding layer, b_e the offset, and δ a constant.
In a specific embodiment, the system further comprises: a feedback voice module and a sound output module;
the voice feedback module is connected with the input/output end of the voiceprint recognition module, acquires the recognition result of the voiceprint recognition module and outputs a corresponding voice feedback signal to the voiceprint recognition module;
and the sound output module is connected with the third input end of the voiceprint recognition module and is used for receiving and outputting the voice feedback signal, for example: "recognition successful, name Zhang San, student number 001, examination subject mathematics", etc.
In one embodiment, the user is a student taking an examination, and human voice recognition identifies the student's identity, preventing cheating by having someone else sit the examination.
In a particular embodiment, the sound collection unit comprises a microphone.
In a particular embodiment, the sound output module comprises a speaker.
In one particular embodiment, the noise reduction unit includes an XFM10412 chip.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (5)
1. A human voice recognition system, comprising: a human voice acquisition module, a preprocessing module, a voiceprint feature extraction module, a function switching module, a voiceprint recognition module and a model training module;
the human voice acquisition module is connected with the input end of the preprocessing module and is used for acquiring voiceprint information after collecting the human voice;
the preprocessing module is connected with the input end of the voiceprint feature extraction module and is used for carrying out noise reduction processing on the voiceprint information;
the voiceprint feature extraction module is connected with the input end of the function switching module and used for extracting voiceprint features;
the function switching module is used for selecting the voiceprint recognition function and the model training function;
the model training module is connected with the first output end of the function switching module and used for performing model training on the voiceprint features to obtain a voiceprint template;
the voiceprint template library is connected with the output end of the model training module and is used for acquiring and storing the voiceprint templates;
and the input end of the voiceprint recognition module is connected with the second output end of the function switching module, and the first input/output end of the voiceprint recognition module is connected with the input/output end of the voiceprint template library and is used for recognizing the identity of the user according to the voiceprint template.
2. A human voice recognition system according to claim 1, wherein:
the human voice acquisition module comprises: a sound collection unit and a volume adaptive unit;
the sound collection unit is connected with the input end of the volume self-adaptive unit and is used for collecting user sound for human voice recognition;
and the volume adaptive unit is used for adaptively processing the volume of the user's voice, normalizing it overall to the same maximum value for both recognition-model training and recognition.
3. A human voice recognition system according to claim 1, wherein:
the preprocessing module comprises: a noise reduction unit and a signal enhancement unit;
the noise reduction unit is used for performing noise reduction on the voiceprint information to obtain denoised voiceprint information; noise suppression is performed by at least one of a spectral elimination method, a learning-based identification method, or a denoising autoencoder;
the signal enhancement unit is connected with the input end of the noise reduction unit and used for enhancing the voiceprint information of the human voice acquisition module.
4. A human voice recognition system according to claim 1, wherein:
the voiceprint feature extraction module comprises: a voiceprint feature extraction unit and a spectrogram conversion unit;
the voiceprint feature extraction unit is used for extracting the voiceprint features of the voice of the user through a trained neural network algorithm model;
and the spectrogram conversion unit is connected with the output end of the voiceprint feature extraction unit and is used for converting the obtained voiceprint features into a spectrogram.
5. A human voice recognition system according to any one of claims 1 to 4, wherein:
further comprising: a feedback voice module and a voice output module;
the feedback voice module is connected with the input/output end of the voiceprint recognition module, acquires the recognition result of the voiceprint recognition module and outputs a corresponding voice feedback signal to the voiceprint recognition module;
and the sound output module is connected with the third input end of the voiceprint recognition module and used for receiving and outputting the voice feedback signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110367218.0A CN113077794A (en) | 2021-04-06 | 2021-04-06 | Human voice recognition system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113077794A true CN113077794A (en) | 2021-07-06 |
Family
ID=76615844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110367218.0A Withdrawn CN113077794A (en) | 2021-04-06 | 2021-04-06 | Human voice recognition system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113077794A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113948109A (en) * | 2021-10-14 | 2022-01-18 | 广州蓝仕威克软件开发有限公司 | System for recognizing physiological phenomenon based on voice |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20210706 |