CN114999461A - Silent voice decoding method based on facial neck surface myoelectricity - Google Patents
- Publication number
- CN114999461A (application CN202210598661.3A)
- Authority
- CN
- China
- Prior art keywords
- syllable
- batch
- phrase
- data
- signal window
- Prior art date
- 2022-05-30
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/16—Speech classification or search using artificial neural networks
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the type of analysis window
- G10L2015/027—Syllables being the recognition units
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a silent speech decoding method based on surface electromyography of the face and neck, which decodes unvoiced speech content by processing the surface electromyographic signals collected from the relevant muscle activity while the user silently mouths words. The method comprises the following steps: 1. collecting surface electromyographic signals from the user to form a training data set; 2. segmenting the data to obtain a syllable-labeled training data set; 3. performing data augmentation; 4. extracting features from the augmented training data set; 5. constructing a deep neural network that captures spatio-temporal information; 6. constructing a statistical language model to predict the phrase the user is continuously reading. The invention recognizes speech content from the finer-grained structures that make up a speech sequence; it not only achieves high-performance silent speech recognition, but also helps relate surface myoelectric activity to the meaning of the corresponding speech, providing a new approach to silent speech recognition.
Description
Technical Field
The invention belongs to the fields of biological signal processing, machine learning and intelligent control, and particularly relates to a silent speech decoding method based on surface myoelectricity of the face and neck.
Background
Voice is an essential, effective and convenient mode of communication in daily human life. Over the past decades, speech-related human-computer interaction technology, represented by Automatic Speech Recognition (ASR), has developed rapidly and achieved very high performance in general scenarios. However, because ASR depends on voiced speech, its disadvantages are very apparent: it cannot work reliably against a high-noise background, it cannot satisfy the demand for private interaction, and people with voice disorders cannot rely on ASR for daily communication.
To overcome the above disadvantages, researchers have explored non-acoustic speech recognition methods. During human speech and silent articulation, the speech-related muscle groups of the face and neck are activated, producing bioelectric signals called surface electromyograms (sEMG). Silent Speech Recognition (SSR) based on sEMG has therefore become an important supplement to ASR in certain special scenarios, and sEMG-based SSR techniques have made steady progress over the decades. Early SSR mainly used classical pattern classification methods such as support vector machines and conjugate gradient networks, recording the sEMG of the subject's face and neck with discrete electrodes over a small number of channels and recognizing corpora with a limited number of words. Later research tended to recognize lexical corpora using Hidden Markov Models (HMMs), which characterize the temporal structure of sEMG. With the development of data acquisition technology, high-density (HD) electrode arrays have been designed to simultaneously record surface myoelectric signals from a large number of channels over a relatively large area of a target muscle or muscle group. High-density surface electromyography (HD-sEMG) arrays help capture valuable spatial information and characterize the heterogeneity of muscle activity, thereby improving the performance of electromyographic pattern recognition.
While the above studies demonstrate that pattern classification techniques can achieve satisfactory SSR performance, some deficiencies remain: 1) pattern classification methods simply map sEMG pattern features to a phrase or word, omitting temporally associated semantic information; 2) the performance of classification techniques is limited by the vocabulary size of the corpus; 3) common pattern classification techniques mainly recognize isolated words and cannot achieve natural, coherent silent speech interaction.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a silent speech decoding method based on surface electromyography of the face and neck, so that the finer-grained structure of a speech sequence can be recognized and the speech content understood, the recognition of phrases with similar pronunciation is improved, and accurate and natural silent speech interaction is ultimately realized.
To achieve the above aim, the invention adopts the following technical solution:
The invention provides a silent speech decoding method based on surface myoelectricity of the face and neck, which is characterized by comprising the following steps:
Step 1, construct an instruction set P = {p_1, …, p_n, …, p_N} containing N Chinese phrases, where p_n denotes the n-th Chinese phrase in the instruction set P and the N Chinese phrases contain L classes of syllables;
Collect, with a high-density electrode array, the surface electromyographic signals generated by the facial muscles and neck muscles while the user silently reads the Chinese phrases, and label the resting signal segments and the phrase-related surface electromyographic signal segments with a double-threshold detection method based on short-time energy and zero-crossing rate, thereby forming labeled phrase signal segments that constitute a training phrase data set S_p;
Step 2, segment the training phrase data set S_p with a series of temporally overlapping signal windows to obtain M signal window samples; divide each phrase signal segment evenly according to the number of syllables it contains, and assign a fine-grained syllable label to each signal window sample according to the syllable sequence of its phrase, thereby obtaining one batch of training data consisting of M syllable-labeled signal window samples;
Step 3, change the segmentation timing so as to shift the window boundary of each signal window, and repeat the processing of Step 2, thereby obtaining K batches of syllable-labeled training data S_origin = {S^1, …, S^k, …, S^K}, where S^k = {(x_m^k, y_m^k) | m = 1, …, M} denotes the k-th batch of syllable-labeled training data, x_m^k denotes the m-th signal window sample of the k-th batch, and y_m^k denotes the corresponding syllable label, represented by one-hot coding with size [1, L]; S_origin contains M × K signal window samples in total;
Step 4, extract the myoelectric features of the training data set S_origin:
Step 4.1, split each signal window sample into consecutive non-overlapping frames to obtain d frames of signal window data;
Step 4.2, according to the relative positions of the signal channels of the high-density electrode array, rearrange the surface electromyographic signals collected by the high-density electrode array into a surface electromyographic data matrix over the two-dimensional electrode channel grid, whose size is denoted [e, g];
Step 4.3, extract c myoelectric features from each frame of signal window data to obtain a three-dimensional myoelectric feature map for each frame, and thereby obtain the three-dimensional myoelectric feature map set of all signal window samples S_input = {(X_m^k, y_m^k)}, where X_m^k denotes the d-frame three-dimensional myoelectric feature map of the m-th signal window sample of the k-th batch, with size [d, e, g, c], and y_m^k denotes the syllable label of the m-th signal window sample of the k-th batch;
Step 5, construct a deep neural network that captures spatio-temporal information, comprising: A dilated convolution blocks wrapped in time-distributed layers, a flattening layer, A bidirectional gated recurrent unit (BiGRU) blocks and A fully connected layers; the three-dimensional myoelectric feature map set S_input is fed into the deep neural network in K batches;
Step 5.1, each a-th dilated convolution block comprises a dilated convolution layer, a batch normalization layer and a Dropout layer; the a-th dilated convolution layer uses H_a two-dimensional convolution kernels of size h × h and a Tanh activation function;
When a = 1, the k-th batch of three-dimensional myoelectric feature maps is fed into the a-th dilated convolution block for processing, which outputs the a-th feature map set of the k-th batch {F_m^{k,a}}, where F_m^{k,a} denotes the feature map output for the m-th signal window sample X_m^k of the k-th batch, with size [d, e, g, H_a];
When a = 2, 3, …, A, the (a-1)-th feature map set of the k-th batch {F_m^{k,a-1}} is fed into the a-th dilated convolution block for processing, which outputs the a-th feature map set of the k-th batch {F_m^{k,a}}, so that the A-th dilated convolution block outputs the final feature map set {F_m^{k,A}};
Step 5.2, after the feature maps {F_m^{k,A}} are processed by the flattening layer, the flattened feature set of the k-th batch {V_m^k} is obtained, where V_m^k denotes the feature map output after F_m^{k,A} passes through the flattening layer, with size [d, e × g × H_A];
Step 5.3, each a-th BiGRU block comprises a bidirectional gated recurrent unit layer with a ReLU activation function and a Dropout layer; the hidden nodes of every bidirectional gated recurrent unit layer have dimension b;
When a = 1, the flattened feature set of the k-th batch {V_m^k} is fed into the a-th BiGRU block for processing, which outputs the a-th gated feature set of the k-th batch {G_m^{k,a}}, where G_m^{k,a} denotes the gated features output after V_m^k is processed by the a-th BiGRU block, with size [d, 2 × b];
When a = 2, 3, …, A-1, the (a-1)-th gated feature set of the k-th batch {G_m^{k,a-1}} is fed into the a-th BiGRU block for processing, which outputs the a-th gated feature set of the k-th batch {G_m^{k,a}}, so that the (A-1)-th BiGRU block outputs the (A-1)-th gated feature set of the k-th batch with size [d, 2 × b];
When a = A, the (A-1)-th gated feature set of the k-th batch {G_m^{k,A-1}} is fed into the A-th BiGRU block for processing, which outputs only the last time step as the A-th gated feature set of the k-th batch {G_m^{k,A}}, with size [1, 2 × b];
Step 5.4, the activation functions of the first A-1 fully connected layers are Tanh, each followed by one Dropout layer, and the activation function of the A-th fully connected layer is softmax;
After the gated feature set {G_m^{k,A}} output by the A-th BiGRU block is processed by the A fully connected layers in turn, the scoring matrix of the syllable decision sequence Q^k = [q_1^k, …, q_m^k, …, q_M^k] is output, where q_m^k = [q_{m,1}^k, …, q_{m,j}^k, …, q_{m,L}^k] gives the probabilities that the m-th signal window sample x_m^k of the k-th batch is predicted as each of the L syllables, and q_{m,j}^k is the probability that the m-th sample of the k-th batch is predicted as the j-th syllable class;
Step 5.5, establish the cross-entropy loss function Loss by formula (1):
    Loss = - Σ_{k=1}^{K} Σ_{m=1}^{M} Σ_{j=1}^{L} y_{m,j}^k · log(q_{m,j}^k)    (1)
where y_{m,j}^k is the value at the j-th position of the syllable label y_m^k of the m-th signal window sample of the k-th batch;
Step 5.6, train the neural network:
Update the weight parameters of the deep neural network with an Adam optimizer, set a maximum number of iterations step and a dynamically varying learning rate lr, and stop training when the loss function Loss reaches its minimum or the iteration count equals step, thereby obtaining the optimal syllable classification model;
Step 6, construct a statistical language model from the instruction set P of Chinese phrases and use it to post-process the results of the optimal syllable classifier:
Step 6.1, establish the many-to-one mapping θ from syllable label sequences to Chinese phrases;
Step 6.2, process a Chinese phrase p' to be decoded according to the procedure of Step 2 to obtain U signal window samples to be decoded; process the U signal window samples to be decoded according to the procedure of Step 4 to obtain the three-dimensional myoelectric feature maps to be decoded;
Step 6.3, feed the three-dimensional myoelectric feature maps to be decoded into the optimal syllable classification model, which outputs the scoring matrix of the syllable label sequence of the Chinese phrase p', O = [o_1, …, o_u, …, o_U], where o_u denotes the score probability vector of the u-th syllable of the Chinese phrase p' and U denotes the length of the syllable sequence;
Step 6.4, set the search depth of each syllable to depth and process O with a multi-beam search algorithm, obtaining depth^U syllable label sequences and their depth^U scores;
Step 6.5, judge whether any of the depth^U syllable label sequences matches the many-to-one mapping θ; if the match succeeds, select from the matching syllable label sequences the phrase p̂ corresponding to the sequence with the highest score and output it; otherwise, execute Step 6.6;
Step 6.6, take the syllable with the highest score probability in the score vector o_u of the u-th syllable of the Chinese phrase p', denoted ĉ_u, thereby obtaining the syllable decision sequence ĉ = (ĉ_1, …, ĉ_U); select from the many-to-one mapping θ the phrase p̂ whose syllable label sequence has the minimum edit distance to ĉ, as the decoding result of the Chinese phrase p'.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention segments the raw phrase sEMG data to obtain fine-grained syllable-level surface electromyographic data, establishes a dilated convolution bidirectional gated recurrent unit network (DC-BiGRU) as the classifier, and further provides a statistical language model that captures semantic information and refines and corrects the output of the trained classifier, yielding accurate predictions of phrase sequences; through this decoding framework, accurate and natural silent speech recognition is realized.
2. The automatic labeling method for the training data simplifies the preparation of the training set; meanwhile, the invention provides a data augmentation method based on adjusting the windowing boundaries, which effectively alleviates overfitting of the deep network and improves the performance of silent speech recognition.
3. Working at the fine-grained syllable level, the invention provides a statistical language model based on multi-beam search and edit distance, which exploits the semantic and temporal correlation of the phrase set, helps to understand the meaning of phrases, and improves the recognition of phrases with similar pronunciation.
4. The invention achieves high-performance, natural and continuous silent speech recognition under the data processing requirements of a real-time system, which facilitates the practical application of the method in fields such as myoelectric control.
Drawings
FIG. 1 is a flow chart of the silent speech decoding method based on surface myoelectricity of the face and neck according to the present invention;
FIG. 2 shows the set of Chinese phrases used in the present invention;
FIG. 3 illustrates the shape parameters and placement positions of the face and neck high-density electrode arrays used in the present invention;
FIG. 4 is a schematic diagram of the data segmentation, automatic data labeling and data augmentation methods adopted by the present invention;
FIG. 5 is a schematic illustration of the spatial position distribution and stitching result of the high-density electrode arrays according to the present invention;
FIG. 6 is a schematic diagram of the structure of the classification network based on the dilated convolution bidirectional gated recurrent unit (DC-BiGRU) used in the present invention;
FIG. 7 is a graph of the average phrase recognition rates and standard deviations obtained by the present invention;
FIG. 8a is a schematic diagram of the confusion matrix of DC-BiGRU phrase classification obtained by the present invention;
FIG. 8b is a schematic diagram of the confusion matrix of the proposed DCBiMEP decoding method obtained by the present invention.
Detailed Description
In this embodiment, a silent speech decoding method based on surface myoelectricity of the face and neck extracts temporally associated semantic information with a statistical language model, which not only improves the recognition of phrases with similar pronunciation but also helps relate sEMG activity to the meaning of the corresponding phrases, offering a new approach to silent speech recognition. Specifically, as shown in FIG. 1, the method comprises the following steps:
Step 1, construct an instruction set P = {p_1, …, p_n, …, p_N} containing N Chinese phrases, where p_n denotes the n-th Chinese phrase in the instruction set P and the N phrases contain L classes of syllables. As shown in FIG. 2, the Chinese phrase vocabulary consists of N = 30 phrases covering 79 Chinese syllables plus 1 resting syllable, so L = 80;
Surface electromyographic signals generated by the facial and neck muscles while the user silently reads the Chinese phrases are collected with a high-density electrode array, and the resting signal segments and the phrase-related surface electromyographic signal segments are labeled with a double-threshold detection method based on short-time energy and zero-crossing rate, forming labeled phrase signal segments that constitute the training phrase data set S_p. In this example, 8 healthy subjects (7 males and 1 female) aged 21-26 years, with no hearing or speech impairment, took part in the data collection experiment. Each subject was informed in detail of every experimental procedure and its specific requirements.
The shape parameters and placement positions of the high-density electrode arrays are shown in FIG. 3. Four high-density flexible electrode arrays are used in total, placed as two symmetric pairs on the left and right sides of the face and of the neck. Illustratively, each of the two facial electrode arrays has 16 channels, an electrode diameter of 5 mm, and electrode spacings of 10 mm, 15 mm and 18 mm; each of the two neck electrode arrays also has 16 channels, an electrode diameter of 5 mm, and an electrode spacing of 18 mm. Together, the face and neck electrode arrays comprise 64 channels. In addition, one electrode is attached behind each of the left and right ears, serving as the reference electrode and the ground electrode;
Before the electrode arrays were applied, the target muscles of the subject's face and neck were scrubbed with an alcohol cotton pad to remove skin keratin, and a suitable amount of conductive gel was applied to the electrode probes to reduce skin impedance. Illustratively, the facial electrode arrays collect sEMG from facial muscles such as the zygomaticus, the masseter and the lower labial muscles, and the neck electrode arrays collect sEMG from neck muscles such as the omohyoid, the sternohyoid and the platysma. During collection, the subjects silently articulated each phrase at a uniform speed with medium effort; each phrase was repeated 20 times as one trial, with an interval t between repetitions, illustratively set to t = 3 s. Within each trial, activities unrelated to the collection task, such as swallowing saliva and coughing, were not allowed. To avoid muscle fatigue, there was a rest period T between trials; illustratively, T = 30 s;
sEMG activity detection is then performed on the raw data; the result is shown in part (a) of FIG. 4. First, the short-time energy and zero-crossing rate of the resting-state baseline are computed as the initial energy and initial zero-crossing rate, denoted E_i and C_i; the short-time energy and zero-crossing rate of the raw data are then computed over short-time segments of length S_length. The method requires three thresholds: the first two are high and low thresholds on the short-time energy, denoted E_h and E_l, used for the initial decision on the onset and offset positions; the third is a threshold SC on the short-time zero-crossing rate. Illustratively, S_length = 64 ms, E_h = 8 × E_i, E_l = 3 × E_i and SC = 3 × C_i. This yields the labels of the resting signal segments and of the signal segment corresponding to each phrase;
Step 2, the training phrase data set S_p is segmented by a series of temporally overlapping signal windows to obtain M signal window samples. As shown in part (b) of FIG. 4, the sliding window length for data segmentation is W_length and the overlap ratio is Overlap; illustratively, W_length = 1000 ms and Overlap = 50%. Each phrase signal segment is divided evenly according to the number of syllables it contains, and each signal window sample is then given a fine-grained syllable label according to the syllable sequence of its phrase, with the result shown in part (b) of FIG. 4; according to the time span of each signal window sample under the different syllable labels, the window is labeled with the corresponding syllable, yielding a syllable-labeled training data set;
Step 3, the segmentation timing of the signal windows is changed to shift the window boundaries, and the processing of Step 2 is repeated, thereby obtaining K batches of syllable-labeled training data S_origin = {S^1, …, S^k, …, S^K}, where S^k = {(x_m^k, y_m^k) | m = 1, …, M} denotes the k-th batch, x_m^k the m-th signal window sample of the k-th batch, and y_m^k the corresponding one-hot syllable label of size [1, L]; S_origin contains M × K signal window samples in total. In this embodiment, as shown in part (c) of FIG. 4, the initial position of each data segmentation is shifted backward by Δ/5 relative to the previous batch, and the K batches of labeled signal window samples are obtained with the syllable labeling method above, where M = 327, K = 5 and Δ = 500 ms;
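A minimal sketch of the sliding-window segmentation and the boundary-shift augmentation follows; the sampling rate fs, the window-center labeling rule and the helper names are illustrative assumptions:

```python
import numpy as np

def window_and_label(sig, syllables, fs, w_ms=1000, overlap=0.5, shift_ms=0.0):
    """Slice one phrase signal (channels x samples) into overlapping windows and
    label each window with the syllable whose evenly divided time span contains
    the window center (an assumed labeling rule matching FIG. 4)."""
    w = int(fs * w_ms / 1000)
    hop = int(w * (1 - overlap))
    spans = np.linspace(0, sig.shape[1], len(syllables) + 1)  # even syllable division
    samples = []
    start = int(fs * shift_ms / 1000)
    while start + w <= sig.shape[1]:
        center = start + w // 2
        idx = min(int(np.searchsorted(spans, center, side="right")) - 1,
                  len(syllables) - 1)
        samples.append((sig[:, start:start + w], syllables[idx]))
        start += hop
    return samples

def augmented_batches(sig, syllables, fs, K=5, delta_ms=500.0):
    """Boundary-shift data augmentation: batch k starts k * delta/K later (delta/5 here)."""
    return [window_and_label(sig, syllables, fs, shift_ms=k * delta_ms / K)
            for k in range(K)]
```

With K = 5 and Δ = 500 ms this reproduces the five boundary-shifted batches of part (c) of FIG. 4.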
Step 4, extract the myoelectric features of the training data set S_origin:
Step 4.1, each signal window sample is split into consecutive non-overlapping frames to obtain d frames of signal window data; in this embodiment, the frame length of the consecutive non-overlapping frames is F_length = 40 ms;
Step 4.2, as shown in FIG. 5, the surface electromyographic signals acquired by the high-density electrode array are rearranged, according to the relative positions of the signal channels, into a surface electromyographic data matrix over the two-dimensional electrode channel grid, whose size is denoted [e, g]; in this embodiment, e = 8 and g = 8;
Step 4.3, c myoelectric features are extracted from each frame of signal window data to obtain a three-dimensional myoelectric feature map per frame, and thereby the three-dimensional feature map set of all signal window samples S_input = {(X_m^k, y_m^k)}, where X_m^k denotes the d-frame three-dimensional feature map of the m-th signal window sample of the k-th batch, with size [d, e, g, c], and y_m^k the corresponding syllable label. In this embodiment, c = 4; the four extracted myoelectric time-domain features are mean absolute value (MAV), waveform length (WL), zero crossings (ZC) and slope sign changes (SSC); the number of frames per signal window feature map is d = 25, and the feature map size is [25, 8, 8, 4]. The database S_input formed by the feature maps of all signal window samples is finally used as the input of the neural network.
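The four time-domain features have standard definitions; the sketch below, with an assumed row-major channel-to-grid lookup GRID (the true mapping follows the physical layout in FIG. 5), assembles the [d, e, g, c] feature map of one signal window:

```python
import numpy as np

def td_features(frame):
    """MAV, WL, ZC, SSC of one 1-D frame (standard time-domain sEMG definitions)."""
    mav = np.mean(np.abs(frame))
    wl = np.sum(np.abs(np.diff(frame)))
    zc = np.sum(frame[:-1] * frame[1:] < 0)
    ssc = np.sum((frame[1:-1] - frame[:-2]) * (frame[1:-1] - frame[2:]) > 0)
    return np.array([mav, wl, zc, ssc], dtype=np.float32)

def feature_map(window, grid, fs, f_ms=40):
    """window: (n_channels, n_samples) sEMG; grid: (e, g) array of channel indices.
    Returns the [d, e, g, c] feature map described in Step 4."""
    f_len = int(fs * f_ms / 1000)
    d = window.shape[1] // f_len
    e, g = grid.shape
    out = np.zeros((d, e, g, 4), dtype=np.float32)
    for t in range(d):
        seg = window[:, t * f_len:(t + 1) * f_len]
        for i in range(e):
            for j in range(g):
                out[t, i, j] = td_features(seg[grid[i, j]])
    return out

# Illustrative grid: channels 0..63 laid out row-major on the 8x8 electrode map
GRID = np.arange(64).reshape(8, 8)
```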
Step 5, construct the deep neural network that captures spatio-temporal information, comprising: A dilated convolution blocks wrapped in time-distributed layers, a flattening layer, A bidirectional gated recurrent unit (BiGRU) blocks and A fully connected layers; the three-dimensional myoelectric feature map set S_input is fed into the deep neural network in K batches. As shown in FIG. 6, the network consists of the time-distributed dilated convolution blocks, the flattening layer, the BiGRU blocks and the fully connected layers; in this embodiment, A = 2;
Step 5.1, each a-th dilated convolution block comprises a dilated convolution layer, a batch normalization layer and a Dropout layer; the a-th dilated convolution layer uses H_a two-dimensional convolution kernels of size h × h and a Tanh activation function;
When a = 1, the k-th batch of three-dimensional feature maps is fed into the a-th dilated convolution block for processing, which outputs the a-th feature map set of the k-th batch {F_m^{k,a}}, where F_m^{k,a} denotes the feature map output for the m-th signal window sample X_m^k of the k-th batch, with size [d, e, g, H_a];
When a = 2, 3, …, A, the (a-1)-th feature map set of the k-th batch {F_m^{k,a-1}} is fed into the a-th dilated convolution block for processing, which outputs the a-th feature map set of the k-th batch {F_m^{k,a}}, so that the A-th dilated convolution block outputs the final feature map set {F_m^{k,A}}. In this embodiment, the first dilated convolution layer consists of H_1 = 32 filters of size 3 × 3 with dilation factor 1, the second dilated convolution layer consists of H_2 = 8 filters of size 3 × 3 with dilation factor 3, and both Dropout layers use ratio 0.5; F_m^{k,1} has size [25, 8, 8, 32] and F_m^{k,2} has size [25, 8, 8, 8];
Step 5.2, after the feature maps {F_m^{k,A}} are processed by the flattening layer, the flattened feature set of the k-th batch {V_m^k} is obtained, where V_m^k denotes the feature map output after F_m^{k,A} is flattened, with size [d, e × g × H_A]; in this embodiment, V_m^k has size [25, 512];
Step 5.3, each a-th BiGRU block comprises a bidirectional gated recurrent unit layer with a ReLU activation function and a Dropout layer; the hidden nodes of every bidirectional gated recurrent unit layer have dimension b;
When a = 1, the flattened feature set of the k-th batch {V_m^k} is fed into the a-th BiGRU block for processing, which outputs the a-th gated feature set of the k-th batch {G_m^{k,a}}, where G_m^{k,a} denotes the gated features output after V_m^k is processed by the a-th BiGRU block, with size [d, 2 × b];
When a = 2, 3, …, A-1, the (a-1)-th gated feature set of the k-th batch {G_m^{k,a-1}} is fed into the a-th BiGRU block for processing, which outputs the a-th gated feature set of the k-th batch {G_m^{k,a}}, so that the (A-1)-th BiGRU block outputs the (A-1)-th gated feature set of the k-th batch with size [d, 2 × b];
When a = A, the (A-1)-th gated feature set of the k-th batch {G_m^{k,A-1}} is fed into the A-th BiGRU block for processing, which outputs only the last time step as the A-th gated feature set of the k-th batch {G_m^{k,A}}, with size [1, 2 × b]. In this embodiment, each BiGRU block comprises 1 bidirectional gated recurrent unit layer with a ReLU activation function and 1 Dropout layer; the hidden node dimension of both BiGRU layers is b = 64 and the Dropout ratio is 0.4; G_m^{k,1} has size [25, 128] and G_m^{k,2} has size [1, 128];
Step 5.4, the activation functions of the first A-1 fully connected layers are Tanh, each followed by one Dropout layer, and the activation function of the A-th fully connected layer is softmax;
After the gated feature set {G_m^{k,A}} output by the A-th BiGRU block is processed by the A fully connected layers in turn, the scoring matrix of the syllable decision sequence Q^k = [q_1^k, …, q_m^k, …, q_M^k] is output, where q_m^k = [q_{m,1}^k, …, q_{m,j}^k, …, q_{m,L}^k] gives the probabilities that the m-th signal window sample x_m^k of the k-th batch is predicted as each of the L syllables, and q_{m,j}^k is the probability that the m-th sample of the k-th batch is predicted as the j-th syllable class. In this embodiment, the 1st fully connected layer uses Tanh with a hidden dimension of 200 and is followed by 1 Dropout layer with ratio 0.2, and the 2nd fully connected layer has a hidden dimension of 80;
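Assembling the embodiment's hyperparameters, a minimal tf.keras sketch of the DC-BiGRU classifier might look as follows; the TimeDistributed wrapping, the "same" padding and other unstated details are assumptions rather than the patent's exact network:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_dc_bigru(d=25, e=8, g=8, c=4, n_classes=80):
    """DC-BiGRU sketch: dilated conv blocks -> per-frame flatten -> BiGRU -> dense."""
    inp = layers.Input(shape=(d, e, g, c))
    x = inp
    for filters, dilation in [(32, 1), (8, 3)]:            # H_1=32, H_2=8; dilation 1 and 3
        x = layers.TimeDistributed(
            layers.Conv2D(filters, 3, dilation_rate=dilation,
                          padding="same", activation="tanh"))(x)
        x = layers.TimeDistributed(layers.BatchNormalization())(x)
        x = layers.Dropout(0.5)(x)
    x = layers.TimeDistributed(layers.Flatten())(x)        # [d, e*g*H_A] = [25, 512]
    x = layers.Bidirectional(layers.GRU(64, activation="relu",
                                        return_sequences=True))(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Bidirectional(layers.GRU(64, activation="relu"))(x)  # last step only
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(200, activation="tanh")(x)
    x = layers.Dropout(0.2)(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inp, out)
```

Calling model.summary() confirms the per-frame flattened size of 512 and the final 80-way softmax.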
Step 5.5, establish the cross-entropy loss function Loss by formula (1):
    Loss = - Σ_{k=1}^{K} Σ_{m=1}^{M} Σ_{j=1}^{L} y_{m,j}^k · log(q_{m,j}^k)    (1)
where y_{m,j}^k is the value at the j-th position of the syllable label y_m^k of the m-th signal window sample of the k-th batch. In this embodiment, the one-hot coding length is 80, with only one position equal to 1 and the rest 0; each batch contains M samples, and the loss function is obtained as the cross entropy summed over the samples of the K batches;
Step 5.6, train the neural network:
Update the weight parameters of the deep neural network with an Adam optimizer, set a maximum number of iterations step and a dynamically varying learning rate lr, and stop training when the loss function Loss reaches its minimum or the iteration count equals step, thereby obtaining the optimal syllable classification model. In this embodiment, step = 300, the initial learning rate is lr = 0.01, and the learning rate becomes lr = 0.1 × lr every 100 iterations.
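A matching training sketch under the stated schedule (X_train and Y_train are assumed arrays of feature maps and one-hot labels; Keras epochs stand in for the patent's iterations):

```python
import tensorflow as tf

def lr_schedule(epoch, lr):
    # multiply lr by 0.1 every 100 iterations, per the embodiment (step = 300, lr0 = 0.01)
    return lr * 0.1 if epoch > 0 and epoch % 100 == 0 else lr

model = build_dc_bigru()  # from the sketch above
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])
# X_train: assumed shape [n_samples, 25, 8, 8, 4]; Y_train: one-hot labels of length 80
model.fit(X_train, Y_train, epochs=300,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])
```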
Step 6, construct a statistical language model from the instruction set P of Chinese phrases and use it to post-process the results of the optimal syllable classifier:
Step 6.1, establish the many-to-one mapping θ from syllable label sequences to Chinese phrases. In this embodiment, different subjects speak at different rates, and it is difficult to guarantee the same rate across silent repetitions of the same phrase, so the number of signal window samples of the same phrase varies; the mapping from syllable label sequences to phrases is therefore many-to-one.
Step 6.2, process a Chinese phrase p' to be decoded according to the procedure of Step 2 to obtain U signal window samples to be decoded; process the U signal window samples to be decoded according to the procedure of Step 4 to obtain the three-dimensional myoelectric feature maps to be decoded;
Step 6.3, feed the three-dimensional myoelectric feature maps to be decoded into the optimal syllable classification model, which outputs the scoring matrix of the syllable label sequence of the Chinese phrase p', O = [o_1, …, o_u, …, o_U], where o_u denotes the score probability vector of the u-th syllable of the Chinese phrase p' and U denotes the length of the syllable sequence;
Step 6.4, set the search depth of each syllable to depth and process O with a multi-beam search algorithm, obtaining depth^U syllable label sequences and their depth^U scores;
Step 6.5, judge whether any of the depth^U syllable label sequences matches the many-to-one mapping θ; if the match succeeds, select from the matching syllable label sequences the phrase p̂ corresponding to the sequence with the highest score and output it; otherwise, execute Step 6.6;
Step 6.6, take the syllable with the highest score probability in the score vector o_u of the u-th syllable of the Chinese phrase p', denoted ĉ_u, thereby obtaining the syllable decision sequence ĉ = (ĉ_1, …, ĉ_U); select from the many-to-one mapping θ the phrase p̂ whose syllable label sequence has the minimum edit distance to ĉ, as the decoding result of the Chinese phrase p'.
In this embodiment, depth = 5. The phrase p̂ corresponding to the highest-scoring syllable label sequence among those successfully matched with the many-to-one mapping θ is selected by formula (2):
    p̂ = argmax_{phrase} { Score(phrase) }    (2)
where Score(phrase) denotes the score of a successfully matched phrase, the maximization returns the phrase with the maximum score, and phrase denotes a phrase in the phrase instruction set P. The syllable decision sequence ĉ is obtained by formula (3):
    ĉ_u = argmax_j { o_u(j) },  u = 1, …, U    (3)
where argmax{·} returns the syllable with the highest score in each syllable score vector.
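A minimal sketch of this language-model post-processing, where θ is assumed to be a dict from syllable-index tuples to phrase strings, the summed per-syllable score is an assumed scoring rule, and itertools.product enumerates the depth^U candidates as a brute-force stand-in for the multi-beam search:

```python
import itertools
import numpy as np

def edit_distance(a, b):
    """Levenshtein distance between two syllable sequences."""
    dp = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return int(dp[-1])

def decode_phrase(scores, theta, depth=5):
    """scores: (U, L) syllable score matrix; theta: {syllable tuple: phrase}."""
    top = np.argsort(scores, axis=1)[:, -depth:]         # top-depth syllables per step
    best, best_score = None, -np.inf
    for cand in itertools.product(*top):                 # depth**U candidate sequences
        cand = tuple(int(c) for c in cand)
        if cand in theta:                                # Step 6.5: matched in theta
            s = sum(scores[u, c] for u, c in enumerate(cand))
            if s > best_score:
                best, best_score = theta[cand], s
    if best is not None:
        return best
    c_hat = tuple(int(c) for c in np.argmax(scores, axis=1))  # Step 6.6 fallback
    return min(theta.items(), key=lambda kv: edit_distance(c_hat, kv[0]))[1]
```

With depth = 5 and short command phrases, enumerating the 5^U candidates remains tractable for a real-time system.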
In this embodiment, to quantitatively evaluate the invention, the proposed decoding method, denoted DCBiMEP, is compared with conventional classification methods. In the comparison experiment, four common phrase classification methods are compared with DCBiMEP: HMM, a dilated convolutional neural network (DCNN), a bidirectional gated recurrent unit network (BiGRU), and DC-BiGRU. The data preparation for the four methods is as follows: sEMG activity detection is performed on the raw myoelectric data, the sEMG activity data of each phrase is extracted and given the corresponding phrase label, and features are extracted from the labeled phrase data to obtain feature data for all phrases. In addition, to verify the effectiveness of data augmentation, the method is evaluated with and without augmentation, denoted AUG-DCBiMEP and DCBiMEP respectively. FIG. 7 shows the phrase recognition accuracy (PRA) of the above six methods on data from the 8 subjects. The PRAs of the four conventional phrase classification methods are (82.74 ± 7.48)%, (83.06 ± 7.31)%, (87.92 ± 5.82)% and (90.49 ± 5.47)% respectively, showing that DC-BiGRU, which captures spatio-temporal information, performs best among them. The PRA of the proposed DCBiMEP is (97.27 ± 1.44)%, clearly superior to the four comparison methods. AUG-DCBiMEP improves PRA by a further 0.91% to (98.18 ± 1.44)%, demonstrating the effectiveness of the data augmentation.
FIGS. 8a and 8b show the phrase recognition confusion matrices on subject 2's data for DC-BiGRU, the best of the four comparison methods, and for the method of the present invention. Clearly, DC-BiGRU does not recognize phrases with similar pronunciation, such as "slow down" vs. "speed up" and "turn left" vs. "turn right", as well as the method of the present invention does.
Combining the above comparative experiments and recognition results, the following conclusions can be drawn: 1) the proposed decoding method can efficiently recognize phrases with similar pronunciation and improves the performance of the silent speech system; 2) the data augmentation method that adjusts the window boundaries further improves performance over the base method; 3) the statistical language model effectively exploits the semantic and temporal correlation of the phrases, helps to understand their meaning, and achieves high-accuracy, natural and continuous silent speech interaction.
Claims (1)
1. A silent speech decoding method based on surface myoelectricity of the face and neck, characterized by comprising the following steps:
Step 1, construct an instruction set P = {p_1, …, p_n, …, p_N} containing N Chinese phrases, where p_n denotes the n-th Chinese phrase in the instruction set P and the N Chinese phrases contain L classes of syllables;
Collect, with a high-density electrode array, the surface electromyographic signals generated by the facial muscles and neck muscles while the user silently reads the Chinese phrases, and label the resting signal segments and the phrase-related surface electromyographic signal segments with a double-threshold detection method based on short-time energy and zero-crossing rate, thereby forming labeled phrase signal segments that constitute a training phrase data set S_p;
Step 2, segment the training phrase data set S_p with a series of temporally overlapping signal windows to obtain M signal window samples; divide each phrase signal segment evenly according to the number of syllables it contains, and assign a fine-grained syllable label to each signal window sample according to the syllable sequence of its phrase, thereby obtaining one batch of training data consisting of M syllable-labeled signal window samples;
Step 3, change the segmentation timing so as to shift the window boundary of each signal window, and repeat the processing of Step 2, thereby obtaining K batches of syllable-labeled training data S_origin = {S^1, …, S^k, …, S^K}, where S^k = {(x_m^k, y_m^k) | m = 1, …, M} denotes the k-th batch of syllable-labeled training data, x_m^k denotes the m-th signal window sample of the k-th batch, and y_m^k denotes the corresponding syllable label, represented by one-hot coding with size [1, L]; S_origin contains M × K signal window samples in total;
Step 4, extract the myoelectric features of the training data set S_origin:
Step 4.1, split each signal window sample into consecutive non-overlapping frames to obtain d frames of signal window data;
Step 4.2, according to the relative positions of the signal channels of the high-density electrode array, rearrange the surface electromyographic signals acquired by the high-density electrode array into a surface electromyographic data matrix over the two-dimensional electrode channel grid, whose size is denoted [e, g];
Step 4.3, extract c myoelectric features from each frame of signal window data to obtain a three-dimensional myoelectric feature map for each frame, and thereby obtain the three-dimensional myoelectric feature map set of all signal window samples S_input = {(X_m^k, y_m^k)}, where X_m^k denotes the d-frame three-dimensional myoelectric feature map of the m-th signal window sample of the k-th batch, with size [d, e, g, c], and y_m^k denotes the syllable label of the m-th signal window sample of the k-th batch;
Step 5, construct a deep neural network that captures spatio-temporal information, comprising: A dilated convolution blocks wrapped in time-distributed layers, a flattening layer, A bidirectional gated recurrent unit (BiGRU) blocks and A fully connected layers; the three-dimensional myoelectric feature map set S_input is fed into the deep neural network in K batches;
Step 5.1, each a-th dilated convolution block comprises a dilated convolution layer, a batch normalization layer and a Dropout layer; the a-th dilated convolution layer uses H_a two-dimensional convolution kernels of size h × h and a Tanh activation function;
When a = 1, the k-th batch of three-dimensional myoelectric feature maps is fed into the a-th dilated convolution block for processing, which outputs the a-th feature map set of the k-th batch {F_m^{k,a}}, where F_m^{k,a} denotes the feature map output for the m-th signal window sample X_m^k of the k-th batch, with size [d, e, g, H_a];
When a = 2, 3, …, A, the (a-1)-th feature map set of the k-th batch {F_m^{k,a-1}} is fed into the a-th dilated convolution block for processing, which outputs the a-th feature map set of the k-th batch {F_m^{k,a}}, so that the A-th dilated convolution block outputs the final feature map set {F_m^{k,A}};
Step 5.2, after the feature maps {F_m^{k,A}} are processed by the flattening layer, the flattened feature set of the k-th batch {V_m^k} is obtained, where V_m^k denotes the feature map output after F_m^{k,A} passes through the flattening layer, with size [d, e × g × H_A];
Step 5.3, each a-th BiGRU block comprises a bidirectional gated recurrent unit layer with a ReLU activation function and a Dropout layer; the hidden nodes of every bidirectional gated recurrent unit layer have dimension b;
When a = 1, the flattened feature set of the k-th batch {V_m^k} is fed into the a-th BiGRU block for processing, which outputs the a-th gated feature set of the k-th batch {G_m^{k,a}}, where G_m^{k,a} denotes the gated features output after V_m^k is processed by the a-th BiGRU block, with size [d, 2 × b];
When a = 2, 3, …, A-1, the (a-1)-th gated feature set of the k-th batch {G_m^{k,a-1}} is fed into the a-th BiGRU block for processing, which outputs the a-th gated feature set of the k-th batch {G_m^{k,a}}, so that the (A-1)-th BiGRU block outputs the (A-1)-th gated feature set of the k-th batch with size [d, 2 × b];
When a = A, the (A-1)-th gated feature set of the k-th batch {G_m^{k,A-1}} is fed into the A-th BiGRU block for processing, which outputs only the last time step as the A-th gated feature set of the k-th batch {G_m^{k,A}}, with size [1, 2 × b];
Step 5.4, the activation functions of the first A-1 fully connected layers are Tanh, each followed by one Dropout layer, and the activation function of the A-th fully connected layer is softmax;
After the gated feature set {G_m^{k,A}} output by the A-th BiGRU block is processed by the A fully connected layers in turn, the scoring matrix of the syllable decision sequence Q^k = [q_1^k, …, q_m^k, …, q_M^k] is output, where q_m^k = [q_{m,1}^k, …, q_{m,j}^k, …, q_{m,L}^k] gives the probabilities that the m-th signal window sample x_m^k of the k-th batch is predicted as each of the L syllables, and q_{m,j}^k is the probability that the m-th sample of the k-th batch is predicted as the j-th syllable class;
Step 5.5, establish the cross-entropy loss function Loss by formula (1):
    Loss = - Σ_{k=1}^{K} Σ_{m=1}^{M} Σ_{j=1}^{L} y_{m,j}^k · log(q_{m,j}^k)    (1)
where y_{m,j}^k is the value at the j-th position of the syllable label y_m^k of the m-th signal window sample of the k-th batch;
Step 5.6, train the neural network:
Update the weight parameters of the deep neural network with an Adam optimizer, set a maximum number of iterations step and a dynamically varying learning rate lr, and stop training when the loss function Loss reaches its minimum or the iteration count equals step, thereby obtaining the optimal syllable classification model;
Step 6, construct a statistical language model from the instruction set P of Chinese phrases and use it to post-process the results of the optimal syllable classifier:
Step 6.1, establish the many-to-one mapping θ from syllable label sequences to Chinese phrases;
Step 6.2, process a Chinese phrase p' to be decoded according to the procedure of Step 2 to obtain U signal window samples to be decoded; process the U signal window samples to be decoded according to the procedure of Step 4 to obtain the three-dimensional myoelectric feature maps to be decoded;
Step 6.3, feed the three-dimensional myoelectric feature maps to be decoded into the optimal syllable classification model, which outputs the scoring matrix of the syllable label sequence of the Chinese phrase p', O = [o_1, …, o_u, …, o_U], where o_u denotes the score probability vector of the u-th syllable of the Chinese phrase p' and U denotes the length of the syllable sequence;
Step 6.4, set the search depth of each syllable to depth and process O with a multi-beam search algorithm, obtaining depth^U syllable label sequences and their depth^U scores;
Step 6.5, judge whether any of the depth^U syllable label sequences matches the many-to-one mapping θ; if the match succeeds, select from the matching syllable label sequences the phrase p̂ corresponding to the sequence with the highest score and output it; otherwise, execute Step 6.6;
Step 6.6, take the syllable with the highest score probability in the score vector o_u of the u-th syllable of the Chinese phrase p', denoted ĉ_u, thereby obtaining the syllable decision sequence ĉ = (ĉ_1, …, ĉ_U); select from the many-to-one mapping θ the phrase p̂ whose syllable label sequence has the minimum edit distance to ĉ, as the decoding result of the Chinese phrase p'.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210598661.3A CN114999461B (en) | 2022-05-30 | 2022-05-30 | Silent voice decoding method based on surface myoelectricity of face and neck |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210598661.3A CN114999461B (en) | 2022-05-30 | 2022-05-30 | Silent voice decoding method based on surface myoelectricity of face and neck |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114999461A true CN114999461A (en) | 2022-09-02 |
CN114999461B CN114999461B (en) | 2024-05-07 |
Family
ID=83028992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210598661.3A Active CN114999461B (en) | 2022-05-30 | 2022-05-30 | Silent voice decoding method based on surface myoelectricity of face and neck |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114999461B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20170095603A (en) * | 2016-02-15 | 2017-08-23 | 인하대학교 산학협력단 | A monophthong recognition method based on facial surface EMG signals by optimizing muscle mixing |
CN107545888A (en) * | 2016-06-24 | 2018-01-05 | 常州诗雅智能科技有限公司 | A kind of pharyngeal cavity electronic larynx voice communication system automatically adjusted and method |
CN112151030A (en) * | 2020-09-07 | 2020-12-29 | 中国人民解放军军事科学院国防科技创新研究院 | Multi-mode-based complex scene voice recognition method and device |
CN113288183A (en) * | 2021-05-20 | 2021-08-24 | 中国科学技术大学 | Silent voice recognition method based on facial neck surface myoelectricity |
Non-Patent Citations (1)
Title |
---|
WANG Xu; JIA Xueqin; LI Jinghong; YANG Dan: "Silent speech signal recognition based on optimized myoelectric features", Journal of Northeastern University (Natural Science), No. 10, 28 October 2006 (2006-10-28) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117084872A (en) * | 2023-09-07 | 2023-11-21 | 中国科学院苏州生物医学工程技术研究所 | Walking aid control method, system and medium based on neck myoelectricity and walking aid |
CN117084872B (en) * | 2023-09-07 | 2024-05-03 | 中国科学院苏州生物医学工程技术研究所 | Walking aid control method, system and medium based on neck myoelectricity and walking aid |
Also Published As
Publication number | Publication date |
---|---|
CN114999461B (en) | 2024-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Schultz et al. | Modeling coarticulation in EMG-based continuous speech recognition | |
CN113288183B (en) | Silent voice recognition method based on facial neck surface myoelectricity | |
CN102982809B (en) | Conversion method for sound of speaker | |
CN109935243A (en) | Speech-emotion recognition method based on the enhancing of VTLP data and multiple dimensioned time-frequency domain cavity convolution model | |
CN110516696A (en) | It is a kind of that emotion identification method is merged based on the adaptive weighting bimodal of voice and expression | |
CN111462769B (en) | End-to-end accent conversion method | |
Peters | Dimensions of perception for consonants | |
CN107256392A (en) | A kind of comprehensive Emotion identification method of joint image, voice | |
CN107221318A (en) | Oral English Practice pronunciation methods of marking and system | |
CN103366618A (en) | Scene device for Chinese learning training based on artificial intelligence and virtual reality | |
CN109727608A (en) | A kind of ill voice appraisal procedure based on Chinese speech | |
CN110211594A (en) | A kind of method for distinguishing speek person based on twin network model and KNN algorithm | |
CN109841231A (en) | A kind of early stage AD speech auxiliary screening system for standard Chinese | |
CN114999461B (en) | Silent voice decoding method based on surface myoelectricity of face and neck | |
CN108766462B (en) | Voice signal feature learning method based on Mel frequency spectrum first-order derivative | |
CN110348482A (en) | A kind of speech emotion recognition system based on depth model integrated architecture | |
Wand | Advancing electromyographic continuous speech recognition: Signal preprocessing and modeling | |
Pillai et al. | A deep learning based evaluation of articulation disorder and learning assistive system for autistic children | |
CN114863912B (en) | Silent voice decoding method based on surface electromyographic signals | |
Harrington et al. | A physiological analysis of high front, tense-lax vowel pairs in Standard Austrian and Standard German | |
JP5030150B2 (en) | Voice recognition device using myoelectric signal | |
CN114999468A (en) | Speech feature-based speech recognition algorithm and device for aphasia patients | |
JP4110247B2 (en) | Artificial vocalization device using biological signals | |
Karjo | Phonetic and phonotactic analysis of Manggarai language | |
Räsänen | Speech segmentation and clustering methods for a new speech recognition architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |