CN114999461A - Silent voice decoding method based on facial neck surface myoelectricity - Google Patents

Silent voice decoding method based on facial neck surface myoelectricity

Info

Publication number
CN114999461A
CN114999461A (application CN202210598661.3A)
Authority
CN
China
Prior art keywords
syllable
batch
phrase
data
signal window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210598661.3A
Other languages
Chinese (zh)
Other versions
CN114999461B (en)
Inventor
张旭
何运宝
陈希
陈香
陈勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202210598661.3A priority Critical patent/CN114999461B/en
Publication of CN114999461A publication Critical patent/CN114999461A/en
Application granted granted Critical
Publication of CN114999461B publication Critical patent/CN114999461B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques using predictive techniques
    • G10L19/16 - Vocoder architecture
    • G10L19/167 - Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/45 - Speech or voice analysis techniques characterised by the type of analysis window
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/027 - Syllables being the recognition units
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The invention discloses a silent speech decoding method based on facial and neck surface electromyography, which decodes unvoiced speech content by processing the surface electromyographic signals generated by the relevant muscle activity while the user silently reads. The method comprises the following steps: 1. collecting the user's surface electromyographic signals to form a training dataset; 2. segmenting the data to obtain a syllable-labeled training dataset; 3. performing data enhancement; 4. extracting features from the enhanced training dataset; 5. constructing a deep neural network that characterizes spatio-temporal information; 6. constructing a statistical language model to predict the phrases the user reads continuously. The invention recognizes speech content from the finer-grained structures that make up the speech sequence; it not only achieves high-performance silent speech recognition but also helps relate surface myoelectric activity to the meaning of the corresponding speech, providing a new approach to silent speech recognition.

Description

Silent voice decoding method based on facial neck surface myoelectricity
Technical Field
The invention belongs to the fields of biological signal processing, machine learning, and intelligent control, and particularly relates to a silent speech decoding method based on facial and neck surface electromyography.
Background
Speech is an essential, effective, and convenient mode of communication in daily human life. Over the past decades, speech-related human-computer interaction technology, represented by automatic speech recognition (ASR), has developed rapidly and shows very high performance in common scenarios. However, because ASR depends on voiced speech, its disadvantages are also apparent: it cannot be guaranteed to work effectively against a high-noise background, it cannot satisfy the demand for private interaction, and people with vocal impairments cannot rely on ASR for daily communication.
To overcome the above disadvantages, researchers have explored non-acoustic speech recognition methods. During human speech and reading, the speech-related muscle groups of the face and neck are activated, producing bioelectric signals called surface electromyograms (sEMG). Silent speech recognition (SSR) based on sEMG has therefore become an important complement to ASR in some special scenarios, and sEMG-based SSR techniques have made steady progress over the decades. Early SSR mainly used classical pattern classification methods such as support vector machines and conjugate gradient networks, recording the sEMG of the subject's face and neck with discrete electrodes of a small number of channels and recognizing corpora with a limited number of words. Later research tended to recognize lexical corpora using hidden Markov models (HMMs), which characterize the timing information of sEMG. With the development of data acquisition technology, high-density (HD) electrode arrays were designed to simultaneously record many channels of surface electromyographic signals from a target muscle or muscle group over a relatively large area. High-density surface electromyography (HD-sEMG) arrays help capture valuable spatial information and characterize the heterogeneity of muscle activity, thereby improving the performance of electromyographic pattern recognition.
While the above studies demonstrate that pattern classification techniques can achieve satisfactory SSR performance, some deficiencies remain: 1) pattern classification methods simply map sEMG pattern features to a phrase or word, omitting the semantic information of their temporal association; 2) the performance of classification techniques is limited by the vocabulary size of the corpus; 3) common pattern classification techniques mainly recognize isolated words and cannot realize natural, coherent silent speech interaction.
Disclosure of Invention
The invention aims to overcome the deficiencies of the prior art by providing a silent speech decoding method based on facial and neck surface electromyography, so that the finer-grained structure of the speech sequence can be recognized and the speech content understood, the recognition of phrases with similar pronunciations improved, and accurate, natural silent speech interaction ultimately realized.
To achieve this aim, the invention adopts the following technical scheme:
The invention provides a silent speech decoding method based on facial and neck surface electromyography, characterized by comprising the following steps:
Step 1: construct an instruction set P = {p_1, …, p_n, …, p_N} containing N Chinese phrases, where p_n denotes the nth Chinese phrase in P and the N phrases together contain L syllable classes;
Collect the surface electromyographic signals generated by the facial and neck muscles while the user silently reads the Chinese phrases, using a high-density electrode array; label the resting segments and the phrase-related sEMG segments with a double-threshold detection method based on short-time energy and zero-crossing rate, thereby forming labeled phrase signal segments that constitute the training phrase dataset S_p;
Step 2: segment the training phrase dataset S_p with a series of temporally overlapping signal windows to obtain M signal window samples; divide each phrase signal segment uniformly according to the number of syllables it contains, and, using the phrase's syllable sequence, assign a fine-grained syllable label to each signal window sample, thereby obtaining one batch of training data consisting of M syllable-labeled signal window samples;
Step 3: change the segmentation start time so as to shift the window boundaries of each signal window, and then repeat the procedure of Step 2, thereby obtaining K batches of syllable-labeled training data S_origin = {S_1, …, S_k, …, S_K}, where S_k = {(x_m^k, y_m^k) | m = 1, …, M} denotes the kth syllable-labeled batch; x_m^k is the mth signal window sample of the kth batch, and y_m^k is its syllable label, expressed in one-hot coding with size [1, L]. S_origin contains M × K signal window samples in total;
Step 4: extract the electromyographic features of the training dataset S_origin:
Step 4.1: split each signal window sample into consecutive non-overlapping frames, yielding d frames of signal window data;
Step 4.2: according to the relative positions of the signal channels in the high-density electrode array, rearrange the collected surface electromyographic signals into a two-dimensional electrode-channel data matrix of size [e, g];
Step 4.3: extract c electromyographic features from each frame of signal window data to obtain a three-dimensional feature map per frame, and hence the feature-map set of all signal window samples S_input = {F_m^k}, where F_m^k is the d-frame three-dimensional feature map of the mth signal window sample of the kth batch, with size [d, e, g, c], and y_m^k is its syllable label;
Step 5: construct a deep neural network that characterizes spatio-temporal information, consisting of: A dilated convolution blocks wrapped in a time-distributed layer, a flattening layer, A bidirectional gated recurrent unit (BiGRU) blocks, and A fully connected layers; the three-dimensional feature-map set S_input is fed into the deep neural network in K batches;
Step 5.1: each dilated convolution block (the ath) comprises a dilated convolution layer, a batch normalization layer, and a Dropout layer; the ath dilated convolution layer uses H_a two-dimensional convolution kernels of size h × h with the Tanh activation function;
When a = 1, the kth batch of three-dimensional feature maps is fed into the ath dilated convolution block, which outputs the ath feature map of the kth batch C_m^{k,a}, i.e., the output for the mth signal window sample F_m^k, with size [d, e, g, H_a];
When a = 2, 3, …, A, the (a-1)th feature map of the kth batch C_m^{k,a-1} is fed into the ath dilated convolution block, which outputs the ath feature map C_m^{k,a}, so that the Ath dilated convolution block outputs the final feature map C_m^{k,A};
Step 5.2: the feature map C_m^{k,A} is processed by the flattening layer to give the kth batch of flattened features T^k = {T_m^k | m = 1, …, M}, where T_m^k is the output of the mth feature map of the kth batch after the flattening layer, with size [d, e × g × H_A];
Step 5.3, any a-th bidirectional gating cycle unit block comprises a bidirectional gating cycle unit layer adopting a ReLU activation function and a Dropout layer, and the dimensions of hidden nodes in the bidirectional gating cycle unit layer are all b;
when a is 1, the flattening feature set of the k-th batch
Figure BDA00036687603000000311
Inputting the a-th bidirectional gating cycle unit block for processing, and outputting the k-th gating characteristic set of the k-th batch
Figure BDA00036687603000000312
Representation characteristic diagram
Figure BDA00036687603000000313
The gating characteristics output after the processing of the a-th bidirectional gating circulating unit block,
Figure BDA00036687603000000314
size of [ d,2 x b ]];
When a is 2,3, …, A-1, the a-1 feature set of the k-th batch
Figure BDA00036687603000000315
Inputting the a-th bidirectional gating cycle unit block for processing, and outputting the k-th gating feature set of the k-th batch
Figure BDA00036687603000000316
So that the A-1 gating feature set of the kth batch is output by the A-1 bidirectional gating cycle unit block
Figure BDA00036687603000000317
Size of [ d,2 x b ]];
When a is A, the a-1 gating feature set of the k-th batch
Figure BDA00036687603000000318
Inputting the a-th bidirectional gating cycle unit block for processing, and outputting the k-th gating feature set of the k-th batch
Figure BDA00036687603000000319
Size of [1,2 x b ]];
Step 5.4, enabling the activation functions of the first A-1 full connection layers to adopt Tanh and respectively connect one Dropout layer, wherein the activation function of the A-th full connection layer is softmax;
gating characteristic set output by A-th bidirectional gating circulation unit block
Figure BDA00036687603000000320
In turn pass throughAfter A full connection layers are processed, a scoring matrix of a syllable decision sequence is output
Figure BDA00036687603000000321
Wherein,
Figure BDA00036687603000000322
m signal window samples representing the kth batch of data
Figure BDA00036687603000000323
Are predicted as probabilities of L syllables, respectively, and
Figure BDA00036687603000000324
wherein,
Figure BDA00036687603000000325
m sample representing kth batch of data
Figure BDA00036687603000000326
Probability of being predicted as a class j syllable;
Step 5.5: establish the cross-entropy loss function Loss by Eq. (1):

Loss = -Σ_{k=1}^{K} Σ_{m=1}^{M} Σ_{j=1}^{L} y_{m,j}^k · log(q_{m,j}^k)    (1)

where y_{m,j}^k is the value at the jth position of the syllable label y_m^k of the mth signal window sample x_m^k of the kth batch;
Step 5.6: train the neural network: update the weight parameters of the deep neural network with the Adam optimizer, set the maximum number of iterations step, and dynamically vary the network learning rate lr; training stops when the loss function Loss reaches its minimum or the number of iterations equals step, yielding the optimal syllable classification model;
Step 6: construct a statistical language model from the Chinese phrase instruction set P and use it to post-process the output of the optimal syllable classifier:
Step 6.1: establish the many-to-one mapping Θ from syllable label sequences to Chinese phrases;
Step 6.2: process a Chinese phrase p' to be decoded according to Step 2 to obtain U syllable-labeled signal window samples to be decoded; process these U samples according to Step 4 to obtain the three-dimensional feature-map set to be decoded;
Step 6.3: feed the feature-map set to be decoded into the optimal syllable classification model, which outputs the scoring matrix of the syllable label sequence of the phrase p', Q' = {Q_1', …, Q_u', …, Q_U'}, where Q_u' is the score probability matrix of the uth syllable of p' and U is the length of the syllable sequence;
Step 6.4: set the search depth of each syllable to depth and process Q' with a multi-cluster search algorithm, obtaining U_depth syllable label sequences and their U_depth scores;
Step 6.5: check whether any of the U_depth syllable label sequences matches the many-to-one mapping Θ; if the matching succeeds, select from the matched sequences the phrase p̂ corresponding to the highest-scoring syllable label sequence and output it; otherwise, execute Step 6.6;
Step 6.6: from the scoring matrix Q_u' of the uth syllable of the phrase p', record the syllable with the highest score probability as ŷ_u, thereby obtaining the syllable decision sequence Ŷ = (ŷ_1, …, ŷ_U); select, within the many-to-one mapping Θ, the phrase p̂ with the minimum edit distance to Ŷ as the decoding result of the Chinese phrase p'.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention segments the original phrase sEMG data to obtain fine-grained syllable-level surface electromyographic data, establishes a dilated-convolution bidirectional gated recurrent unit network (DC-BiGRU) as the classifier, and further provides a statistical language model to characterize semantic information, refining and correcting the output of the trained classifier to obtain accurate predictions of phrase sequences; this decoding framework realizes accurate and natural silent speech recognition.
2. The automatic labeling method for training data simplifies the preparation of training data; meanwhile, the proposed data enhancement method based on adjusting the windowing boundary effectively alleviates overfitting of the deep network and improves silent speech recognition performance.
3. Working at the fine-grained syllable level, the invention provides a statistical language model based on multi-cluster search and edit distance, which exploits the semantic and temporal correlation information of the phrase set, helps to understand phrase meaning, and improves the recognition of phrases with similar pronunciations.
4. The invention achieves high-performance, natural, continuous silent speech recognition that meets the data processing requirements of a real-time system, facilitating practical application in fields such as myoelectric control.
Drawings
FIG. 1 is a flow chart of a method for decoding a silent voice based on facial neck surface electromyography according to the present invention;
FIG. 2 is a set of Chinese pronunciation phrases according to the present invention;
FIG. 3 is an illustration of the shape parameters and placement position of a face and neck high density electrode array used in the present invention;
FIG. 4 is a schematic diagram of a data segmentation, data automatic labeling and data enhancement method adopted by the present invention;
FIG. 5 is a schematic illustration of the spatial position distribution and stitching results in a high density electrode array according to the present invention;
FIG. 6 is a schematic diagram of the structure of a classification network based on the extended convolution bi-directional gated cyclic unit (DC-BiGRU) used in the present invention;
FIG. 7 is a graph of the average phrase recognition rate and the standard deviation score obtained by the present invention;
FIG. 8a is a schematic diagram of a confusion matrix based on DC-BiGRU phrase classification obtained by the present invention;
FIG. 8b is a schematic diagram of the confusion matrix obtained by the present invention based on the proposed DCBiMEP decoding method.
Detailed Description
In this embodiment, a silent speech decoding method based on facial and neck surface electromyography extracts temporally correlated semantic information using a statistical language model, which not only improves the recognition performance for phrases with similar pronunciations but also helps to understand the meaning of the phrases corresponding to sEMG activity, offering a new approach to silent speech recognition. As shown in FIG. 1, the method comprises the following steps:
Step 1: construct an instruction set P = {p_1, …, p_n, …, p_N} containing N Chinese phrases, where p_n denotes the nth Chinese phrase in P and the N phrases contain L syllable classes. As shown in FIG. 2, the Chinese pronunciation vocabulary consists of N = 30 phrases comprising 79 Chinese syllables and 1 resting syllable, so L = 80;
Collect the surface electromyographic signals generated by the facial and neck muscles while the user silently reads the Chinese phrases, using a high-density electrode array, and label the resting segments and the phrase-related sEMG segments with a double-threshold detection method based on short-time energy and zero-crossing rate, forming labeled phrase signal segments that constitute the training phrase dataset S_p. In this example, 8 healthy subjects (7 male, 1 female) aged 21-26 years, with no hearing or speech impairment, participated in the data collection experiment. Each subject was informed in detail of every experimental procedure and its specific requirements.
The shape parameters and placement positions of the high-density electrode arrays are shown in FIG. 3. Four high-density flexible electrode arrays are used in total, arranged symmetrically in pairs on the left and right sides of the face and neck. Illustratively, the two facial electrode arrays each have 16 channels, with electrode diameter 5 mm and electrode spacings of 10, 15, and 18 mm; the two neck electrode arrays also have 16 channels each, with electrode diameter 5 mm and electrode spacing 18 mm. The face-neck electrode arrays together form a 64-channel array. In addition, one electrode is attached behind each ear to serve as the reference electrode and the ground electrode;
Before applying the electrode arrays, the target muscles of the subject's face and neck were scrubbed with an alcohol cotton pad to remove skin keratin, and a suitable amount of conductive gel was applied to the electrode probes to reduce skin impedance. Illustratively, the facial electrode arrays record sEMG from facial muscles such as the zygomatic, masseter, and depressor labii inferioris muscles, and the neck electrode arrays record sEMG from neck muscles such as the omohyoid, sternohyoid, and platysma muscles. During collection, the subjects silently articulated each phrase at a uniform speed with medium effort; each phrase was repeated 20 times as one trial, with an interval t between repetitions, illustratively t = 3 s. During each trial, activities unrelated to the collection task, such as swallowing saliva and coughing, were not allowed. To avoid muscle fatigue, there was a rest period T between trials, illustratively T = 30 s;
sEMG activity detection is performed on the raw data, with the result shown in part (a) of FIG. 4. First, the short-time energy and zero-crossing rate of the resting baseline are computed as the initial energy and initial zero-crossing rate, denoted E_i and C_i; the short-time energy and zero-crossing rate of the raw data are then computed over short-time segments of length S_length. The method requires three thresholds: the first two are the high and low thresholds on the short-time energy, denoted E_h and E_l, used for the initial decision on the onset and offset positions; the third is a threshold SC on the short-time zero-crossing rate. Illustratively, S_length is set to 64 ms, E_h = 8 × E_i, E_l = 3 × E_i, and SC = 3 × C_i. This yields the labels of the resting segments and of the signal segment corresponding to each phrase;
Step 2: segment the training phrase dataset S_p with a series of temporally overlapping signal windows to obtain M signal window samples. As shown in part (b) of FIG. 4, the sliding window length for data partitioning is W_length and the overlap ratio is Overlap; illustratively, W_length = 1000 ms and Overlap = 50%. Each phrase signal segment is divided uniformly according to the number of syllables it contains, and each signal window sample is then given a fine-grained syllable label according to the phrase's syllable sequence, the result being shown in part (b) of FIG. 4; each signal window is labeled with the syllable whose time span it falls in, yielding a syllable-labeled training dataset;
Step 3: change the segmentation start time so as to shift the window boundaries of each signal window, and then repeat the procedure of Step 2, thereby obtaining K batches of syllable-labeled training data S_origin = {S_1, …, S_k, …, S_K}, where S_k = {(x_m^k, y_m^k) | m = 1, …, M} is the kth batch; x_m^k is the mth signal window sample of the kth batch and y_m^k is its one-hot syllable label of size [1, L]; S_origin contains M × K signal window samples in total. In this embodiment, as shown in part (c) of FIG. 4, the starting position of each data division is shifted backward by Δ/5 relative to the previous batch, and K batches of labeled signal window samples are obtained by the syllable labeling method above, with M = 327, K = 5, and Δ = 500 ms;
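A hedged sketch of the window segmentation and boundary-shift augmentation of Steps 2-3 might look as follows; the uniform per-syllable division and center-based labeling follow the text, while the helper name, the 1 kHz sampling-rate assumption, and the placeholder phrase list are illustrative only.

```python
import numpy as np

def window_with_labels(segment, syllables, win, overlap, offset=0):
    """Cut one phrase segment into overlapping windows and label each window.

    The phrase segment is divided uniformly among its syllables (Step 2); each
    window takes the label of the syllable region containing its center, and
    `offset` shifts all window boundaries to realize the Step-3 augmentation.
    """
    step = int(win * (1 - overlap))
    syl_len = len(segment) / len(syllables)        # uniform division per syllable
    samples, labels = [], []
    for start in range(offset, len(segment) - win + 1, step):
        center = start + win // 2
        samples.append(segment[start:start + win])
        labels.append(syllables[min(int(center / syl_len), len(syllables) - 1)])
    return samples, labels

# Illustrative augmentation: K = 5 batches, boundaries shifted by delta/5 per batch
# (win = 1000 and delta = 500 are in samples, assuming fs = 1 kHz so 1 sample = 1 ms)
phrase_segments = []   # to be filled with (segment_array, syllable_list) pairs
batches = [[window_with_labels(seg, syls, win=1000, overlap=0.5, offset=k * 500 // 5)
            for seg, syls in phrase_segments]
           for k in range(5)]
```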
Step 4: extract the electromyographic features of the training dataset S_origin:
Step 4.1: split each signal window sample into consecutive non-overlapping frames, yielding d frames of signal window data; in this embodiment, the frame length is F_length = 40 ms;
Step 4.2: as shown in FIG. 5, rearrange the surface electromyographic signals acquired by the high-density electrode array into a two-dimensional electrode-channel data matrix of size [e, g] according to the relative positions of the signal channels; in this embodiment, e = 8 and g = 8;
Step 4.3: extract c electromyographic features from each frame of signal window data to obtain a three-dimensional feature map per frame, and hence the feature-map set of all signal window samples S_input = {F_m^k}, where F_m^k is the d-frame three-dimensional feature map of the mth signal window sample of the kth batch, with size [d, e, g, c], and y_m^k is its syllable label. In this embodiment, c = 4; the four extracted time-domain features are mean absolute value (MAV), waveform length (WL), zero crossings (ZC), and slope sign changes (SSC); each signal window's feature map has d = 25 frames and size [25, 8, 8, 4]. Finally, the database S_input formed by the feature maps of all signal window samples serves as the input to the neural network.
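MAV, WL, ZC, and SSC are standard sEMG time-domain features; a sketch of the per-frame feature-map construction might be written as below. The zero-crossing and slope-sign-change formulas use a zero deadband, and the channel-to-grid mapping `grid` is an assumed stand-in for the layout of FIG. 5.

```python
import numpy as np

def frame_features(frame):
    """MAV, WL, ZC, SSC for one frame of one channel (1-D array)."""
    mav = np.mean(np.abs(frame))
    wl = np.sum(np.abs(np.diff(frame)))
    zc = np.sum(frame[:-1] * frame[1:] < 0)          # sign changes of the signal
    d = np.diff(frame)
    ssc = np.sum(d[:-1] * d[1:] < 0)                 # sign changes of the slope
    return np.array([mav, wl, zc, ssc], dtype=float)

def feature_map(window, grid, frame_len):
    """Build the [d, e, g, c] feature map of one signal window.

    window: [n_samples, n_channels] sEMG; grid: [e, g] channel-index layout.
    """
    d = window.shape[0] // frame_len
    e, g = grid.shape
    out = np.zeros((d, e, g, 4))
    for t in range(d):
        frame = window[t * frame_len:(t + 1) * frame_len]
        for i in range(e):
            for j in range(g):
                out[t, i, j] = frame_features(frame[:, grid[i, j]])
    return out

grid = np.arange(64).reshape(8, 8)   # assumed channel-to-grid mapping; see FIG. 5
fmap = feature_map(np.random.randn(1000, 64), grid, frame_len=40)  # -> (25, 8, 8, 4)
```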
Step five, constructing a deep neural network based on the depicting space-time information, comprising the following steps of: a expansion volume blocks containing time distribution layer, flattening layer, A bidirectional gating circulation unit blocks and A full connection layer, and collecting three-dimensional electromyogram feature map S input Inputting the deep neural network according to K batches; as shown in fig. 6, the deep neural network characterizing spatio-temporal information is composed of a number of expanded convolution blocks including a time distribution layer, a flattening layer, a number of bidirectional gated cyclic unit blocks, and a full connection layer; in the present embodiment, a ═ 2;
step 5.1, any a-th expansion convolution block comprises an expansion convolution layer, a batch normalization layer and a Dropout layer;
step 5.1, any a-th expansion convolution block comprises an expansion convolution layer, a batch normalization layer and a Dropout layer; and the a-th expanded convolution layer adopts H a Two-dimensional convolution kernels with dimensions of h x h and a Tanh activation function are adopted;
when a is 1, inputting the k-th three-dimensional electromyogram feature map set into an a-th expansion volume block for processing, and outputting the k-th a feature map of the k-th batch
Figure BDA0003668760300000079
Representing the mth signal window sample in the kth batch of three-dimensional electromyogram feature set
Figure BDA0003668760300000081
Output feature map with dimensions [ d, e, g, H ] a ];
When a is 2,3, …, A, the a-1 characteristic diagram of the k batch
Figure BDA0003668760300000082
Inputting the data into the a-th expanded volume block for processing, and outputting the a-th feature map of the k-th batch
Figure BDA0003668760300000083
So that the A-th expanded volume block outputs the final feature map
Figure BDA0003668760300000084
In this embodiment, the first expanded convolution layer is composed of H 1 32 filters of 3 × 3 with a spreading factor of 1, the second layer of extended convolutional layers consisting of H 2 With a spreading factor of 3, two Dropout layer ratios of 0.5 for 8 filters of 3 × 3;
Figure BDA0003668760300000085
size of [25,8,8,32 ]],
Figure BDA0003668760300000086
Size of [25,8,8,8 ]];
Step 5.2: the feature map C_m^{k,A} is processed by the flattening layer to give the kth batch of flattened features T^k = {T_m^k | m = 1, …, M}, where T_m^k is the output of the mth feature map of the kth batch after the flattening layer, with size [d, e × g × H_A]; in this embodiment, T_m^k has size [25, 512];
Step 5.3: each BiGRU block (the ath) comprises a bidirectional gated recurrent unit layer with the ReLU activation function and a Dropout layer; the hidden-node dimension of every BiGRU layer is b;
When a = 1, the kth batch of flattened features T^k is fed into the ath BiGRU block, which outputs the kth batch's ath gated feature set G^{k,a} = {G_m^{k,a}}, where G_m^{k,a} is the gated feature obtained from T_m^k after the ath BiGRU block, with size [d, 2 × b];
When a = 2, 3, …, A-1, the (a-1)th gated feature set of the kth batch G^{k,a-1} is fed into the ath BiGRU block, which outputs the ath gated feature set G^{k,a}, so that the (A-1)th BiGRU block outputs the kth batch's (A-1)th gated feature set G^{k,A-1}, with size [d, 2 × b];
When a = A, the (A-1)th gated feature set G^{k,A-1} is fed into the Ath BiGRU block, which outputs the kth batch's Ath gated feature set G^{k,A}, with size [1, 2 × b]. In this embodiment, each BiGRU block contains one bidirectional GRU layer with the ReLU activation function and one Dropout layer; the hidden-node dimensions of the two BiGRU layers are both b = 64 and the Dropout ratio is 0.4; G_m^{k,1} has size [25, 128] and G_m^{k,2} has size [1, 128];
Step 5.4: the first A-1 fully connected layers use the Tanh activation function, each followed by a Dropout layer; the Ath fully connected layer uses softmax;
The gated feature set G^{k,A} output by the Ath BiGRU block is processed in turn by the A fully connected layers, which output the scoring matrix of the syllable decision sequence Q^k = {q_m^k | m = 1, …, M}, where q_m^k = [q_{m,1}^k, …, q_{m,j}^k, …, q_{m,L}^k] gives the probabilities that the mth signal window sample x_m^k of the kth batch is predicted as each of the L syllables, and q_{m,j}^k is the probability predicted by the network that x_m^k belongs to the jth syllable class. In this embodiment, the 1st fully connected layer uses the Tanh activation with a hidden-node dimension of 200 and is followed by one Dropout layer with ratio 0.2; the hidden-node dimension of the 2nd fully connected layer is 80;
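Under the dimensions of this embodiment, a hedged Keras sketch of the DC-BiGRU classifier could look as follows; the layer sequence follows the text (time-distributed dilated Conv2D with BatchNorm and Dropout, two BiGRU layers, two fully connected layers), while details the patent leaves open, such as the exact placement of batch normalization and the use of `padding="same"`, are assumptions.

```python
from tensorflow.keras import layers, models

def build_dc_bigru(d=25, e=8, g=8, c=4, n_classes=80):
    """Sketch of the DC-BiGRU network of Step 5 (assumed details noted above)."""
    inp = layers.Input(shape=(d, e, g, c))                     # [d, e, g, c] feature map

    # Dilated convolution block 1: H_1 = 32 kernels 3x3, dilation 1, Tanh
    x = layers.TimeDistributed(layers.Conv2D(32, 3, dilation_rate=1,
                                             padding="same", activation="tanh"))(inp)
    x = layers.TimeDistributed(layers.BatchNormalization())(x)
    x = layers.Dropout(0.5)(x)

    # Dilated convolution block 2: H_2 = 8 kernels 3x3, dilation 3, Tanh
    x = layers.TimeDistributed(layers.Conv2D(8, 3, dilation_rate=3,
                                             padding="same", activation="tanh"))(x)
    x = layers.TimeDistributed(layers.BatchNormalization())(x)
    x = layers.Dropout(0.5)(x)

    x = layers.TimeDistributed(layers.Flatten())(x)            # -> [25, 8*8*8] = [25, 512]

    # Two BiGRU blocks, hidden size b = 64 (outputs [25, 128], then [1, 128])
    x = layers.Bidirectional(layers.GRU(64, activation="relu",
                                        return_sequences=True))(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Bidirectional(layers.GRU(64, activation="relu"))(x)
    x = layers.Dropout(0.4)(x)

    # Fully connected layers: Tanh(200) + Dropout(0.2), then softmax over 80 syllables
    x = layers.Dense(200, activation="tanh")(x)
    x = layers.Dropout(0.2)(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inp, out)
```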
Step 5.5: establish the cross-entropy loss function Loss by Eq. (1):

Loss = -Σ_{k=1}^{K} Σ_{m=1}^{M} Σ_{j=1}^{L} y_{m,j}^k · log(q_{m,j}^k)    (1)

where y_{m,j}^k is the value at the jth position of the syllable label y_m^k of the mth signal window sample x_m^k of the kth batch. In this embodiment, the one-hot coding length is 80, with a single position equal to 1 and the rest 0; each batch contains M samples, and the loss function is obtained by the cross-entropy weighted summation over the K batches of samples;
Step 5.6: train the neural network: update the weight parameters of the deep neural network with the Adam optimizer, set the maximum number of iterations step, and dynamically vary the learning rate lr; training stops when the loss Loss reaches its minimum or the number of iterations equals step, yielding the optimal syllable classification model. In this embodiment, step = 300, the initial learning rate is lr = 0.01, and the learning rate is multiplied by 0.1 every 100 iterations.
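Continuing the sketch above (and assuming the hypothetical build_dc_bigru helper from the previous block), the training configuration of this embodiment might be expressed as:

```python
import tensorflow as tf

model = build_dc_bigru()

# Step decay: lr starts at 0.01 and is multiplied by 0.1 every 100 iterations.
# Note: ExponentialDecay counts optimizer update steps, whereas the patent's
# "iterations" may mean epochs (assumption).
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=100, decay_rate=0.1, staircase=True)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=schedule),
              loss="categorical_crossentropy",    # Eq. (1) over one-hot labels
              metrics=["accuracy"])

# x_train: [n_samples, 25, 8, 8, 4] feature maps; y_train: [n_samples, 80] one-hot
# model.fit(x_train, y_train, epochs=300, batch_size=...)  # step = 300 max iterations
```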
Step 6: construct a statistical language model from the Chinese phrase instruction set P and use it to post-process the output of the optimal syllable classifier:
Step 6.1: establish the many-to-one mapping Θ from syllable label sequences to Chinese phrases. In this embodiment, speech rates differ across subjects, and even repeated silent readings of the same phrase can hardly be kept at the same rate, so the number of signal window samples for the same phrase varies; the mapping from syllable label sequences to phrases is therefore many-to-one.
Step 6.2: process a Chinese phrase p' to be decoded according to Step 2 to obtain U syllable-labeled signal window samples to be decoded; process these U samples according to Step 4 to obtain the three-dimensional feature-map set to be decoded;
Step 6.3: feed the feature-map set to be decoded into the optimal syllable classification model, which outputs the scoring matrix of the syllable label sequence of the phrase p', Q' = {Q_1', …, Q_u', …, Q_U'}, where Q_u' is the score probability matrix of the uth syllable of p' and U is the length of the syllable sequence;
Step 6.4: set the search depth of each syllable to depth and process Q' with a multi-cluster search algorithm, obtaining U_depth syllable label sequences and their U_depth scores;
Step 6.5: check whether any of the U_depth syllable label sequences matches the many-to-one mapping Θ; if the matching succeeds, select from the matched sequences the phrase p̂ corresponding to the highest-scoring syllable label sequence and output it; otherwise, execute Step 6.6;
Step 6.6: from the scoring matrix Q_u' of the uth syllable of the phrase p', record the syllable with the highest score probability as ŷ_u, thereby obtaining the syllable decision sequence Ŷ = (ŷ_1, …, ŷ_U); select, within the many-to-one mapping Θ, the phrase p̂ with the minimum edit distance to Ŷ as the decoding result of the Chinese phrase p'.
In this embodiment, depth is set to 5, and the phrase p̂ corresponding to the highest-scoring matched syllable label sequence is selected from the sequences successfully matched with the many-to-one mapping Θ by Eq. (2):

p̂ = max{ score(p̃) }    (2)

where score(p̃) is the score of a successfully matched phrase p̃, max{·} returns the phrase corresponding to the maximum score, and p̃ is a phrase in the phrase instruction set P. The syllable decision sequence is obtained by Eq. (3), where argmax{·} returns the highest-scoring syllable of each syllable score matrix:

ŷ_u = argmax{ Q_u' },  u = 1, …, U    (3)
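A compact sketch of this post-processing (Steps 6.4-6.6) is given below; the exhaustive enumeration over the top-depth syllables per position and the hand-rolled Levenshtein distance are assumptions standing in for the patent's multi-cluster search and edit-distance steps, and `theta` is assumed to be a dict from syllable-label tuples to phrases.

```python
import numpy as np
from itertools import product

def edit_distance(a, b):
    """Levenshtein distance between two syllable-label sequences."""
    dp = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return int(dp[-1])

def decode_phrase(Q, theta, depth=5):
    """Post-process syllable scores Q ([U, L] probabilities) into a phrase.

    Tries all combinations of the top-`depth` syllables per position and keeps
    the highest-scoring sequence matched by theta (Steps 6.4-6.5, Eq. (2));
    falls back to the minimum-edit-distance phrase (Step 6.6, Eq. (3)).
    """
    top = np.argsort(Q, axis=1)[:, -depth:]              # top-depth labels per syllable
    best_phrase, best_score = None, -np.inf
    for seq in product(*top):                            # candidate label sequences
        if tuple(seq) in theta:                          # matched by the mapping?
            score = sum(np.log(Q[u, s]) for u, s in enumerate(seq))
            if score > best_score:
                best_phrase, best_score = theta[tuple(seq)], score
    if best_phrase is not None:                          # Eq. (2)
        return best_phrase
    y_hat = tuple(np.argmax(Q, axis=1))                  # Eq. (3): greedy decision
    return theta[min(theta, key=lambda s: edit_distance(s, y_hat))]
```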
In this embodiment, to evaluate the invention quantitatively, the proposed decoding method, denoted DCBiMEP, is compared with conventional classification methods. In the comparison experiment, four common phrase classification methods are compared against DCBiMEP: HMM, dilated convolutional neural network (DCNN), bidirectional gated recurrent unit (BiGRU), and DC-BiGRU. The data preparation for these four methods is as follows: sEMG activity detection is performed on the raw electromyographic data, the sEMG activity data of each phrase are extracted and given the corresponding phrase label, and features are extracted from the labeled phrase data to obtain the feature data of all phrases. In addition, to verify the effectiveness of data enhancement, the proposed method is evaluated with and without data enhancement, denoted AUG-DCBiMEP and DCBiMEP respectively. FIG. 7 shows the phrase recognition accuracy (PRA) of the above 6 methods on the data of 8 subjects: the PRAs of the 4 conventional phrase classification methods are (82.74 ± 7.48)%, (83.06 ± 7.31)%, (87.92 ± 5.82)%, and (90.49 ± 5.47)%, showing that DC-BiGRU, which characterizes spatio-temporal information, performs best among them. The PRA of the proposed DCBiMEP is (97.27 ± 1.44)%, clearly superior to the 4 comparison methods. AUG-DCBiMEP improves PRA by a further 0.91% to (98.18 ± 1.44)%, demonstrating the effectiveness of the data enhancement.
FIGS. 8a and 8b show the phrase recognition confusion matrices on the data of subject 2 for DC-BiGRU (the best of the 4 comparison methods) and for the proposed method. Evidently, DC-BiGRU does not recognize phrases with similar pronunciations, such as "slow down" versus "speed up" and "turn left" versus "turn right", as well as the proposed method does.
Combining the above comparative experiments and recognition results, the following conclusions can be drawn: 1) the proposed decoding method can efficiently recognize phrases with similar pronunciations and improves the performance of the silent speech system; 2) the data enhancement method of adjusting the window boundary further improves performance over the base method; 3) the statistical language model effectively uses the semantic and temporal correlation information of the phrases, helps to understand phrase meaning, and achieves high-accuracy, natural, continuous silent speech interaction.

Claims (1)

1. A silent speech decoding method based on facial and neck surface electromyography, characterized by comprising the following steps:
Step 1: construct an instruction set P = {p_1, …, p_n, …, p_N} containing N Chinese phrases, where p_n denotes the nth Chinese phrase in P and the N phrases contain L syllable classes;
Collect the surface electromyographic signals generated by the facial and neck muscles while the user silently reads the Chinese phrases, using a high-density electrode array; label the resting segments and the phrase-related sEMG segments with a double-threshold detection method based on short-time energy and zero-crossing rate, thereby forming labeled phrase signal segments that constitute the training phrase dataset S_p;
Step 2: segment the training phrase dataset S_p with a series of temporally overlapping signal windows to obtain M signal window samples; divide each phrase signal segment uniformly according to the number of syllables it contains, and, using the phrase's syllable sequence, assign a fine-grained syllable label to each signal window sample, thereby obtaining one batch of training data consisting of M syllable-labeled signal window samples;
Step 3: change the segmentation start time so as to shift the window boundaries of each signal window, and then repeat the procedure of Step 2, thereby obtaining K batches of syllable-labeled training data S_origin = {S_1, …, S_k, …, S_K}, where S_k = {(x_m^k, y_m^k) | m = 1, …, M} denotes the kth syllable-labeled batch, x_m^k is the mth signal window sample of the kth batch, and y_m^k is its syllable label in one-hot coding with size [1, L]; S_origin contains M × K signal window samples in total;
Step 4: extract the electromyographic features of the training dataset S_origin:
Step 4.1: split each signal window sample into consecutive non-overlapping frames, yielding d frames of signal window data;
Step 4.2: according to the relative positions of the signal channels in the high-density electrode array, rearrange the collected surface electromyographic signals into a two-dimensional electrode-channel data matrix of size [e, g];
Step 4.3: extract c electromyographic features from each frame of signal window data to obtain a three-dimensional feature map per frame, and hence the feature-map set of all signal window samples S_input = {F_m^k}, where F_m^k is the d-frame three-dimensional feature map of the mth signal window sample of the kth batch, with size [d, e, g, c], and y_m^k is its syllable label;
Step 5: construct a deep neural network that characterizes spatio-temporal information, consisting of: A dilated convolution blocks wrapped in a time-distributed layer, a flattening layer, A bidirectional gated recurrent unit (BiGRU) blocks, and A fully connected layers; the feature-map set S_input is fed into the deep neural network in K batches;
Step 5.1: each dilated convolution block (the ath) comprises a dilated convolution layer, a batch normalization layer, and a Dropout layer; the ath dilated convolution layer uses H_a two-dimensional convolution kernels of size h × h with the Tanh activation function;
When a = 1, the kth batch of feature maps is fed into the ath dilated convolution block, which outputs the ath feature map of the kth batch C_m^{k,a}, i.e., the output for the mth signal window sample F_m^k, with size [d, e, g, H_a];
When a = 2, 3, …, A, the (a-1)th feature map of the kth batch C_m^{k,a-1} is fed into the ath dilated convolution block, which outputs the ath feature map C_m^{k,a}, so that the Ath dilated convolution block outputs the final feature map C_m^{k,A};
Step 5.2: the feature map C_m^{k,A} is processed by the flattening layer to give the kth batch of flattened features T^k = {T_m^k | m = 1, …, M}, where T_m^k is the output of the mth feature map of the kth batch after the flattening layer, with size [d, e × g × H_A];
Step 5.3: each BiGRU block (the ath) comprises a bidirectional gated recurrent unit layer with the ReLU activation function and a Dropout layer; the hidden-node dimension of every BiGRU layer is b;
When a = 1, the kth batch of flattened features T^k is fed into the ath BiGRU block, which outputs the kth batch's ath gated feature set G^{k,a} = {G_m^{k,a}}, where G_m^{k,a} is the gated feature obtained from T_m^k after the ath BiGRU block, with size [d, 2 × b];
When a = 2, 3, …, A-1, the (a-1)th gated feature set of the kth batch G^{k,a-1} is fed into the ath BiGRU block, which outputs the ath gated feature set G^{k,a}, so that the (A-1)th BiGRU block outputs the kth batch's (A-1)th gated feature set G^{k,A-1}, with size [d, 2 × b];
When a = A, the (A-1)th gated feature set G^{k,A-1} is fed into the Ath BiGRU block, which outputs the kth batch's Ath gated feature set G^{k,A}, with size [1, 2 × b];
Step 5.4, enabling the activation functions of the first A-1 full connection layers to adopt Tanh and respectively connect one Dropout layer, wherein the activation function of the A-th full connection layer is softmax;
gating characteristic set output by A-th bidirectional gating circulation unit block
Figure FDA00036687602900000223
After being processed by A full connection layers in sequence, the scoring matrix of the syllable decision sequence is output
Figure FDA00036687602900000224
Wherein,
Figure FDA00036687602900000225
m signal window samples representing the kth batch of data
Figure FDA00036687602900000226
Are predicted as probabilities of L syllables, respectively, and
Figure FDA00036687602900000227
wherein,
Figure FDA00036687602900000228
m sample representing kth batch of data
Figure FDA00036687602900000229
Probability of being predicted as a class j syllable;
Step 5.5: establish the cross-entropy loss function Loss by Eq. (1):

Loss = -Σ_{k=1}^{K} Σ_{m=1}^{M} Σ_{j=1}^{L} y_{m,j}^k · log(q_{m,j}^k)    (1)

where y_{m,j}^k is the value at the jth position of the syllable label y_m^k of the mth signal window sample x_m^k of the kth batch;
Step 5.6: train the neural network: update the weight parameters of the deep neural network with the Adam optimizer, set the maximum number of iterations step, and dynamically vary the network learning rate lr; training stops when the loss function Loss reaches its minimum or the number of iterations equals step, yielding the optimal syllable classification model;
Step 6: construct a statistical language model from the Chinese phrase instruction set P and use it to post-process the output of the optimal syllable classifier:
Step 6.1: establish the many-to-one mapping Θ from syllable label sequences to Chinese phrases;
Step 6.2: process a Chinese phrase p' to be decoded according to Step 2 to obtain U syllable-labeled signal window samples to be decoded; process these U samples according to Step 4 to obtain the three-dimensional feature-map set to be decoded;
Step 6.3: feed the feature-map set to be decoded into the optimal syllable classification model, which outputs the scoring matrix of the syllable label sequence of the phrase p', Q' = {Q_1', …, Q_u', …, Q_U'}, where Q_u' is the score probability matrix of the uth syllable of p' and U is the length of the syllable sequence;
Step 6.4: set the search depth of each syllable to depth and process Q' with a multi-cluster search algorithm, obtaining U_depth syllable label sequences and their U_depth scores;
Step 6.5: check whether any of the U_depth syllable label sequences matches the many-to-one mapping Θ; if the matching succeeds, select from the matched sequences the phrase p̂ corresponding to the highest-scoring syllable label sequence and output it; otherwise, execute Step 6.6;
Step 6.6: from the scoring matrix Q_u' of the uth syllable of the phrase p', record the syllable with the highest score probability as ŷ_u, thereby obtaining the syllable decision sequence Ŷ = (ŷ_1, …, ŷ_U); select, within the many-to-one mapping Θ, the phrase p̂ with the minimum edit distance to Ŷ as the decoding result of the Chinese phrase p'.
CN202210598661.3A 2022-05-30 2022-05-30 Silent voice decoding method based on surface myoelectricity of face and neck Active CN114999461B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210598661.3A CN114999461B (en) 2022-05-30 2022-05-30 Silent voice decoding method based on surface myoelectricity of face and neck

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210598661.3A CN114999461B (en) 2022-05-30 2022-05-30 Silent voice decoding method based on surface myoelectricity of face and neck

Publications (2)

Publication Number Publication Date
CN114999461A true CN114999461A (en) 2022-09-02
CN114999461B CN114999461B (en) 2024-05-07

Family

ID=83028992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210598661.3A Active CN114999461B (en) 2022-05-30 2022-05-30 Silent voice decoding method based on surface myoelectricity of face and neck

Country Status (1)

Country Link
CN (1) CN114999461B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117084872A (en) * 2023-09-07 2023-11-21 中国科学院苏州生物医学工程技术研究所 Walking aid control method, system and medium based on neck myoelectricity and walking aid

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170095603A (en) * 2016-02-15 2017-08-23 인하대학교 산학협력단 A monophthong recognition method based on facial surface EMG signals by optimizing muscle mixing
CN107545888A (en) * 2016-06-24 2018-01-05 常州诗雅智能科技有限公司 A kind of pharyngeal cavity electronic larynx voice communication system automatically adjusted and method
CN112151030A (en) * 2020-09-07 2020-12-29 中国人民解放军军事科学院国防科技创新研究院 Multi-mode-based complex scene voice recognition method and device
CN113288183A (en) * 2021-05-20 2021-08-24 中国科学技术大学 Silent voice recognition method based on facial neck surface myoelectricity

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170095603A (en) * 2016-02-15 2017-08-23 인하대학교 산학협력단 A monophthong recognition method based on facial surface EMG signals by optimizing muscle mixing
CN107545888A (en) * 2016-06-24 2018-01-05 常州诗雅智能科技有限公司 A kind of pharyngeal cavity electronic larynx voice communication system automatically adjusted and method
CN112151030A (en) * 2020-09-07 2020-12-29 中国人民解放军军事科学院国防科技创新研究院 Multi-mode-based complex scene voice recognition method and device
CN113288183A (en) * 2021-05-20 2021-08-24 中国科学技术大学 Silent voice recognition method based on facial neck surface myoelectricity

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Xu; JIA Xueqin; LI Jinghong; YANG Dan: "Silent speech signal recognition based on optimized electromyographic features", Journal of Northeastern University (Natural Science), no. 10, 28 October 2006 (2006-10-28) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117084872A (en) * 2023-09-07 2023-11-21 中国科学院苏州生物医学工程技术研究所 Walking aid control method, system and medium based on neck myoelectricity and walking aid
CN117084872B (en) * 2023-09-07 2024-05-03 中国科学院苏州生物医学工程技术研究所 Walking aid control method, system and medium based on neck myoelectricity and walking aid

Also Published As

Publication number Publication date
CN114999461B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
Schultz et al. Modeling coarticulation in EMG-based continuous speech recognition
CN113288183B (en) Silent voice recognition method based on facial neck surface myoelectricity
CN102982809B (en) Conversion method for sound of speaker
CN109935243A (en) Speech-emotion recognition method based on the enhancing of VTLP data and multiple dimensioned time-frequency domain cavity convolution model
CN110516696A (en) It is a kind of that emotion identification method is merged based on the adaptive weighting bimodal of voice and expression
CN111462769B (en) End-to-end accent conversion method
Peters Dimensions of perception for consonants
CN107256392A (en) A kind of comprehensive Emotion identification method of joint image, voice
CN107221318A (en) Oral English Practice pronunciation methods of marking and system
CN103366618A (en) Scene device for Chinese learning training based on artificial intelligence and virtual reality
CN109727608A (en) A kind of ill voice appraisal procedure based on Chinese speech
CN110211594A (en) A kind of method for distinguishing speek person based on twin network model and KNN algorithm
CN109841231A (en) A kind of early stage AD speech auxiliary screening system for standard Chinese
CN114999461B (en) Silent voice decoding method based on surface myoelectricity of face and neck
CN108766462B (en) Voice signal feature learning method based on Mel frequency spectrum first-order derivative
CN110348482A (en) A kind of speech emotion recognition system based on depth model integrated architecture
Wand Advancing electromyographic continuous speech recognition: Signal preprocessing and modeling
Pillai et al. A deep learning based evaluation of articulation disorder and learning assistive system for autistic children
CN114863912B (en) Silent voice decoding method based on surface electromyographic signals
Harrington et al. A physiological analysis of high front, tense-lax vowel pairs in Standard Austrian and Standard German
JP5030150B2 (en) Voice recognition device using myoelectric signal
CN114999468A (en) Speech feature-based speech recognition algorithm and device for aphasia patients
JP4110247B2 (en) Artificial vocalization device using biological signals
Karjo Phonetic and phonotactic analysis of Manggarai language
Räsänen Speech segmentation and clustering methods for a new speech recognition architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant