WO2020192231A1 - Auxiliary communication system based on surface electromyography lip language recognition
- Publication number
- WO2020192231A1 (PCT/CN2019/130814)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- module
- lip
- emg
- lip language
- signal
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/24—Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
- A61B5/316—Modalities, i.e. specific diagnostic methods
- A61B5/389—Electromyography [EMG]
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7225—Details of analog processing, e.g. isolation amplifier, gain or sensitivity adjustment, filtering, baseline or drift compensation
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
- A61B5/7264—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267—Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/015—Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/005—Language recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/02—Preprocessing
- G06F2218/04—Denoising
- G06F2218/06—Denoising by applying a scale-space analysis, e.g. using wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/08—Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/12—Classification; Matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/12—Classification; Matching
- G06F2218/16—Classification; Matching by matching signal segments
Definitions
- The invention belongs to the technical field of speech recognition and assisted communication, and in particular relates to an auxiliary communication system based on surface electromyography (EMG) lip language recognition.
- Pronunciation is the basis of language expression. It is a very complex process in which the central nervous system controls the coordinated movement of muscles, and it results from the cooperation of multiple organs and muscle groups. During pronunciation, the facial and neck muscles move accordingly, and different sounds correspond to different movement patterns of these muscles. Therefore, the electrical signals of the surface muscles of the face and neck can be collected and, through feature extraction and classification, different pronunciations can be correlated with the electrophysiological changes of different muscle groups, thereby identifying pronunciation information and helping patients communicate with others.
- A surface EMG signal is a one-dimensional voltage time series obtained by guiding, amplifying, displaying, and recording, through surface electrodes, the bioelectric changes produced by the muscular system during voluntary and involuntary activity. It reflects the bioelectric activity of motor neurons and is formed by the temporal and spatial summation of many peripheral motor unit potentials. Because it correlates strongly with muscle activity, it reflects the activity level of the related muscles to a certain extent, so the movement of those muscles can be observed by analyzing the surface EMG.
- As an objective and quantitative method, surface EMG is non-invasive, simple to operate, and low in cost, and it supports both quantitative and qualitative analysis, so it is widely used in medical research, human-computer interaction, and other fields.
- EMG acquisition often uses only a few electrodes placed on several known articulatory muscles.
- The number and positions of the electrodes are chosen subjectively, so the selected electrode count and channels are not necessarily optimal. This limits the approach and keeps the accuracy of lip language recognition low.
- The embodiment of the present invention provides an auxiliary communication system based on surface EMG lip language recognition for patients who have difficulty vocalizing but can still express themselves through mouth and lip movements.
- It aims to solve the prior-art problem that, because electrodes are selected subjectively, the optimal number and positions are hard to obtain and recognition accuracy is low.
- The training subsystem collects facial and neck EMG signals during lip language movements through high-density array electrodes, improves signal quality through signal preprocessing algorithms, classifies lip language movement types through classification algorithms, selects the optimal number and positions of electrodes through a channel selection algorithm, establishes the optimal matching template between the EMG signal and the lip language information, and uploads it to the network terminal for storage;
- The detection subsystem collects the EMG signal during lip language actions at the optimal positions, based on the optimal number and positions of electrodes selected by the training subsystem, calls the optimal matching template, classifies and decodes the EMG signal to recognize the lip language information, converts it into corresponding voice and image information, and displays it in real time to realize lip language recognition.
- The training subsystem may include a lower computer and an upper computer. The lower computer of the training subsystem may include:
- a high-density array electrode, attached over the facial and neck pronunciation muscles to obtain high-density EMG signals from those muscles while the user performs lip language;
- an EMG acquisition module, used to amplify, filter, and perform analog-to-digital conversion on the signals collected by the high-density array electrode and transmit them to the upper computer of the training subsystem.
- The upper computer of the training subsystem may include a user interaction module and a signal classification, correction matching feedback training module. The user interaction module may include:
- an EMG signal display sub-module, used to display the collected EMG signals in real time;
- a lip language training scene display sub-module, used to provide pictures and text for lip language scenes;
- a channel selection and positioning chart display sub-module, used to show the position distribution of the electrodes on the face and neck.
- The signal classification, correction matching feedback training module may include:
- a signal processing sub-module, used to remove power-frequency interference and baseline drift with filters, and to remove interference noise in the EMG signal with wavelet transform and template matching algorithms;
- a classification sub-module, used to extract the EMG signal related to the pronunciation of a specified short sentence, extract feature values, establish the correspondence between the EMG signal and the specified short sentence, and classify the collected lip language content based on the EMG information;
- a channel selection sub-module, used to select the best matching template, establish a personal training set, and transmit it to the network terminal.
- The detection subsystem may include a lower computer and an upper computer.
- The lower computer of the detection subsystem may include:
- patch-type flexible electrodes, used to collect the EMG signal during lip language movements at the optimal positions;
- a wireless EMG acquisition module, used to wirelessly transmit the EMG information collected by the patch-type flexible electrodes to the upper computer of the detection subsystem.
- The upper computer of the detection subsystem may include:
- a personal training set download module, used to connect to the network, retrieve the personal training set from the network sharing port of the training subsystem, and store it in the app client;
- a lip language information recognition and decoding module, used to denoise and filter the signal, match the features of the EMG signal against the personal training set, decode and recognize the lip language information, convert the lip language content corresponding to the classification result into text, and convert that text into voice and pictures for real-time transmission and display;
- an app display interaction module, used to display the optimal data set from channel selection, the electrode positions in real time, the EMG signal in real time, the classification results in real time, and/or the voice and picture translation.
- The lip language information recognition and decoding module is also used to transmit the recognition result to an emergency contact set in the system.
- The high-density array electrode may include 130 single electrodes, arranged in a high-density layout with a center-to-center spacing of 1 cm.
- The lower computer of the training subsystem may also include an electrode placement orifice plate.
- The EMG acquisition module may include a microcontroller, an analog-to-digital converter, an independent synchronous clock, a pre-signal filter amplifier, and a low-noise power supply.
- The beneficial effect of the embodiment of the present invention is as follows. The training subsystem collects facial and neck EMG signals during lip language movements through the high-density array electrode, improves signal quality through the signal preprocessing algorithm, classifies the type of lip language action through the classification algorithm, selects the optimal number and positions of electrodes through the channel selection algorithm, establishes the optimal matching template between the EMG signal and the lip language information, and uploads it to the network terminal for storage.
- The detection subsystem, based on the optimal number and positions of electrodes selected by the training subsystem, collects the EMG signal during lip language actions at the optimal positions, calls the optimal matching template, classifies and decodes the EMG signal, recognizes the lip language information, converts it into corresponding voice and image information, and displays it in real time to realize lip language recognition.
- High-density array electrodes are used to obtain real-time and complete EMG signals during pronunciation. After processing and analysis, the electrodes that contribute most to the lip language actions are screened out and the optimal number and positions of electrodes are determined. This makes electrode selection for lip language recognition objective and greatly improves recognition accuracy.
- FIG. 1 is a structural block diagram of an auxiliary communication system based on surface electromyography lip language recognition provided by an embodiment of the present invention.
- FIG. 1 shows a structural block diagram of an auxiliary communication system based on surface electromyography lip language recognition provided by an embodiment of the present invention. For ease of description, only the parts related to this embodiment are shown.
- The auxiliary communication system based on surface EMG lip language recognition may include a training subsystem and a detection subsystem.
- The training subsystem collects facial and neck EMG signals during lip language movements through high-density array electrodes, improves signal quality through signal preprocessing algorithms, classifies lip language movement types through classification algorithms, selects the optimal number and positions of electrodes through the channel selection algorithm, establishes the optimal matching template between the EMG signal and the lip language information, and uploads it to the network terminal for storage.
- The detection subsystem collects the EMG signal during lip language actions at the optimal positions, based on the optimal number and positions of electrodes selected by the training subsystem, calls the optimal matching template, classifies and decodes the EMG signal to recognize the lip language information, converts it into corresponding voice and image information, and displays it in real time to realize lip language recognition.
- The training subsystem may include two parts: the lower computer of the training subsystem and the upper computer of the training subsystem.
- The lower computer of the training subsystem may include a high-density array electrode and an EMG acquisition module.
- The high-density array electrode is attached over the facial and neck pronunciation muscles to obtain high-density EMG signals from those muscles while the user performs lip language.
- The EMG signal must first be acquired through the high-density array electrode because personal habits and pronunciation methods differ: the muscles each person uses to pronounce are not exactly the same, muscle activity during pronunciation varies between people, and so do the characteristic positions. Placing electrodes on the same muscle positions for different people is therefore unreasonable, and in this embodiment the high-density array electrodes are used to collect comprehensive EMG signals.
- The high-density array electrode can be composed of a large number of single electrodes.
- The specific number of single electrodes and the spacing between them can be customized according to the size of the user’s face and neck, provided that the EMG signals of the pronunciation muscle groups can be collected comprehensively.
- The high-density array electrode may include 130 single electrodes, arranged in a high-density layout with a center-to-center spacing of 1 cm.
- The EMG acquisition module may be a 130-channel EMG acquisition module that includes a microcontroller (Micro Controller Unit, MCU), an analog-to-digital converter, an independent synchronous clock, a pre-signal filter amplifier, and a low-noise power supply. It amplifies, filters, and performs analog-to-digital conversion on the signals collected by the high-density array electrodes and transmits them to the upper computer of the training subsystem through USB or another transmission path.
- The lower computer of the training subsystem may also include electrode placement orifice plates. Each orifice plate has corresponding electrode holes, with a hole spacing of about 1 cm so that the distance between electrodes is small enough.
- The orifice plates come in four specifications: 20, 25, 40, and 48 holes, so 20, 25, 40, or 48 electrodes can be placed at once, which reduces the workload and makes operation more convenient.
- The upper computer of the training subsystem may be a desktop computer, a notebook computer, a tablet computer, etc., and includes a user interaction module and a signal classification, correction matching feedback training module.
- The user interaction module may include an EMG signal display sub-module, a lip language training scene display sub-module, and a channel selection and positioning chart display sub-module.
- The EMG signal display sub-module displays the collected EMG signals in real time and also provides a single-channel selection function, so that the signal quality of every channel can be observed in real time and the reliability of the signal ensured.
- The lip language training scene display sub-module provides pictures and text for lip language scenes needed in daily life, giving users a personalized training set. Through training in fixed scene modes, EMG signals are collected and stored as an EMG database for lip language analysis. The sub-module also provides task prompts such as “read again” and “next scene” to support repeated training and the next steps.
- The channel selection and positioning chart display sub-module shows the position distribution of the electrodes on the face and neck and, as training and classification proceed, displays in real time the number and specific positions of the selected effective channels.
- The signal classification, correction matching feedback training module may include a signal processing sub-module, a classification sub-module, and a channel selection sub-module.
- The signal processing sub-module first uses IIR band-pass filters and filters based on optimization algorithms to remove power-frequency interference and baseline drift, and then uses algorithms such as wavelet transform and template matching to further remove interference noise in the EMG signal, such as artifacts and ECG. This preprocessing improves signal quality and reliability.
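The first filtering stage described above can be sketched as follows. This is a minimal illustration in plain Python, not the patent's implementation: it substitutes a second-order high-pass section (for baseline drift) and a notch section (for power-frequency interference), with coefficients from the widely used RBJ audio-EQ cookbook formulas, for the unspecified IIR band-pass and optimization-based filters. The 1000 Hz sampling rate, 50 Hz mains frequency, 20 Hz drift cutoff, and all function names are assumptions.

```python
import math

def biquad(kind, f0, fs, q=0.707):
    """Return normalised (b, a) coefficients of an RBJ-cookbook biquad section."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cw = math.cos(w0)
    if kind == "notch":
        b = [1.0, -2 * cw, 1.0]
    elif kind == "highpass":
        b = [(1 + cw) / 2, -(1 + cw), (1 + cw) / 2]
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    a = [1 + alpha, -2 * cw, 1 - alpha]
    return [v / a[0] for v in b], [v / a[0] for v in a]

def lfilter(b, a, x):
    """Apply one biquad section (direct form I) to a list of samples."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def preprocess(x, fs=1000.0, mains_hz=50.0, drift_cutoff_hz=20.0):
    """Remove baseline drift, then power-frequency interference, from one channel."""
    b, a = biquad("highpass", drift_cutoff_hz, fs)
    x = lfilter(b, a, x)
    b, a = biquad("notch", mains_hz, fs, q=30.0)  # narrow notch (assumed Q)
    return lfilter(b, a, x)

def rms(x):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))
```

Calling `preprocess(samples)` on one channel returns the filtered samples. The wavelet and template-matching stages that the source describes for artifacts and ECG would follow and are not shown here.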
- The classification sub-module applies processing such as normalization and blind source separation to extract the EMG signal related to the pronunciation of a specified short sentence, extracts feature values, uses techniques such as linear classifiers, neural networks, and support vector machines to establish the correspondence between the EMG signal and the specified short sentence, and classifies the collected lip language content based on the EMG information.
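As a rough illustration of the feature extraction and classification step, the sketch below computes four common time-domain EMG features per channel and trains a nearest-centroid classifier on z-score-normalized feature vectors. The source names linear classifiers, neural networks, and support vector machines without fixing one, so the nearest-centroid rule is a deliberately simple stand-in. The feature set and all names are assumptions, and blind source separation is omitted.

```python
import math

def channel_features(x):
    """Mean absolute value, RMS, waveform length, and zero-crossing count of one channel."""
    n = len(x)
    mav = sum(abs(v) for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    wl = sum(abs(x[i] - x[i - 1]) for i in range(1, n))
    zc = sum(1 for i in range(1, n) if x[i - 1] * x[i] < 0)
    return [mav, rms, wl, float(zc)]

def feature_vector(trial):
    """Concatenate per-channel features; `trial` is a list of per-channel sample lists."""
    return [f for channel in trial for f in channel_features(channel)]

class NearestCentroid:
    """Assign a feature vector to the phrase whose mean training vector is closest."""

    def fit(self, X, y):
        dims = range(len(X[0]))
        # z-score statistics, so that features on different scales are comparable
        self.mean = [sum(row[d] for row in X) / len(X) for d in dims]
        self.std = [
            math.sqrt(sum((row[d] - self.mean[d]) ** 2 for row in X) / len(X)) or 1.0
            for d in dims
        ]
        Z = [self._normalize(row) for row in X]
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [z for z, lab in zip(Z, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def _normalize(self, row):
        return [(v - m) / s for v, m, s in zip(row, self.mean, self.std)]

    def predict(self, row):
        z = self._normalize(row)
        return min(
            self.centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, self.centroids[lab])),
        )
```

A training run would call `NearestCentroid().fit(X, y)` with one `feature_vector(trial)` per recorded repetition and the prompted short sentence as its label, then `predict` on new trials.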
- The channel selection sub-module selects, after repeated calibration and matching, the EMG template that achieves the best classification accuracy with the fewest channels, saves the optimal matching template between the EMG signal and the lip language information to establish a personal training set, and transmits the optimal data set to the network terminal.
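The source does not name the channel selection algorithm. One common heuristic that matches the stated goal (best accuracy with the fewest channels) is greedy forward selection, sketched below under several assumptions: each trial is reduced to one feature per channel (for example its RMS), accuracy is estimated by leave-one-out nearest-centroid classification, and selection stops at 8 channels to match the 8-electrode patch described for the detection subsystem. All names are hypothetical.

```python
def loo_accuracy(X, y, channels):
    """Leave-one-out nearest-centroid accuracy using only the given channel columns."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            rows = [
                [X[j][c] for c in channels]
                for j in range(len(X))
                if j != i and y[j] == label
            ]
            if rows:  # a class with a single trial has no centroid when that trial is held out
                centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        probe = [X[i][c] for c in channels]
        guess = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(probe, centroids[lab])),
        )
        correct += guess == y[i]
    return correct / len(X)

def select_channels(X, y, max_channels=8, tol=1e-9):
    """Greedy forward selection: keep adding the channel that most improves accuracy."""
    chosen, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining and len(chosen) < max_channels:
        # ties on accuracy are broken in favour of the lowest channel index
        acc, neg_c = max((loo_accuracy(X, y, chosen + [c]), -c) for c in remaining)
        if acc <= best + tol:
            break  # no remaining channel improves accuracy, so stop with the smaller set
        chosen.append(-neg_c)
        remaining.discard(-neg_c)
        best = acc
    return chosen, best
```

`select_channels(X, y)` returns the chosen channel indices and the accuracy they reach. Stopping as soon as no channel helps is what keeps the channel count small; an exhaustive search over subsets would be impractical with 130 channels.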
- The detection subsystem may include two parts: the lower computer of the detection subsystem and the upper computer of the detection subsystem.
- The lower computer of the detection subsystem includes a patch-type flexible electrode and a wireless EMG acquisition module.
- The patch-type flexible electrode is used to collect the EMG signal during lip language actions at the optimal positions.
- Existing rigid-plate EMG electrodes adhere poorly to the skin, and pulling or deformation of the skin easily introduces considerable noise into the EMG data. The patch-type flexible electrode instead consists of several single electrodes on a flexible printed circuit (FPC) board made of flexible material, forming a bendable, custom-made electrode sheet that adheres closely to the skin.
- The specific number of single electrodes can be set according to the actual situation; preferably, it is 8.
- According to the calculation results of the training subsystem, the user selects the number of flexible electrodes to use and their placement positions on the face and neck.
- This gives a high degree of personalization. Because the electrodes fit closely to the skin and follow its micro-deformation, the EMG information obtained is more stable and reliable.
- The wireless EMG acquisition module integrates 8-channel EMG acquisition and wireless transmission. It uses a microcontroller with integrated Wi-Fi, a pre-amplification circuit, an analog-to-digital conversion circuit, and related components to acquire the EMG information from the patch-type flexible electrodes and transmit it wirelessly over Wi-Fi to the upper computer of the detection subsystem.
- Compared with traditional wired electrodes, wireless transmission is more convenient and simpler to wear, and it avoids tangled electrode leads. Wi-Fi transmission is stated to deliver the data without loss, ensuring data integrity.
- Multi-channel EMG information is transmitted wirelessly at the same time, which makes up for the insufficient information provided by the few electrode channels of traditional methods.
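The source does not specify a packet format for the 8-channel wireless link. Purely as an illustration of how a receiver could verify that no samples were lost, the sketch below assumes a hypothetical frame layout: a 32-bit sequence number followed by eight signed 16-bit samples, little-endian. The layout, sample width, and function names are all assumptions.

```python
import struct

N_CHANNELS = 8
# Assumed layout: uint32 sequence number + 8 signed 16-bit samples, little-endian, no padding
FRAME = struct.Struct("<I8h")

def pack_frame(seq, samples):
    """Serialize one multi-channel sample frame."""
    if len(samples) != N_CHANNELS:
        raise ValueError(f"expected {N_CHANNELS} samples, got {len(samples)}")
    return FRAME.pack(seq & 0xFFFFFFFF, *samples)

def unpack_stream(data):
    """Split a byte stream into frames and list any sequence numbers that never arrived."""
    frames, missing, expected = [], [], None
    for offset in range(0, len(data) - FRAME.size + 1, FRAME.size):
        seq, *samples = FRAME.unpack_from(data, offset)
        if expected is not None and seq != expected:
            missing.extend(range(expected, seq))
        frames.append((seq, samples))
        expected = seq + 1
    return frames, missing
```

An empty `missing` list after `unpack_stream` is the receiver-side check that the stream is complete. Sequence-number wraparound is not handled in this sketch.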
- The upper computer of the detection subsystem may be a mobile phone, a tablet computer, etc., and includes a personal training set download module, a lip language information recognition and decoding module, and an app display interaction module.
- The personal training set download module connects to the network, retrieves the personal training set from the network sharing port of the training subsystem, and stores it in the app client.
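The source does not describe how the personal training set is stored or transferred. As one possible shape, the sketch below serializes the selected channels, the per-phrase templates, and the phrase text to JSON so that the training side can upload it and the app can load and validate it. The field names and the version check are hypothetical.

```python
import json

def export_training_set(user_id, channels, centroids, phrases):
    """Serialize a personal training set (hypothetical schema) to a JSON string."""
    payload = {
        "version": 1,
        "user_id": user_id,
        "channels": list(channels),  # indices of the selected electrodes
        "centroids": {label: list(vec) for label, vec in centroids.items()},
        "phrases": dict(phrases),  # label -> text to display and speak
    }
    return json.dumps(payload, sort_keys=True)

def load_training_set(text):
    """Parse and sanity-check a training set produced by export_training_set."""
    payload = json.loads(text)
    if payload.get("version") != 1:
        raise ValueError("unsupported training set version")
    if set(payload["centroids"]) != set(payload["phrases"]):
        raise ValueError("every template needs a matching phrase")
    return payload
```

Validating on load keeps a partially uploaded or outdated file from silently producing wrong phrases in the app.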
- The lip language information recognition and decoding module includes functional modules such as data preprocessing, online EMG classification, and voice conversion of the classification results. It denoises and filters the signal using IIR filters, wavelet transform, etc., matches the features against the personal training set, decodes the lip language information with the classification algorithm to recognize the lip language content, converts the lip language content corresponding to the classification result into text, and calls voice and picture templates to convert it into voice and pictures that are transmitted and displayed in real time. It is also used to transmit the recognition result, through the app, to the emergency contact set in the system.
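The online decoding path described above can be sketched as a small pipeline: compute features for the current window, match them against the stored templates, map the winning label to text, then voice it and forward it to the emergency contact. This is an assumption-laden illustration: it uses per-channel RMS features and Euclidean nearest-template matching, and `speak` and `notify` are hypothetical callbacks standing in for the app's text-to-speech and messaging functions.

```python
import math

def rms_features(window):
    """One RMS value per channel; `window` is a list of per-channel sample lists."""
    return [math.sqrt(sum(v * v for v in ch) / len(ch)) for ch in window]

def decode(window, templates, phrases, max_distance=None):
    """Return (label, text) for the closest template, or None if nothing is close enough."""
    feats = rms_features(window)

    def distance(label):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(feats, templates[label])))

    label = min(sorted(templates), key=distance)
    if max_distance is not None and distance(label) > max_distance:
        return None  # reject windows that match no trained phrase
    return label, phrases[label]

def handle_window(window, templates, phrases, speak, notify=None, max_distance=None):
    """Decode one window, voice the text, and forward it to the emergency contact."""
    result = decode(window, templates, phrases, max_distance)
    if result is None:
        return None
    _, text = result
    speak(text)
    if notify is not None:
        notify(text)
    return text
```

The optional `max_distance` threshold is a design choice added here so that rest periods or untrained movements are not forced onto the nearest phrase. The source does not mention such a rejection step.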
- The app display interaction module is used to display the optimal data set from channel selection, the electrode positions in real time, the EMG signals in real time, the classification results in real time, and/or the voice and picture translation.
- The description above covers the collection and analysis of EMG information from the facial and neck pronunciation muscles.
- Other muscles related to the pronunciation function, such as those of the abdomen, also carry some pronunciation movement information and can likewise serve as a source of EMG information in this embodiment for recognizing pronunciation information.
- The core content of this embodiment is lip language recognition based on high-density EMG.
- Lip language recognition can be used not only by people with speech impairments but also in other situations where speaking is inconvenient or the environment is very noisy, such as underwater operations and noisy factories, so it has considerable room for development.
- The embodiment of the present invention uses the training subsystem to collect facial and neck EMG signals during lip-language actions through high-density array electrodes, improve signal quality with a signal preprocessing algorithm, and classify the types of lip-language actions with a classification algorithm. A channel selection algorithm then selects the optimal number and positions of electrodes, and an optimal matching template between the EMG signals and the lip-language information is established and uploaded to the network terminal for storage.
- Based on the optimal number and positions of electrodes selected by the training subsystem, the detection subsystem collects EMG signals at the optimal positions during lip-language actions, calls the optimal matching template, classifies and decodes the EMG signals, recognizes the lip-language information, converts it into the corresponding voice and image information, and displays it in real time, thereby realizing lip-language recognition.
- High-density array electrodes are used to obtain complete, real-time EMG signals during pronunciation. After processing and analysis, the electrodes whose muscle activity contributes most to the lip-language actions are screened out, and the optimal number and positions of electrodes are determined. This makes electrode selection for lip-language recognition objective and greatly improves the accuracy of lip-language recognition.
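As an illustration only, the channel selection step described above can be sketched in Python as a greedy forward search over electrodes. The text does not specify the selection criterion, the features, or the classifier, so the RMS feature, the nearest-centroid classifier, the leave-one-out scoring, the synthetic data, and all function names below are assumptions made for this sketch.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of one channel's EMG window."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def nearest_centroid_accuracy(features, labels, channels):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given channel subset."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        # Build class centroids from every trial except trial i.
        sums, counts = {}, {}
        for j, (xj, yj) in enumerate(zip(features, labels)):
            if j == i:
                continue
            acc = sums.setdefault(yj, [0.0] * len(channels))
            for k, ch in enumerate(channels):
                acc[k] += xj[ch]
            counts[yj] = counts.get(yj, 0) + 1
        # Predict the label whose centroid is closest (squared distance).
        best_label, best_dist = None, float("inf")
        for label, acc in sums.items():
            dist = sum((x[ch] - acc[k] / counts[label]) ** 2
                       for k, ch in enumerate(channels))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y
    return correct / len(features)


def select_channels(features, labels, max_channels):
    """Greedy forward selection: repeatedly add the channel that most
    improves accuracy; stop when nothing helps or the cap is reached."""
    n_channels = len(features[0])
    chosen, best_acc = [], 0.0
    while len(chosen) < max_channels:
        candidates = [c for c in range(n_channels) if c not in chosen]
        if not candidates:
            break
        scored = [(nearest_centroid_accuracy(features, labels, chosen + [c]), c)
                  for c in candidates]
        # Highest accuracy wins; ties go to the lowest channel index.
        acc, ch = max(scored, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break
        chosen.append(ch)
        best_acc = acc
    return chosen, best_acc


def demo_trials():
    """Synthetic trials (hypothetical data): only channel 2 differs
    between the two phrases; the other channels are uninformative."""
    trials, labels = [], []
    for label, amp in (("hello", 0.2), ("help", 2.0)):
        for jitter in (0.00, 0.05, 0.10):
            amps = [1.0, 1.0, amp + jitter, 1.0]
            windows = [[a, -a, a, -a] for a in amps]
            trials.append([rms(w) for w in windows])
            labels.append(label)
    return trials, labels


if __name__ == "__main__":
    trials, labels = demo_trials()
    print(select_channels(trials, labels, max_channels=3))
```

On the synthetic data the search keeps only the one informative channel, which mirrors the idea of reducing a dense array to a small set of patch electrodes for the detection subsystem. A real system would score candidate subsets on recorded EMG and would likely use richer features and a stronger classifier.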
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- Heart & Thoracic Surgery (AREA)
- Molecular Biology (AREA)
- Surgery (AREA)
- Animal Behavior & Ethology (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Psychiatry (AREA)
- Pathology (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physiology (AREA)
- Neurology (AREA)
- Dermatology (AREA)
- Neurosurgery (AREA)
- Evolutionary Computation (AREA)
- Social Psychology (AREA)
- Power Engineering (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
- Electrically Operated Instructional Devices (AREA)
Claims (10)
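- An assistive communication system based on surface-electromyography lip-language recognition, characterized by comprising: a training subsystem, configured to collect facial and neck EMG signals during lip-language actions through high-density array electrodes, improve signal quality through a signal preprocessing algorithm, classify the types of lip-language actions through a classification algorithm, select the optimal number and optimal positions of electrodes through a channel selection algorithm, establish an optimal matching template between the EMG signals and the lip-language information, and upload it to a network terminal for storage; and a detection subsystem, configured to, based on the optimal number and positions of electrodes selected by the training subsystem, collect EMG signals at the optimal positions during lip-language actions, call the optimal matching template, classify and decode the EMG signals, recognize the lip-language information, convert it into corresponding voice and image information, and display it in real time, thereby realizing lip-language recognition.
- The system according to claim 1, characterized in that the training subsystem comprises a training subsystem lower computer and a training subsystem upper computer, the training subsystem lower computer comprising: high-density array electrodes, configured to be attached to the facial and neck pronunciation muscle groups to acquire high-density EMG signals of those muscle groups during the user's lip-language process; and an EMG acquisition module, configured to amplify, filter, and analog-to-digital convert the signals collected by the high-density array electrodes and transmit them to the training subsystem upper computer.
- The system according to claim 2, characterized in that the training subsystem upper computer comprises a user interaction module and a signal classification and correction-matching feedback training module, the user interaction module comprising: an EMG signal display submodule, configured to display the collected EMG signals in real time; a lip-language training scene display submodule, configured to provide lip-language scene pictures and text; and a channel selection and positioning chart display submodule, configured to show the distribution of electrode positions on the face and neck.
- The system according to claim 3, characterized in that the signal classification and correction-matching feedback training module comprises: a signal processing submodule, configured to use filters to remove power-line interference and baseline drift, and to use wavelet transform and template matching algorithms to remove interference noise from the EMG signals; a classification submodule, configured to extract the EMG signals related to the pronunciation of specified short sentences, extract feature values, establish a correspondence between the EMG signals and the specified short sentences, and classify the collected lip-language content based on the EMG information; and a channel selection submodule, configured to select the optimal matching template, establish a personal training set, and transmit it to the network terminal.
- The system according to claim 1, characterized in that the detection subsystem comprises a detection subsystem lower computer and a detection subsystem upper computer, the detection subsystem lower computer comprising: patch-type flexible electrodes, configured to collect EMG signals at the optimal positions during lip-language actions; and a wireless EMG acquisition module, configured to wirelessly transmit the EMG information collected by the patch-type flexible electrodes to the detection subsystem upper computer.
- The system according to claim 5, characterized in that the detection subsystem upper computer comprises: a personal training set download module, configured to connect to the network, call the personal training set from the network sharing port of the training subsystem, and store it in the APP client; a lip-language information recognition and decoding module, configured to denoise and filter the signals, match features of the EMG signals against the personal training set, decode the lip-language information using a classification algorithm, recognize the lip-language content, convert the lip-language content corresponding to the classification result into text information, and convert it into voice and pictures for real-time transmission and display; and an APP display interaction module, configured to display the optimal data set from channel selection, the electrode positions in real time, the EMG signals in real time, the classification results in real time, and/or the voice and picture translation.
- The system according to claim 6, characterized in that the lip-language information recognition and decoding module is further configured to send the recognition result to an emergency contact set in the system.
- The system according to claim 1, characterized in that the high-density array electrodes comprise 130 single electrodes, arranged in a high-density layout with a center-to-center spacing of 1 cm between electrodes.
- The system according to claim 2, characterized in that the training subsystem lower computer further comprises an electrode placement hole plate.
- The system according to claim 2, characterized in that the EMG acquisition module comprises a microcontroller, an analog-to-digital converter, an independent synchronous clock, a front-end signal filter amplifier, and a low-noise power supply.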
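As an illustration of the signal processing submodule in the claims above, the removal of power-line interference and baseline drift can be sketched with two second-order IIR sections. The claims do not specify filter types or parameters: the 50 Hz mains frequency, the 20 Hz high-pass cutoff, the Q values, and all function names are assumptions for this sketch, and the coefficient formulas follow the widely used Audio EQ Cookbook biquad design rather than anything stated in the patent.

```python
import math


def biquad(kind, f0, fs, q):
    """Return normalized (b, a) coefficients of a second-order IIR section
    ("notch" or "highpass") centered or cut off at f0 Hz."""
    w0 = 2.0 * math.pi * f0 / fs
    cos_w0, alpha = math.cos(w0), math.sin(w0) / (2.0 * q)
    if kind == "notch":
        b = [1.0, -2.0 * cos_w0, 1.0]
    elif kind == "highpass":
        b = [(1.0 + cos_w0) / 2.0, -(1.0 + cos_w0), (1.0 + cos_w0) / 2.0]
    else:
        raise ValueError(f"unsupported filter kind: {kind}")
    a0 = 1.0 + alpha
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return [c / a0 for c in b], a


def apply_biquad(coeffs, x):
    """Filter the sequence x with a direct-form I difference equation."""
    (b0, b1, b2), (_, a1, a2) = coeffs
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1, y2, y1 = x1, xn, y1, yn
    return y


def clean_emg(x, fs, mains_hz=50.0, drift_cutoff_hz=20.0):
    """Notch out mains interference, then high-pass away baseline drift.
    Default parameter values are assumptions, not taken from the patent."""
    notched = apply_biquad(biquad("notch", mains_hz, fs, q=10.0), x)
    highpass = biquad("highpass", drift_cutoff_hz, fs, q=math.sqrt(0.5))
    return apply_biquad(highpass, notched)


if __name__ == "__main__":
    fs = 1000.0  # assumed sampling rate in Hz
    t = [n / fs for n in range(2000)]
    # A constant offset (drift stand-in) plus 50 Hz interference.
    noisy = [0.5 + math.sin(2.0 * math.pi * 50.0 * ti) for ti in t]
    residual = max(abs(v) for v in clean_emg(noisy, fs)[-200:])
    print(f"residual after filtering: {residual:.2e}")
```

The two sections are cascaded so that each can be tuned separately, for example to 60 Hz mains. The wavelet and template matching steps named in the claims are not covered by this sketch.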
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/960,496 US20210217419A1 (en) | 2019-03-25 | 2019-12-31 | Lip-language recognition aac system based on surface electromyography |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910228442.4A CN110059575A (zh) | 2019-03-25 | 2019-03-25 | Assistive communication system based on surface electromyography lip-language recognition |
CN201910228442.4 | 2019-03-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020192231A1 (zh) | 2020-10-01 |
Family
ID=67317373
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/130814 WO2020192231A1 (zh) | Assistive communication system based on surface electromyography lip-language recognition | 2019-03-25 | 2019-12-31 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20210217419A1 (zh) |
CN (1) | CN110059575A (zh) |
WO (1) | WO2020192231A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
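CN112330713A (zh) * | 2020-11-26 | 2021-02-05 | 南京工程学院 | Method for improving speech comprehension of patients with severe hearing impairment based on lip-language recognition |
CN113887339A (zh) * | 2021-09-15 | 2022-01-04 | 天津大学 | Silent speech recognition system and method fusing surface EMG signals and lip images |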
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10943100B2 (en) | 2017-01-19 | 2021-03-09 | Mindmaze Holding Sa | Systems, methods, devices and apparatuses for detecting facial expression |
WO2018146558A2 (en) | 2017-02-07 | 2018-08-16 | Mindmaze Holding Sa | Systems, methods and apparatuses for stereo vision and tracking |
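CN110059575A (zh) * | 2019-03-25 | 2019-07-26 | 中国科学院深圳先进技术研究院 | Assistive communication system based on surface electromyography lip-language recognition |
CN110865705B (zh) * | 2019-10-24 | 2023-09-19 | 中国人民解放军军事科学院国防科技创新研究院 | Multimodal-fusion communication method and apparatus, head-mounted device, and storage medium |
CN111190484B (zh) * | 2019-12-25 | 2023-07-21 | 中国人民解放军军事科学院国防科技创新研究院 | Multimodal interaction system and method |
CN111419230A (zh) * | 2020-04-17 | 2020-07-17 | 上海交通大学 | Surface EMG signal acquisition system for motor unit decoding |
CN111832412B (zh) * | 2020-06-09 | 2024-04-09 | 北方工业大学 | Vocalization training and correction method and system |
CN112349182A (zh) * | 2020-11-10 | 2021-02-09 | 中国人民解放军海军航空大学 | Conversation assistance system for deaf-mute people |
CN112741619A (zh) * | 2020-12-23 | 2021-05-04 | 清华大学 | Self-driven lip-language motion capture device |
CN112927704A (zh) * | 2021-01-20 | 2021-06-08 | 中国人民解放军海军航空大学 | Silent all-weather individual soldier communication system |
CN113627401A (zh) * | 2021-10-12 | 2021-11-09 | 四川大学 | EMG gesture recognition method using a feature pyramid network fused with a dual attention mechanism |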
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060129400A1 (en) * | 2004-12-10 | 2006-06-15 | Microsoft Corporation | Method and system for converting text to lip-synchronized speech in real time |
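WO2018113649A1 (zh) * | 2016-12-21 | 2018-06-28 | 深圳市掌网科技股份有限公司 | Virtual reality language interaction system and method |
CN108319912A (zh) * | 2018-01-30 | 2018-07-24 | 歌尔科技有限公司 | Lip-language recognition method, apparatus, system, and smart glasses |
CN110059575A (zh) * | 2019-03-25 | 2019-07-26 | 中国科学院深圳先进技术研究院 | Assistive communication system based on surface electromyography lip-language recognition |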
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102169690A (zh) * | 2011-04-08 | 2011-08-31 | 哈尔滨理工大学 | Speech signal recognition system and recognition method based on surface EMG signals |
CN102999154B (zh) * | 2011-09-09 | 2015-07-08 | 中国科学院声学研究所 | EMG-signal-based assisted vocalization method and device |
CN203252647U (zh) * | 2012-09-29 | 2013-10-30 | 艾利佛公司 | Wearable device for determining physiological characteristics |
EP3027007A2 (en) * | 2013-05-20 | 2016-06-08 | AliphCom | Combination speaker and light source responsive to state(s) of an organism based on sensor data |
KR20150104345A (ko) * | 2014-03-05 | 2015-09-15 | 삼성전자주식회사 | Speech synthesis apparatus and speech synthesis method |
CN103948388B (zh) * | 2014-04-23 | 2018-10-30 | 深圳先进技术研究院 | EMG acquisition device |
US9789306B2 (en) * | 2014-12-03 | 2017-10-17 | Neurohabilitation Corporation | Systems and methods for providing non-invasive neurorehabilitation of a patient |
-
2019
- 2019-03-25 CN CN201910228442.4A patent/CN110059575A/zh active Pending
- 2019-12-31 WO PCT/CN2019/130814 patent/WO2020192231A1/zh active Application Filing
- 2019-12-31 US US16/960,496 patent/US20210217419A1/en not_active Abandoned
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
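CN112330713A (zh) * | 2020-11-26 | 2021-02-05 | 南京工程学院 | Method for improving speech comprehension of patients with severe hearing impairment based on lip-language recognition |
CN112330713B (zh) * | 2020-11-26 | 2023-12-19 | 南京工程学院 | Method for improving speech comprehension of patients with severe hearing impairment based on lip-language recognition |
CN113887339A (zh) * | 2021-09-15 | 2022-01-04 | 天津大学 | Silent speech recognition system and method fusing surface EMG signals and lip images |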
Also Published As
Publication number | Publication date |
---|---|
CN110059575A (zh) | 2019-07-26 |
US20210217419A1 (en) | 2021-07-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020192231A1 (zh) | Assistive communication system based on surface electromyography lip-language recognition | |
Panicker et al. | A survey of machine learning techniques in physiology based mental stress detection systems | |
Gohel et al. | Review on electromyography signal acquisition and processing | |
Khezri et al. | Reliable emotion recognition system based on dynamic adaptive fusion of forehead biopotentials and physiological signals | |
WO2017193497A1 (zh) | Fusion-model-based intelligent health management server, system, and control method thereof | |
Chen et al. | Eyebrow emotional expression recognition using surface EMG signals | |
US7963931B2 (en) | Methods and devices of multi-functional operating system for care-taking machine | |
CN109065162A (zh) | Comprehensive intelligent diagnosis system | |
Rattanyu et al. | Emotion monitoring from physiological signals for service robots in the living space | |
US20220208194A1 (en) | Devices, systems, and methods for personal speech recognition and replacement | |
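CN109124655A (zh) | Mental state analysis method, apparatus, device, computer medium, and multifunctional chair | |
CN108814565A (zh) | Intelligent traditional Chinese medicine health detection dressing table based on multi-sensor information fusion and deep learning | |
CN111513735A (zh) | Major depressive disorder identification system based on brain-computer interface and deep learning, and application thereof | |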
Zhuang et al. | Real-time emotion recognition system with multiple physiological signals | |
CN105943022B (zh) | ECG monitoring system capable of reconstructing twelve leads from three leads | |
Smith et al. | Detection of simulated vocal dysfunctions using complex sEMG patterns | |
Ntalampiras | Model ensemble for predicting heart and respiration rate from speech | |
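CN115281651A (zh) | Unobtrusive integrated diagnosis system for sleep-related breathing disorders | |
CN112037916A (zh) | Shared multifunctional sudden-death-prevention physiological information detection system and method thereof | |
CN112669963A (zh) | Intelligent health machine, health data generation method, and health data management system | |
CN215017589U (zh) | Micro-expression-based psychological assessment system and device for elderly care services | |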
Tan et al. | Extracting spatial muscle activation patterns in facial and neck muscles for silent speech recognition using high-density sEMG | |
Rattanyu et al. | Emotion recognition using biological signal in intelligent space | |
Yi et al. | Mordo: Silent command recognition through lightweight around-ear biosensors | |
CN107085468A (zh) | Smart pen for detecting and displaying human emotional state in real time, and detection method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19921660 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19921660 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21.03.2022) |