CN110767208A - Auxiliary rescue communication method and device based on unvoiced instruction recognition of facial surface muscle signals - Google Patents

Auxiliary rescue communication method and device based on unvoiced instruction recognition of facial surface muscle signals Download PDF

Info

Publication number
CN110767208A
Authority
CN
China
Prior art keywords
muscle
processing device
sample
signals
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911128112.4A
Other languages
Chinese (zh)
Inventor
杨梦�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology Beijing CUMTB
Original Assignee
China University of Mining and Technology Beijing CUMTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology Beijing CUMTB filed Critical China University of Mining and Technology Beijing CUMTB
Priority to CN201911128112.4A priority Critical patent/CN110767208A/en
Publication of CN110767208A publication Critical patent/CN110767208A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G10L15/08 Speech classification or search
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/26 Speech to text systems
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L2015/088 Word spotting

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The invention discloses an auxiliary rescue communication method and device based on unvoiced instruction recognition from facial surface muscle signals. First, measuring electrodes attached in advance to the user's facial skin collect the muscle electrical signals at the corresponding positions in real time; a data processing device preprocesses the collected muscle electrical signals, extracts features, and performs classification and recognition to identify the corresponding instruction word; a language processing device then converts the recognized instruction word into artificial voice; the artificial voice is sent to the operators' earphones through a radio frequency device, and the voice text is sent wirelessly to the command center. The method and device avoid the inherent problem that conventional speech recognition results are disturbed by environmental background noise, and suit application scenarios with a high noise background or where sound cannot be captured.

Description

Auxiliary rescue communication method and device based on unvoiced instruction recognition of facial surface muscle signals
Technical Field
The invention relates to the technical field of rescue communication, in particular to an auxiliary rescue communication method and device based on the unvoiced instruction recognition of facial surface muscle signals.
Background
Speech recognition in strong background-noise environments, such as disasters and war zones, has long been one of the important problems in the speech recognition field, and it remains one of the field's inherent problems without a perfect solution. When communication is needed in such environments, for example when firefighters or divers must issue instructions according to the current situation or communicate with teammates, the performance of speech recognition matters all the more.
In coal mine safety production and rescue work, rescue or reconnaissance teams need to communicate effectively in harsh, extreme environments. However, on-site engine and working noise drowns out the team members' voices. Even with existing digital voice communication technology and a carried microphone, the captured speech still cannot be fully freed of background interference, such as the intermittent hissing of the breathing apparatus the members carry; moreover, the breathing mask covering the face distorts the emitted sound, making speech unclear. Much prior work, such as noise reduction and bone conduction, has tried to solve speech recognition in such extreme environments, but these techniques still do not cope with complex, changeable real-world conditions.
Disclosure of Invention
The invention aims to provide an auxiliary rescue communication method and device based on unvoiced instruction recognition of facial surface muscle signals, which avoid the inherent problem that conventional speech recognition results are disturbed by environmental background noise and suit application scenarios with a high noise background or where sound cannot be captured.
The purpose of the invention is realized by the following technical scheme:
an assisted rescue communication method based on unvoiced instruction recognition of facial surface muscle signals, the method comprising:
step 1, firstly, acquiring muscle electric signals of corresponding positions in real time through a measuring electrode which is attached to the facial skin of a user in advance;
step 2, carrying out preprocessing, feature extraction and classification identification operations on the collected muscle electric signals by a data processing device to identify corresponding instruction words;
step 3, the recognized instruction words are converted into artificial voices by the language processing device;
and 4, sending the artificial voice to an earphone of an operator through a radio frequency device, and sending the voice text to a command center in a wireless mode.
The invention also provides an auxiliary rescue communication device based on unvoiced instruction recognition of facial surface muscle signals, comprising a collection device, a data processing device, a language processing device and a radio frequency device, wherein:
the acquisition device consists of five channels of measuring electrodes, and the measuring electrodes are attached to the skin of the face of a user and acquire muscle electric signals at corresponding positions in real time;
the acquisition device is in wired connection with the data processing device, and the data processing device receives the muscle electric signals transmitted by the acquisition device, performs preprocessing, feature extraction and classification identification operations on the muscle electric signals and identifies corresponding instruction words;
the language processing device is electrically connected with the data processing device and is used for converting the recognized instruction words into artificial voice;
the language processing device is in wired connection with the radio frequency device, the radio frequency device is used for receiving and sending an identification result, sending the artificial voice to an earphone of an operator through the radio frequency device, and sending a voice text to a command center in a wireless mode.
According to the technical scheme provided by the invention, the method and device avoid the inherent problem that conventional speech recognition results are disturbed by environmental background noise, and suit application scenarios with a high noise background or where sound cannot be captured.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an auxiliary rescue communication method based on unvoiced instruction recognition of facial surface muscle signals according to an embodiment of the present invention;
FIG. 2 is a schematic view of a skin-contacting position of a measuring electrode according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The embodiment of the present invention will be further described in detail with reference to the accompanying drawings, and as shown in fig. 1, a schematic flow chart of an auxiliary rescue communication method based on unvoiced instruction recognition of facial surface muscle signals provided by the embodiment of the present invention is shown, where the method includes:
step 1, firstly, acquiring muscle electric signals of corresponding positions in real time through a measuring electrode which is attached to the facial skin of a user in advance;
In this step, as shown in fig. 2, a schematic view of the skin-contact positions of the measuring electrodes according to an embodiment of the present invention, the measuring electrodes comprise five channels, which are respectively attached to set positions on the facial skin of the user.
Step 2, carrying out preprocessing, feature extraction and classification identification operations on the collected muscle electric signals by a data processing device to identify corresponding instruction words;
In this step, the instruction words to be recognized are preset according to the use environment and requirements. An instruction word can be a single word or a two-character phrase, such as forward, backward, extinguish, danger or assemble. Before the device is used, signals are collected and processed to generate an instruction sample set, which is stored in the data processing device to support the classification and recognition operation.
In a specific implementation, the specific processing procedure of the data processing apparatus is as follows:
1) firstly, filtering and denoising collected muscle electric signals;
Because the collected signal is disturbed by other physiological electrical signals and by electronic equipment, it must be preprocessed; the embodiment of the invention applies a 20 Hz high-pass filter and a 50 Hz notch filter to eliminate the noise.
2) Then the muscle activity state is judged from the preprocessed muscle electrical signals, and the valid signal generated during muscle activity, i.e. between the start and end points of the muscle activity state, is segmented out. The specific process is as follows:
First, the preprocessed muscle electrical signals are windowed sequentially with a window length of 200 ms, and the standard deviation of each window is compared with a threshold value calculated as:
Th=mean(rest)+μ*std(rest)
where Th is the threshold; rest is the signal within the first 100 ms of a window; mean is the signal expectation; std is the standard deviation of the signal; and μ is a sensitivity value, experiments showing the result is optimal when μ is 3.
The corresponding signal data are taken once the muscle is detected to be active. Because users speak at different speeds, the extracted signals differ in length, generally between 200 ms and 400 ms;
the signal is then stretched to 400 ms by cubic interpolation to form the valid signal.
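The activity detection and length normalization above can be sketched as follows. The sampling rate is an assumption, and taking the first 100 ms of each window as the rest baseline follows the patent's definition of rest.

```python
import numpy as np
from scipy.interpolate import interp1d

FS = 1000               # assumed sampling rate, Hz
MU = 3                  # sensitivity value from the patent

def is_active(window: np.ndarray) -> bool:
    """Compare the window's std against Th = mean(rest) + mu * std(rest),
    where rest is the first 100 ms of the window."""
    rest = window[: int(0.1 * FS)]
    th = rest.mean() + MU * rest.std()
    return window.std() > th

def stretch_to_400ms(segment: np.ndarray) -> np.ndarray:
    """Stretch a detected 200-400 ms segment to exactly 400 ms
    by cubic interpolation."""
    target = int(0.4 * FS)
    f = interp1d(np.linspace(0.0, 1.0, len(segment)), segment, kind="cubic")
    return f(np.linspace(0.0, 1.0, target))
```

Normalizing every segment to 400 ms gives the later framing step a fixed number of frames regardless of the user's speaking speed.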
3) And then carrying out feature extraction and optimization on the effective signals in the muscle activity state, wherein the specific process is as follows:
The features used for feature extraction are time-domain features. The valid signal is first framed, with a frame length of 30 ms and a frame shift of 15 ms:
for each moving window, extracting four characteristic values, which are respectively:
[Equations (1)-(4): the four per-frame feature values, rendered as images in the original document.]
where N is the number of values contained in a moving window and X_i is the i-th value of the current window; windowed feature extraction over the 5-channel valid signal yields a feature vector of 520 dimensions.
Because the activity of one muscle cluster also affects the gross activity of the surrounding muscle clusters, the features are inherently correlated, and a small set of effective features can improve algorithm efficiency. Linear discriminant analysis (LDA) is therefore used to reduce the feature dimension from 520 to 50, retaining the high-dimensional information with a small number of feature dimensions and obtaining a similar or even identical classification result.
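A sketch of the framing and reduction steps follows. Note the patent's four per-frame features are only available as images in this record, so the four features below (mean absolute value, root mean square, variance, waveform length) are common sEMG stand-ins, not the patent's own definitions; the sampling rate is likewise an assumption.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

FS = 1000                  # assumed sampling rate, Hz
FRAME = int(0.030 * FS)    # 30 ms frame length
SHIFT = int(0.015 * FS)    # 15 ms frame shift

def frame_features(sig: np.ndarray) -> np.ndarray:
    """Four time-domain features per 30 ms frame (15 ms shift).
    MAV/RMS/variance/waveform-length are stand-ins for the patent's
    four (image-only) formulas."""
    feats = []
    for start in range(0, len(sig) - FRAME + 1, SHIFT):
        w = sig[start:start + FRAME]
        feats += [np.abs(w).mean(),           # mean absolute value
                  np.sqrt((w ** 2).mean()),   # root mean square
                  w.var(),                    # variance
                  np.abs(np.diff(w)).sum()]   # waveform length
    return np.array(feats)

def lda_reduce(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """LDA dimension reduction (the patent reduces 520 -> 50 dims).
    LDA yields at most n_classes - 1 components, so 50 output dims
    presumes a vocabulary of more than 50 instruction words."""
    return LinearDiscriminantAnalysis(n_components=k).fit_transform(X, y)
```

Concatenating the per-frame features across all five channels produces the full feature vector that `lda_reduce` then compresses.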
4) The feature-extracted valid signals are then classified with a random-forest-based consistency (conformal) prediction algorithm to recognize the corresponding instruction word. The specific process is as follows:
First, a singular measurement function A_n is defined on the basis of a random forest. A proximity value (random forest proximity) P(i, j), i, j = 1, ..., n, between two samples is computed in the random forest; without considering the true labels, the proximity value expresses the similarity between two samples. Using the proximity value, the singular measurement function A of the consistency prediction algorithm is defined and used to compute a sample's singular value (strangeness score) α. The singular value of the i-th sample z_i is calculated as follows:
α_i = A_n({z_1, ..., z_(i-1), z_(i+1), ..., z_n}, z_i)
where the data in { } are unordered; z_i = (x_i, y_i), with x_i the features of the i-th sample in the sample set and y_i its label.
The random-forest-based singular measurement function A is defined, for a sample (x_i, y_i), as:
A(x_i, y_i) = A(x_i, y_i)^- / A(x_i, y_i)^+
where the two terms are given by formulas rendered as images in the original document;
t_s denotes the s-th largest proximity value between sample (x_i, y_i) and the samples of the subsequence carrying the same label; j_s denotes the s-th largest proximity value between sample (x_i, y_i) and the samples of the subsequence carrying a different label.
The feature-extracted valid signals are then classified by this random-forest-based consistency prediction algorithm. The specific process is as follows:
For a valid signal x_n, set a hypothetical instruction label y_n = y; together with the pre-collected and processed instruction sample set Z_t = {(x_1, y_1), ..., (x_(n-1), y_(n-1))}, generate a new sample sequence Z = Z_t ∪ {(x_n, y)};
apply the random-forest-based A_n to the new sample sequence Z; A_n assigns a singular value to each sample in the sequence, forming the singular value sequence {α_1, ..., α_n};
comparing the singular value α_n with the others α_1, ..., α_(n-1) yields the confidence p_y of the current hypothetical instruction label y for the valid signal x_n, in the usual conformal form p_y = |{i : α_i ≥ α_n}| / n (the source formula is an image).
The value p_y expresses how consistent the sample is with the rest of the sample sequence when x_n is labelled y; the larger p_y, the better the consistency.
A new hypothetical instruction label is then set for the valid signal x_n, and the above steps are repeated until every instruction has been used as the hypothetical label; the hypothetical instruction label y with the largest p_y is taken as the recognition result.
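The classification step above can be sketched as follows. Because the exact A^- / A^+ strangeness formulas exist only as images in this record, the proximity-ratio strangeness below is an assumed form, and the p-value uses the standard conformal definition; treat this as an illustration of the technique, not the patent's exact implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def proximities(forest, X):
    """P(i, j): fraction of trees in which samples i and j fall in the
    same leaf (the usual random forest proximity)."""
    leaves = forest.apply(X)                                  # (n, n_trees)
    return (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

def strangeness(P, y, i):
    """alpha_i: proximity mass toward differently labelled samples over
    proximity mass toward same-labelled samples (assumed form)."""
    same = (y == y[i])
    same[i] = False
    return P[i, y != y[i]].sum() / (P[i, same].sum() + 1e-12)

def classify(forest, X_train, y_train, x_new, labels):
    """Try every hypothetical label y, compute the conformal p-value
    p_y = |{i : alpha_i >= alpha_n}| / n, and return the label with
    the largest p_y."""
    best, best_p = None, -1.0
    for lab in labels:
        X = np.vstack([X_train, x_new])
        y = np.append(y_train, lab)
        P = proximities(forest, X)
        alpha = np.array([strangeness(P, y, i) for i in range(len(y))])
        p_y = (alpha >= alpha[-1]).mean()
        if p_y > best_p:
            best, best_p = lab, p_y
    return best
```

A wrong hypothetical label makes the new sample strange relative to the sequence, so its α_n dominates and p_y collapses; the correct label keeps α_n small and p_y near 1.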
The above method analyses surface electromyography (sEMG) acquired through electrodes attached to the facial muscles, which may also be called silent speech recognition. Surface electromyography converts the weak potential difference generated by muscle fiber contraction into a digital signal that reflects the neuromuscular state; as muscle structure and function change, the potential difference changes, and the electromyographic signal produced by different muscle fiber contractions changes accordingly. Speech is produced by the complex cooperation of muscle clusters of the face and other parts, and different pronunciations or characters engage different muscle clusters, so instruction words can be recognized from the skin electromyographic signals. Because the muscle electrical signal arises from the contraction of the muscle clusters rather than from the actual sound, the instruction words may be spoken quietly, mouthed silently, or not voiced at all, which solves the problem of sound distorted by background noise or breathing equipment.
Step 3, the recognized instruction words are converted into artificial voices by the language processing device;
and 4, sending the artificial voice to an earphone of an operator through a radio frequency device, and sending the voice text to a command center in a wireless mode.
Based on the above method, an embodiment of the present invention further provides an auxiliary rescue communication device based on the non-vocal instruction recognition of the facial surface muscle signal, as shown in fig. 3, which is a schematic structural diagram of the device according to the embodiment of the present invention, and the device mainly includes an acquisition device, a data processing device, a language processing device, and a radio frequency device, wherein:
the acquisition device consists of five channels of measuring electrodes, and the measuring electrodes are attached to the skin of the face of a user and acquire muscle electric signals at corresponding positions in real time;
the acquisition device is in wired connection with the data processing device, and the data processing device receives the muscle electric signals transmitted by the acquisition device, performs preprocessing, feature extraction and classification identification operations on the muscle electric signals and identifies corresponding instruction words;
the language processing device is electrically connected with the data processing device and is used for converting the recognized instruction words into artificial voice;
the language processing device is in wired connection with the radio frequency device, the radio frequency device is used for receiving and sending an identification result, sending the artificial voice to an earphone of an operator through the radio frequency device, and sending a voice text to a command center in a wireless mode.
The specific implementation of each component in the above-described apparatus is described in the above-described method embodiment.
In a specific implementation, the device can be integrated into a neck-worn standalone device, or combined with a breathing mask, specifically:
the acquisition device is wired to the breathing mask, and the breathing mask is wired to a battery; the data processing device, the language processing device and the radio frequency device are arranged on the carried equipment.
The working process of the auxiliary rescue communication device is as follows:
(1) the device is turned on, and connection (Bluetooth, wireless) is established;
(2) the acquisition device acquires real-time electromyographic signals, and the electromyographic signals are transmitted to the data processing device through a cable;
(3) the data processing device processes the electromyographic signals in real time and enters a monitoring state at initialization; if the wake word is detected it enters a recognition state, the radio frequency device sends a prompt tone to the user's earphone to indicate that an instruction may now be given, and the same prompt tone is sent through the radio frequency device to the teammates' earphones to signal that an instruction is being issued;
(4) the user speaks the instruction word, which may be voiced, whispered or completely silent; the acquisition device acquires the corresponding electromyographic signals and transmits them to the data processing device;
(5) the data processing device in the identification state carries out identification operation on the electromyographic signals and identifies corresponding instruction words;
(6) and the language processing device converts the recognized instruction words into artificial voice, the artificial voice is sent to earphones of other teammates through the radio frequency device, and the voice text is sent to the command center in a wireless mode.
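The monitoring/recognition behaviour of steps (1)-(6) amounts to a small two-state controller, sketched below. All names (`State`, `step`, `"prompt_tone"`, the wake word) are placeholders of this example, not identifiers from the patent.

```python
from enum import Enum

class State(Enum):
    MONITORING = 1     # waiting for the wake word
    RECOGNIZING = 2    # next recognized word is treated as an instruction

def step(state, word, wake_word="start"):
    """Advance the controller by one recognized token; returns the next
    state and an action for the radio frequency device (or None)."""
    if state is State.MONITORING:
        if word == wake_word:
            return State.RECOGNIZING, "prompt_tone"   # cue user and teammates
        return State.MONITORING, None
    # in the recognition state, the word is an instruction to synthesize
    return State.MONITORING, f"speak:{word}"
```

After each instruction the controller drops back to monitoring, matching the wake-word flow described in step (3).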
It is noted that implementation details not described herein are well known to those skilled in the art.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. An auxiliary rescue communication method based on unvoiced instruction recognition of facial surface muscle signals, the method comprising:
step 1, firstly, acquiring muscle electric signals of corresponding positions in real time through a measuring electrode which is attached to the facial skin of a user in advance;
step 2, carrying out preprocessing, feature extraction and classification identification operations on the collected muscle electric signals by a data processing device to identify corresponding instruction words;
step 3, the recognized instruction words are converted into artificial voices by the language processing device;
and 4, sending the artificial voice to an earphone of an operator through a radio frequency device, and sending the voice text to a command center in a wireless mode.
2. The method according to claim 1, wherein in step 1 the measuring electrodes comprise five channels, which are respectively attached to set positions on the facial skin of the user.
3. The method according to claim 1, wherein in step 2 the recognized instruction words are preset according to the use environment and requirements, and each instruction word is a single word or a two-character phrase, specifically including forward, backward, extinguish, danger and assemble.
4. The method according to claim 1, wherein in step 2, the specific processing procedure of the data processing device is:
firstly, filtering and denoising collected muscle electric signals;
then, judging the muscle activity state based on the muscle electric signals after the pretreatment, and segmenting effective signals generated during the muscle activity, namely the starting point and the ending point of the muscle activity state;
then, extracting and optimizing the characteristics of the effective signals in the muscle activity state;
and then classifying and identifying the effective signals extracted by the features by adopting a consistency prediction algorithm based on a random forest, and identifying corresponding instruction words.
5. The method according to claim 4, wherein the muscle activity state is determined based on the preprocessed muscle electrical signals, and the process of segmenting the effective signals generated during the muscle activity is specifically as follows:
firstly, the preprocessed muscle electrical signals are windowed sequentially with a window length of 200 ms, and the standard deviation of each window is compared with a threshold value calculated as:
Th=mean(rest)+μ*std(rest)
wherein Th is the threshold; rest is the signal within the first 100 ms of a window; mean is the signal expectation; std is the standard deviation of the signal; and μ is a sensitivity value, experiments showing the result is optimal when μ is 3;
the corresponding valid signal data are obtained once the muscle is detected to be active, the length of the valid signal lying between 200 ms and 400 ms;
and the valid signal is further stretched to 400 ms by cubic interpolation for output.
6. The method according to claim 4, wherein the process of feature extraction and optimization of the valid signals within the muscle activity state is specifically as follows:
the features used for feature extraction are time-domain features; the valid signal is first framed, with a frame length of 30 ms and a frame shift of 15 ms;
for each moving window, extracting four characteristic values, which are respectively:
[Equations (1)-(4): the four per-frame feature values, rendered as images in the original document.]
wherein N represents the number of values contained in a moving window and X_i is the i-th value of the current window; windowed feature extraction over the 5-channel valid signal yields a feature dimension of 520;
then a linear discriminant analysis method is used to reduce the feature dimension from 520 to 50, retaining the high-dimensional information with a small number of feature dimensions.
7. The method as claimed in claim 4, wherein the step of classifying and recognizing the feature-extracted valid signals with the random-forest-based consistency prediction algorithm is specifically as follows:
first, a singular measurement function A_n is defined on the basis of a random forest, and a proximity value (random forest proximity) P(i, j), i, j = 1, ..., n, between two samples is computed in the random forest; without considering the true labels, the proximity value expresses the similarity between two samples; using the proximity value, the singular measurement function A of the consistency prediction algorithm is defined and used to compute a sample's singular value (strangeness score) α, the singular value of the i-th sample z_i being calculated as follows:
α_i = A_n({z_1, ..., z_(i-1), z_(i+1), ..., z_n}, z_i)
wherein the data in { } are unordered; z_i = (x_i, y_i), x_i representing the features of the i-th sample in the sample set and y_i the label of the sample;
the random-forest-based singular measurement function A is defined, for a sample (x_i, y_i), as:
A(x_i, y_i) = A(x_i, y_i)^- / A(x_i, y_i)^+
wherein the two terms are given by formulas rendered as images in the original document;
t_s denotes the s-th largest proximity value between sample (x_i, y_i) and the samples of the subsequence carrying the same label; j_s denotes the s-th largest proximity value between sample (x_i, y_i) and the samples of the subsequence carrying a different label;
then, the feature-extracted valid signals are classified and identified by the random-forest-based consistency prediction algorithm; the specific process is as follows:

for a valid signal x_n, set a hypothetical instruction label y_n = y for it, and generate, together with the pre-collected and processed instruction sample set Z_t = {(x_1, y_1), ..., (x_{n-1}, y_{n-1})}, a new sample sequence Z = {Z_t, (x_n, y)};

apply the random-forest-defined A_n to the new sample sequence Z; A_n assigns a singular value to each sample in the sequence, forming the singular value sequence {α_1, ..., α_n};

by comparing the singular value α_n with the other values α_1, ..., α_{n-1} in the sequence, the confidence p_y of the currently hypothesised instruction label y for the valid signal x_n is obtained:

p_y = |{i = 1, ..., n : α_i ≥ α_n}| / n

the above p_y indicates how consistent the sample is with the rest of the sample sequence when the label of x_n is y; the larger p_y is, the better the consistency;

then reset a new hypothetical instruction label for the valid signal x_n, and repeat the above steps until every instruction has been taken as the hypothetical instruction label; the hypothetical instruction label y with the largest p_y value is taken as the recognition result.
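The claim's procedure can be sketched in code. This is a minimal illustration, not the patent's implementation: the proximity is computed from shared leaf membership across trees, the strangeness uses an assumed k-term truncation (the patent's formula images are not reproduced in the text), and `conformal_classify` tries every candidate label and keeps the one with the largest p-value:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_proximity(forest, X):
    """P(i, j): fraction of trees in which samples i and j share a leaf."""
    leaves = forest.apply(X)                       # shape (n_samples, n_trees)
    return (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

def strangeness(P, y, i, k=3):
    """alpha_i = A^- / A^+: sum of the k largest proximities to
    differently-labelled samples over the k largest proximities to
    same-labelled samples (k-term truncation is an assumption)."""
    others = np.arange(len(y)) != i
    same = np.sort(P[i, others & (y == y[i])])[::-1][:k].sum()
    diff = np.sort(P[i, others & (y != y[i])])[::-1][:k].sum()
    return diff / (same + 1e-12)                   # guard against zero division

def conformal_classify(X_train, y_train, x_new, k=3, seed=0):
    """Hypothesise each instruction label y for x_new, score the p-value
    p_y = |{i : alpha_i >= alpha_n}| / n, and return the best label."""
    scores = {}
    for y_hyp in np.unique(y_train):
        X = np.vstack([X_train, x_new])            # new sequence Z = {Z_t, (x_n, y)}
        y = np.append(y_train, y_hyp)
        rf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
        P = rf_proximity(rf, X)
        alphas = np.array([strangeness(P, y, i, k) for i in range(len(y))])
        scores[y_hyp] = float(np.mean(alphas >= alphas[-1]))  # alpha_n is last
    best = max(scores, key=scores.get)
    return best, scores[best]
```

On two well-separated synthetic clusters, the hypothesised label matching the new sample's cluster yields a much larger p-value than the wrong label, so the argmax over p_y recovers the correct instruction.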
8. An auxiliary rescue communication device based on unvoiced instruction recognition from facial surface muscle signals, the device comprising an acquisition device, a data processing device, a language processing device and a radio frequency device, wherein:
the acquisition device consists of five channels of measurement electrodes, which are attached to the skin of the user's face and acquire the muscle electrical signals at the corresponding positions in real time;
the acquisition device is connected to the data processing device by wire; the data processing device receives the muscle electrical signals transmitted by the acquisition device, performs preprocessing, feature extraction and classification-identification operations on them, and identifies the corresponding instruction words;
the language processing device is electrically connected to the data processing device and is used for converting the recognized instruction words into artificial speech;
the language processing device is connected to the radio frequency device by wire; the radio frequency device is used for receiving and transmitting the recognition result: the artificial speech is sent to the operator's earphone through the radio frequency device, and the speech text is sent wirelessly to the command center.
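The four-device signal chain of claim 8 can be sketched as a minimal software pipeline. Every function below is a hypothetical stub named for illustration only; the patent specifies the devices and their dataflow, not these interfaces:

```python
# Hypothetical stubs for the claim-8 signal chain:
# acquisition -> data processing -> language processing -> radio frequency.

def preprocess(raw_emg):
    """Data processing device, step 1: e.g. filtering and active-segment
    detection (placeholder pass-through)."""
    return raw_emg

def extract_features(signal):
    """Data processing device, step 2: placeholder single-feature vector."""
    return [sum(signal) / max(len(signal), 1)]

def classify(features):
    """Data processing device, step 3: placeholder instruction word."""
    return "help"

def synthesize(word):
    """Language processing device: instruction word -> artificial speech."""
    return f"<speech:{word}>"

def transmit(audio, text):
    """Radio frequency device: speech to the operator's earphone,
    text wirelessly to the command center."""
    return {"earphone": audio, "command_center": text}

def pipeline(raw_emg):
    word = classify(extract_features(preprocess(raw_emg)))
    return transmit(synthesize(word), word)
```

Running `pipeline` on a raw sample list returns both outputs of the radio frequency stage, mirroring the two destinations named in the claim.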
9. The device of claim 8, wherein the device is integrally configured as a neck-hung stand-alone device;
or the device is combined with a breathing mask, in which case specifically:
the acquisition device is wired to the breathing mask, and the breathing mask is wired to a battery; the data processing device, the language processing device and the radio frequency device are arranged on the carrying equipment.
CN201911128112.4A 2019-11-18 2019-11-18 Auxiliary rescue communication method and device based on unvoiced instruction recognition of facial surface muscle signals Pending CN110767208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911128112.4A CN110767208A (en) 2019-11-18 2019-11-18 Auxiliary rescue communication method and device based on unvoiced instruction recognition of facial surface muscle signals


Publications (1)

Publication Number Publication Date
CN110767208A true CN110767208A (en) 2020-02-07

Family

ID=69338198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911128112.4A Pending CN110767208A (en) 2019-11-18 2019-11-18 Auxiliary rescue communication method and device based on unvoiced instruction recognition of facial surface muscle signals

Country Status (1)

Country Link
CN (1) CN110767208A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102169690A (en) * 2011-04-08 2011-08-31 哈尔滨理工大学 Voice signal recognition system and method based on surface myoelectric signal
CN102999154A (en) * 2011-09-09 2013-03-27 中国科学院声学研究所 Electromyography (EMG)-based auxiliary sound producing method and device
CN104517608A (en) * 2013-09-30 2015-04-15 韦伯斯特生物官能(以色列)有限公司 Controlling a system using voiceless alaryngeal speech
CN109460144A (en) * 2018-09-18 2019-03-12 逻腾(杭州)科技有限公司 A kind of brain-computer interface control system and method based on sounding neuropotential
CN113288183A (en) * 2021-05-20 2021-08-24 中国科学技术大学 Silent voice recognition method based on facial neck surface myoelectricity


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BRADLEY J. BETTS, et al.: "Small-vocabulary speech recognition using surface electromyography", Interacting with Computers *
PHINYOMARK A, et al.: "Application of Linear Discriminant Analysis in Dimensionality Reduction for Hand Motion Classification", Measurement Science Review *
QIAO Bo, et al.: "Design and development of a silent speech recognition app based on Android", Computer Knowledge and Technology *
WANG Huazhen: "Research and application of classifiers with confidence", China Doctoral Dissertations Full-text Database, Information Science and Technology *
WANG Xin, et al.: "Research on sEMG-based silent speech recognition with confidence", Computer Knowledge and Technology *
JIN Dantong: "Research on silent speech recognition algorithms based on surface electromyography signals", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114822508A (en) * 2022-04-25 2022-07-29 中国人民解放军军事科学院国防科技创新研究院 Silence communication method and system for eliminating different speaking characteristics
CN114822508B (en) * 2022-04-25 2024-05-10 Silence communication method and system for eliminating different speaking characteristics

Similar Documents

Publication Publication Date Title
US11495053B2 (en) Systems, methods, devices and apparatuses for detecting facial expression
US20190025919A1 (en) System, method and apparatus for detecting facial expression in an augmented reality system
Li et al. A sign-component-based framework for Chinese sign language recognition using accelerometer and sEMG data
CN110415728B (en) Method and device for recognizing emotion voice
US7574357B1 (en) Applications of sub-audible speech recognition based upon electromyographic signals
US10621973B1 (en) Sub-vocal speech recognition apparatus and method
JP2003255993A (en) System, method, and program for speech recognition, and system, method, and program for speech synthesis
Patil et al. The physiological microphone (PMIC): A competitive alternative for speaker assessment in stress detection and speaker verification
CN110059575A (en) A kind of augmentative communication system based on the identification of surface myoelectric lip reading
CN108427916A (en) A kind of monitoring system and monitoring method of mood of attending a banquet for customer service
US10573335B2 (en) Methods, systems and apparatuses for inner voice recovery from neural activation relating to sub-vocalization
CN105976820A (en) Voice emotion analysis system
CN103294199A (en) Silent information identifying system based on facial muscle sound signals
EP4131256A1 (en) Voice recognition system and method using accelerometers for sensing bone conduction
CN113780150A (en) Fatigue detection method and system based on multi-dimensional body state perception
Usman et al. Heart rate detection and classification from speech spectral features using machine learning
Setiawan et al. A framework for real time emotion recognition based on human ans using pervasive device
CN115089179A (en) Psychological emotion insights analysis method and system
CN110767208A (en) Auxiliary rescue communication method and device based on unvoiced instruction recognition of facial surface muscle signals
Freitas et al. Multimodal corpora for silent speech interaction
Subbarao et al. Emotion recognition using BiLSTM classifier
Li et al. Interpreting sign components from accelerometer and sEMG data for automatic sign language recognition
Koct et al. Speech Activity Detection from EEG using a feed-forward neural network
JP4381404B2 (en) Speech synthesis system, speech synthesis method, speech synthesis program
CN110693508A (en) Multi-channel cooperative psychophysiological active sensing method and service robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200207