CN110865705A - Multi-mode converged communication method and device, head-mounted equipment and storage medium - Google Patents

Multi-mode converged communication method and device, head-mounted equipment and storage medium Download PDF

Info

Publication number
CN110865705A
CN110865705A
Authority
CN
China
Prior art keywords
information
user
instruction
voice
lip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911019740.9A
Other languages
Chinese (zh)
Other versions
CN110865705B (en)
Inventor
印二威
鲁金朋
马权智
谢良
邓宝松
闫野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center, National Defense Technology Innovation Institute PLA Academy of Military Science filed Critical Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center
Priority to CN201911019740.9A priority Critical patent/CN110865705B/en
Publication of CN110865705A publication Critical patent/CN110865705A/en
Application granted granted Critical
Publication of CN110865705B publication Critical patent/CN110865705B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/015Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/011Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns

Abstract

The application provides a multi-mode converged communication method and device, a head-mounted device and a storage medium. The method comprises the following steps: acquiring voice information, lip image information and facial myoelectric information of a user; determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the facial myoelectric information; and identifying the instruction information of the user according to the model parameters, the voice information, the lip image information and the facial myoelectric information. In the embodiments of the application, the lip images and the facial myoelectric information are processed jointly, so that the environmental adaptability and the instruction-recognition accuracy of the interactive communication system are greatly improved. The equipment is easy to wear, simple to use and easy to operate, and because the position of the collectors is relatively fixed, the variation between signal acquisitions is reduced and the accuracy of model prediction is improved.

Description

Multi-mode converged communication method and device, head-mounted equipment and storage medium
Technical Field
The application belongs to the technical field of data processing and communication, and particularly relates to a multi-mode converged communication method and device, a head-mounted device and a storage medium.
Background
Cooperation within a team cannot be separated from information interaction, and voice communication is the most direct and accurate mode of communication. However, some complex environments place limits on person-to-person voice communication. For example, when pilots talk to each other in flight, the voice signal suffers heavy interference from the roar of the engine; in special combat operations, information must be whispered at low volume, and the loudness of the voice is then too low to guarantee its correct and effective transmission.
At present, for complex environments that affect voice communication, the related art recognizes what the user wants to say through an electronic larynx, a sensor that collects laryngeal vibrations. In use, the electronic larynx must fit closely against the user's throat. When the user speaks, the electronic larynx collects the vibration signal of the larynx and converts it into an audio signal, from which the words the user speaks are obtained.
However, the electronic larynx imposes strict requirements on how it is used, its wearing mode is uncomfortable for the user, and it cannot distinguish the laryngeal vibration caused by the user's swallowing, so large errors exist.
Disclosure of Invention
The application provides a multi-mode converged communication method and device, a head-mounted device and a storage medium, in which the lip images and the facial myoelectric information are processed jointly, so that the environmental adaptability and the instruction-recognition accuracy of the interactive communication system are greatly improved. The equipment is also easy to wear, simple to use and easy to operate.
An embodiment of a first aspect of the present application provides a multi-mode converged communication method, including:
acquiring voice information, lip image information and facial myoelectric information of a user;
determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the face electromyogram information;
and identifying the instruction information of the user according to the model parameters, the voice information, the lip image information and the face myoelectric information.
In some embodiments of the present application, the acquiring voice information, lip image information, and facial myoelectric information of a user includes:
collecting voice information of a user through a voice collecting device included in the head-mounted equipment;
shooting the lip region of a user through a miniature camera arranged on the voice acquisition device to obtain the lip image information of the user;
and acquiring the facial electromyographic information of the user through an electromyographic signal acquisition device which is attached to the face of the user on the head-mounted device.
In some embodiments of the present application, the identifying instruction information of the user according to the model parameter, the voice information, the lip image information, and the facial myoelectric information includes:
determining a corresponding image processing model and a corresponding myoelectricity processing model according to the model parameters;
according to the lip image information, identifying a lip instruction corresponding to the lip image information through the image processing model;
according to the facial electromyography information, a facial instruction corresponding to the facial electromyography information is identified through the electromyography processing model;
and identifying the instruction information of the user according to the voice information, the lip instruction and the face instruction.
In some embodiments of the present application, before determining, according to the voice information, the lip image information, and the facial myoelectric information, a model parameter corresponding to a current environment through a pre-trained environment evaluation model, the method further includes:
acquiring voice information, lip image information and facial myoelectric information under different environments;
dividing different data sets according to the signal-to-noise ratio of the voice information, the brightness of the lip image information and the intensity of the face electromyogram information;
and carrying out model training according to the different data sets to obtain an environment evaluation model.
In some embodiments of the present application, after the identifying the instruction information of the user, the method further includes:
displaying the instruction information through a display device;
and receiving the confirmation information of the user and sending the instruction information to a receiving party.
In a second aspect of the present application, an embodiment of a multi-mode converged communication device is provided, including:
the information acquisition module is used for acquiring voice information, lip image information and facial myoelectric information of a user;
the environment determination module is used for determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the face electromyogram information;
and the instruction identification module is used for identifying the instruction information of the user according to the model parameters, the voice information, the lip image information and the face myoelectric information.
In some embodiments of the present application, the information obtaining module includes:
the voice acquisition unit is used for acquiring voice information of a user through a voice acquisition device included in the head-mounted equipment;
the image shooting unit is used for shooting the lip region of a user through a miniature camera arranged on the voice acquisition device to obtain the lip image information of the user;
the myoelectricity collecting unit is used for collecting facial myoelectricity information of the user through the myoelectricity signal collecting device attached to the face of the user on the head-mounted device.
In some embodiments of the present application, the instruction recognition module is configured to determine a corresponding image processing model and a corresponding myoelectric processing model according to the model parameter; according to the lip image information, identifying a lip instruction corresponding to the lip image information through the image processing model; according to the facial electromyography information, a facial instruction corresponding to the facial electromyography information is identified through the electromyography processing model; and identifying the instruction information of the user according to the voice information, the lip instruction and the face instruction.
An embodiment of a third aspect of the present application provides a head-mounted device, including: a voice acquisition device, an electromyographic signal acquisition device, a micro camera arranged on the voice acquisition device, a memory, a processor, and an executable program stored on the memory, wherein the method of the first aspect is implemented when the executable program is executed by the processor.
A fourth aspect of the present application is directed to a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method of the first aspect.
The technical scheme provided in the embodiment of the application at least has the following technical effects or advantages:
in the embodiment of the application, the acquired signal streams are processed to obtain an instruction prediction result, which can be displayed on an external display device; after the speaker confirms that the prediction is correct, the instruction information is sent via the confirm button of the information transceiver unit. The lip images and the facial myoelectric information are processed jointly, so that the environmental adaptability and the instruction-recognition accuracy of the interactive communication system are greatly improved. The equipment is also easy to wear, simple to use and easy to operate, and because the position of the collectors is relatively fixed, the variation between signal acquisitions is reduced and the accuracy of model prediction is improved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings.
In the drawings:
fig. 1 is a flowchart illustrating a multi-modal converged communication method according to an embodiment of the present application;
fig. 2 is a schematic diagram illustrating an electromyographic signal acquisition apparatus provided in an embodiment of the present application;
FIG. 3 shows a schematic view of a headset-style head-mounted device provided by an embodiment of the present application;
FIG. 4 is a flow chart illustrating image and electromyographic signal processing provided by an embodiment of the present application;
FIG. 5 is a functional block diagram of a multimodal fusion communication system according to an embodiment of the present application;
FIG. 6 is a flow chart illustrating a multi-modal converged communication method according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a multimodal fusion communication apparatus according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the embodiment of the application, in order to deal with a complex communication environment, the head-mounted equipment capable of rapidly extracting voice information, lip image information and face myoelectric information is designed. And according to the extracted information, evaluating the use environment, selecting a processing model suitable for the use environment to process and predict signals, and identifying the instruction information of the user, thereby realizing information exchange in a complex environment.
The following describes a communication method, an apparatus, a head-mounted device, and a storage medium for multimodal fusion proposed in the embodiments of the present application with reference to the accompanying drawings.
Example 1
The embodiment of the application provides a multi-modal fusion communication method which performs silent communication based on multi-modal information fusion. As shown in fig. 1, the method specifically comprises the following steps:
step 101: and acquiring voice information, lip image information and facial myoelectric information of the user.
The main execution body of the method is a head-mounted device. The head-mounted device is provided with the electromyographic signal acquisition device shown in fig. 2; when a user wears the device on the head, the electromyographic signal acquisition device fits against the surface of the facial muscles near the user's mouth. It mainly comprises sensors attached around the mouth, and the collection sites correspond to the mouth muscle groups driven during speech, so that the user's facial electromyographic information is collected in real time. The head-mounted device is further provided with a voice acquisition device, which may be a microphone; when the user wears the device on the head, the voice acquisition device is located near the user's mouth and collects voice information while the user speaks. The head-mounted device is also provided with a micro camera mounted on the voice acquisition device; when the user wears the device on the head, the micro camera is aligned with the user's lip region and photographs it to obtain the user's lip image information. In the embodiment of the present application, the head-mounted device may be designed in a headset-like form as shown in fig. 3, and the electromyographic signal acquisition device shown in fig. 2 and the headset structure shown in fig. 3 may also be embedded in a helmet; the headset shown in fig. 3 comprises a microphone and a micro camera mounted on the microphone.
In the embodiment of the application, the micro camera collects only the image information of a fixed region around the user's lips rather than a full head image, so preprocessing steps such as locating the lips in a head image and cropping can be omitted, which speeds up the operation of the system.
Step 102: and determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the face myoelectric information.
Before step 102 is executed, in the embodiment of the present application, voice information, lip image information and facial myoelectric information of a large number of users are acquired in different environments (for example, noisy environments). Different data sets are divided into different magnitude levels according to the signal-to-noise ratio of the voice information, the brightness of the lip image information and the intensity of the facial myoelectric information. Model training is then carried out on the different data sets, the model parameters for the different environments are determined, and the environment evaluation model is obtained.
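As a concrete illustration of this step, the following is a minimal sketch (not part of the patent) of how the three signal magnitudes could be computed and mapped to a pre-trained parameter set. The thresholds, environment classes and the MODEL_PARAMS registry are assumptions introduced only for illustration; in the described system this mapping is learned by the environment evaluation model rather than hand-coded.

```python
import numpy as np

# Hypothetical registry of pre-trained parameter sets, one per environment class.
MODEL_PARAMS = {
    "quiet":  {"image_model": "resnet34_bgru_quiet.pt",  "emg_model": "resnet18_bgru_quiet.pt"},
    "noisy":  {"image_model": "resnet34_bgru_noisy.pt",  "emg_model": "resnet18_bgru_noisy.pt"},
    "silent": {"image_model": "resnet34_bgru_silent.pt", "emg_model": "resnet18_bgru_silent.pt"},
}

def environment_features(voice, noise, lip_frames, emg):
    """Compute the three magnitudes used to partition the training data."""
    snr_db = 10.0 * np.log10(np.mean(voice ** 2) / (np.mean(noise ** 2) + 1e-12))
    brightness = float(np.mean(lip_frames))             # mean pixel intensity of the lip images
    emg_intensity = float(np.sqrt(np.mean(emg ** 2)))   # RMS of the facial EMG signal
    return snr_db, brightness, emg_intensity

def select_model_params(snr_db, brightness, emg_intensity):
    """Rule-based stand-in for the trained environment evaluation model."""
    if snr_db < 0.0 and emg_intensity > 0.01:   # speech buried in noise, EMG active
        return MODEL_PARAMS["noisy"]
    if brightness < 40.0:                        # dark scene: image branch less reliable
        return MODEL_PARAMS["silent"]
    return MODEL_PARAMS["quiet"]
```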
And then, according to the voice information, the lip image information and the face electromyogram information which are acquired in the step 101, carrying out environment evaluation through the environment evaluation model, and determining a model parameter corresponding to the current environment.
According to the embodiment of the application, the signal magnitude corresponding to the user's current environment is evaluated through the environment evaluation model on the basis of the user's multi-modal information, namely the voice information, the lip image information and the facial myoelectric information, and the corresponding model parameters are selected, so that a certain degree of environmental adaptivity is achieved.
Step 103: and identifying the instruction information of the user according to the model parameters, the voice information, the lip image information and the face myoelectric information.
Determining a corresponding image processing model and a corresponding myoelectricity processing model according to the model parameters; according to the lip image information, identifying a lip instruction corresponding to the lip image information through the image processing model; according to the facial electromyography information, identifying a facial instruction corresponding to the facial electromyography information through the electromyography processing model; and identifying the instruction information of the user according to the voice information, the lip instruction and the face instruction.
As shown in fig. 4, the lip image information is input into the image processing model. The model first applies a 3D convolutional layer that performs spatio-temporal convolution over the pre-processed stream of image frames and captures the short-term dynamics of the lip region; this layer consists of 64 3D kernels of size 5 × 7 × 7 (time × width × height) and is followed by batch normalization (BN) and a rectified linear unit (ReLU). The 3D feature maps output by the 3D convolutional layer are then passed through a 34-layer ResNet (Residual Neural Network) and two BGRU (Bidirectional Gated Recurrent Unit) layers.
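A rough PyTorch reconstruction of this front end is given below. It follows the hyper-parameters stated above (64 kernels of size 5 × 7 × 7, BN, ReLU, a 34-layer residual trunk and two bidirectional GRU layers), but the strides, padding, pooling and the 1024-unit hidden size are assumptions, and torchvision's ResNet-34 is used as a stand-in for the residual network described in the patent.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class LipImageFrontEnd(nn.Module):
    """Sketch: 3D-conv front end + ResNet-34 trunk + 2-layer bidirectional GRU."""

    def __init__(self, hidden=1024):
        super().__init__()
        # Spatio-temporal convolution over the frame stream: 64 kernels, 5x7x7 (T x W x H).
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
        )
        trunk = resnet34(weights=None)
        trunk.conv1 = nn.Conv2d(64, 64, kernel_size=7, stride=2, padding=3, bias=False)
        trunk.fc = nn.Identity()          # keep the 512-d per-frame feature vector
        self.resnet = trunk
        self.bgru = nn.GRU(512, hidden, num_layers=2, batch_first=True, bidirectional=True)

    def forward(self, frames):            # frames: (B, 1, T, H, W) grayscale lip crops
        x = self.conv3d(frames)           # (B, 64, T, H', W'), short-term lip dynamics
        b, c, t, h, w = x.shape
        x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        x = self.resnet(x).view(b, t, -1) # (B, T, 512) frame-wise features
        out, _ = self.bgru(x)             # (B, T, 2 * hidden) temporal features
        return out
```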
The facial electromyographic information is input into the electromyography processing model. It is first filtered with a 50 Hz Chebyshev type-I IIR notch filter and then with a 0.1-70 Hz Chebyshev type-I IIR band-pass filter, and the resulting signal stream is fed into an 18-layer ResNet followed by two BGRU layers. Because of the particular nature of the electromyographic signal (one-dimensional information), the ResNet uses 1-dimensional kernels, and its first convolutional layer uses a 5 ms temporal kernel with a stride of 0.25 ms in order to extract fine-scale spectral information. An average pooling layer then makes the number of output frames equal to the video frame rate. These frames are fed to the remaining ResNet layers, which use default kernels of size 3 × 1 and thereby extract deeper, long-term myoelectric features. The output of the ResNet-18 is fed to 2 BGRU layers, each containing 1024 cells.
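The filtering front end of this branch could, for example, be written with SciPy as below. The patent specifies only the filter types and cut-off frequencies; the sampling rate, filter order, passband ripple and notch quality factor used here are assumptions, and SciPy's generic iirnotch is used in place of a Chebyshev-designed notch.

```python
import numpy as np
from scipy.signal import cheby1, filtfilt, iirnotch

def preprocess_facial_emg(emg: np.ndarray, fs: float = 1000.0) -> np.ndarray:
    """50 Hz notch followed by a 0.1-70 Hz band-pass, applied to the raw EMG."""
    # 50 Hz IIR notch (stand-in for the Chebyshev type-I notch named in the text).
    b_notch, a_notch = iirnotch(w0=50.0, Q=30.0, fs=fs)
    emg = filtfilt(b_notch, a_notch, emg)

    # 0.1-70 Hz Chebyshev type-I IIR band-pass filter.
    b_bp, a_bp = cheby1(N=4, rp=0.5, Wn=[0.1, 70.0], btype="bandpass", fs=fs)
    return filtfilt(b_bp, a_bp, emg)
```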
As shown in fig. 4, the image processing model processes the lip image information and the electromyography processing model processes the facial electromyographic information, and the results of both are then input to the information recognition model. The final BGRU output of the image processing model and the final BGRU output of the electromyography processing model are concatenated and fed to a 2-layer BGRU of the information recognition model, which fuses the information from the video stream and the electromyography stream and models their temporal dynamics. The output layer of the information recognition model is a softmax layer, which outputs the finally recognized instruction information of the user, thereby realizing silent communication based on the fusion of facial muscle and image information.
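The fusion stage can be sketched as follows; it concatenates the final BGRU outputs of the two streams, passes them through a 2-layer bidirectional GRU and a softmax output layer. The feature dimensions and the number of instruction classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultimodalFusionHead(nn.Module):
    """Sketch of the information recognition model: concatenate -> 2-layer BGRU -> softmax."""

    def __init__(self, image_dim=2048, emg_dim=2048, hidden=1024, num_instructions=50):
        super().__init__()
        self.fusion_bgru = nn.GRU(image_dim + emg_dim, hidden, num_layers=2,
                                  batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_instructions)

    def forward(self, image_feats, emg_feats):
        # image_feats / emg_feats: (B, T, dim) outputs of the two processing models, time-aligned.
        fused, _ = self.fusion_bgru(torch.cat([image_feats, emg_feats], dim=-1))
        logits = self.classifier(fused)          # per-frame instruction scores
        return torch.softmax(logits, dim=-1)     # softmax output layer
```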
The instruction information of the user may be natural language spoken freely by the user, or an instruction from a fixed instruction library. When the instruction information is an instruction in a fixed instruction library, a label can be assigned to each instruction, and when the user's instruction information is recognized, the recognized instruction sequence is marked with the label having the highest average probability.
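For the fixed-instruction-library case, the "highest average probability" labelling might look like the following sketch; the library contents are placeholders.

```python
import numpy as np

INSTRUCTION_LIBRARY = ["advance", "hold position", "retreat", "request support"]  # placeholder labels

def decode_instruction(frame_probs: np.ndarray) -> str:
    """frame_probs: (T, num_instructions) per-frame softmax probabilities.

    The recognized sequence is assigned the label whose probability,
    averaged over all frames, is the highest.
    """
    avg_probs = frame_probs.mean(axis=0)
    return INSTRUCTION_LIBRARY[int(np.argmax(avg_probs))]
```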
In the embodiment of the application, the signal processing adopts an end-to-end mode, which omits a separate data preprocessing stage. The image-stream part adopts 3D convolution, which captures temporal information well. Both signal streams (image and electromyographic signal) are then processed by ResNet; ResNet is composed of residual blocks, and through skip connections the problems of vanishing and exploding gradients can be effectively avoided and the training effect improved.
The relevant forward propagation formulas are as follows,
z[l+1] = w[l+1] a[l] + b[l+1]
a[l+1] = g(z[l+1])
z[l+2] = w[l+2] a[l+1] + b[l+2]
a[l+2] = g(z[l+2] + a[l])
wherein l is the index of the layer in the neural network, z is the linear result computed by each layer, a is the value after the layer's activation function, and w and b are the parameters of the corresponding network layer (matched by superscript), i.e. the parts of the neural network that are trained and updated after being given initial values; g is the chosen activation function (e.g. the ReLU function). The residual block in ResNet is characterized by the fourth equation above: instead of simply computing a[l+2] = g(z[l+2]), the value a[l] from two layers earlier is added through the skip connection, so that every second network layer's activation also takes the information of the earlier layer into account.
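Expressed in code, the four equations above correspond to a two-layer residual block such as the following minimal NumPy sketch (the activation and weight shapes are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(a_l, w1, b1, w2, b2, g=relu):
    """Forward pass of a residual block: a[l+2] = g(z[l+2] + a[l])."""
    z1 = w1 @ a_l + b1       # z[l+1] = w[l+1] a[l] + b[l+1]
    a1 = g(z1)               # a[l+1] = g(z[l+1])
    z2 = w2 @ a1 + b2        # z[l+2] = w[l+2] a[l+1] + b[l+2]
    return g(z2 + a_l)       # the skip connection adds a[l] before the activation
```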
The BGRU unit adds memory-cell information and simplifies the LSTM (Long Short-Term Memory) network; compared with the LSTM it speeds up computation while retaining memory of the temporal sequence. The operation of each GRU (Gated Recurrent Unit) layer at time t is as follows,
c<t-1> = a<t-1>
c̃<t> = tanh(wc[Γr * c<t-1>, x<t>] + bc)
Γr = σ(wr[c<t-1>, x<t>] + br)
Γu = σ(wu[c<t-1>, x<t>] + bu)
c<t> = Γu * c̃<t> + (1 - Γu) * c<t-1>
The above formulas describe the computation of one layer of the neural network. Since the input is a sequence, time t corresponds to a moment within a sequence sample (in this example, a frame of the image sequence or a short segment of the electromyographic signal), and the input at each time step consists of the information x<t> at that time together with the output c<t-1> of the previous time step; the output is c<t> which, depending on requirements, may be further processed and used as the input of the next network layer.
Here x<t> is the information input at time t, c<t-1> is the memory-cell information at time t-1 (the previous time step), and a<t-1> is the value processed by the activation function at the previous time step. In a GRU, c<t-1> and a<t-1> are identical, so no separate representation is needed; in an LSTM they differ, and the notation a<t-1> is therefore retained for consistency. c̃<t> is the candidate value at time t, used to update c<t>; wc and bc are the parameters used to compute c̃<t>, i.e. the parts of the neural network that are trained and updated after being given initial values. Γr is the relevance gate, representing the correlation between c̃<t> and c<t-1>; wr and br are the parameters used to compute Γr. Γu is the update gate (a value between 0 and 1) that controls how c<t> is updated; wu and bu are the parameters used to compute Γu. σ is the sigmoid activation function, σ(x) = 1/(1 + e^(-x)). c<t> is the memory-cell information at time t; it is the output at this time step and serves as an input at the next time step.
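A single GRU time step following these formulas can be written as the NumPy sketch below (weight shapes are illustrative; the BGRU layers in the models above would in practice use a framework's built-in implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(c_prev, x_t, w_c, b_c, w_r, b_r, w_u, b_u):
    """One GRU step: returns c<t> given c<t-1> and the input x<t>."""
    concat = np.concatenate([c_prev, x_t])
    gamma_r = sigmoid(w_r @ concat + b_r)                        # relevance gate
    c_tilde = np.tanh(w_c @ np.concatenate([gamma_r * c_prev, x_t]) + b_c)
    gamma_u = sigmoid(w_u @ concat + b_u)                        # update gate
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev          # memory cell c<t>
```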
After the user's instruction information has been recognized in the above manner, the instruction information is also displayed through a display device. The display device may be the user's mobile phone or an external display screen of the head-mounted device.
After the instruction information is displayed, the user can see the identified instruction information, so that the user can confirm whether the identified instruction information is an instruction which the user really wants to express. And when the user confirms that the identified instruction information is an instruction which the user really wants to express, the user submits confirmation information to the head-mounted equipment. The head-mounted equipment receives the confirmation information of the user and sends the instruction information to the receiving end corresponding to the receiving party through the transmitting end, so that the communication between the user and the receiving party is realized.
In the embodiment of the application, the user can also receive the instruction information sent by the transmitting terminal of the other party through the receiving terminal of the head-mounted device, and the instruction information is transmitted to the user through the earphone arranged on the head-mounted device.
To facilitate understanding of the communication system provided in the embodiments of the present application, a description is given with reference to fig. 5. As shown in fig. 5, the communication system includes a signal acquisition module, an environment evaluation module, an image sequence processing module, an electromyographic signal processing module, an information recognition module, an instruction information display module and an information transceiver module. The signal acquisition module comprises a lip image acquisition module and a facial electromyographic signal acquisition module; physically it consists of a headset, a camera and an electromyography acquisition system, in which the microphone is responsible for communication in a normal environment and carries a small camera that acquires lip images of fixed size from a fixed region, while the electromyography acquisition equipment is attached to the muscles near the mouth and is responsible for collecting the facial electromyographic signals produced while speaking. The environment evaluation module evaluates the signal magnitude corresponding to the current environment and selects the corresponding model parameters. The image sequence processing module and the electromyographic signal processing module process the lip image information and the facial electromyographic information respectively. The information recognition module fuses the processing results of the image sequence processing module and the electromyographic signal processing module and recognizes the user's instruction information. The instruction information display module displays the recognized instruction information of the user. The information transceiver module sends and receives instruction information, realizing communication with other users. The system adopts a multi-modal perception fusion processing technology, so that information recognition no longer depends on a single perception system, which improves the universality of communication and the accuracy of information recognition in different environments.
In the embodiment of the application, in a normal communication environment, the user can carry out normal voice communication through the voice acquisition device included in the head-mounted device and the signal transceiver module. However, in some abnormal, complex environments, for example when the user cannot speak aloud in a special combat setting or when the user's voice is easily masked in a very noisy environment, the voice information contributes little to recognizing the user's instruction information. Therefore, after the voice information, the lip image information and the facial electromyographic information are acquired, the signal-to-noise ratio of the voice information can first be determined; if it does not reach a preset threshold, the voice information is disregarded and the user's instruction information is determined only from the lip image information and the facial electromyographic information by fusing the images and the electromyographic signals. In this way, communication is realized through the voice acquisition device in a normal environment, and through the electromyographic signal acquisition equipment and the miniature camera in a complex, silent environment.
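As a rough illustration of this decision, the signal-to-noise-ratio check could be sketched as follows. The threshold value is an assumption; the text only states that the SNR of the voice signal is compared against a preset threshold to decide whether the voice stream is used alongside the lip-image and EMG streams.

```python
import numpy as np

def estimate_snr_db(speech: np.ndarray, noise: np.ndarray) -> float:
    """Estimate the SNR in dB from a speech segment and a noise-only segment."""
    p_signal = np.mean(speech.astype(np.float64) ** 2)
    p_noise = np.mean(noise.astype(np.float64) ** 2) + 1e-12
    return 10.0 * np.log10(p_signal / p_noise)

def select_modalities(snr_db: float, threshold_db: float = 10.0) -> list:
    """Choose which modalities feed the recognizer (threshold is hypothetical)."""
    if snr_db >= threshold_db:                 # voice is clean enough to contribute
        return ["voice", "lip_image", "facial_emg"]
    return ["lip_image", "facial_emg"]         # silent-communication mode
```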
The use process of the head-mounted device is shown in fig. 6. The user puts on the device and turns on its switch, and the system checks whether each module operates normally and whether the modules can communicate with one another. If the equipment cannot operate normally, the user is prompted to inspect and repair the corresponding part. The lip image acquisition module monitors the state of the lips to judge whether the user has started speaking; if not, the system remains in standby, otherwise signal acquisition begins and different processing models are selected according to the environment to process the signals and obtain the instruction information. The speaker learns the system's instruction judgment from the instruction information display module and, after confirming that the information is correct, clicks the send button so that the instruction is transmitted through the information transceiver module; if the information is wrong, the speaker's signals are collected again for a new round of judgment.
In the embodiment of the application, the electromyographic signals are baseline-corrected to remove environmental noise while the lip picture sequence is left unprocessed; the two signal streams pass through model selection in the environment evaluation module and then enter their respective processing modules, and the information recognition module combines the features of the two signals to judge the speaker's instruction. After the instruction is obtained, the information is transmitted and communicated through the information transceiver module.
The collected signal streams are processed to obtain an instruction prediction result, which can be displayed on an external display device; after the speaker confirms that the prediction is correct, the instruction information is sent via the confirm button of the information transceiver unit. The lip images and the facial myoelectric information are processed jointly, so that the environmental adaptability and the instruction-recognition accuracy of the interactive communication system are greatly improved. The equipment is also easy to wear, simple to use and easy to operate, and because the position of the collectors is relatively fixed, the variation between signal acquisitions is reduced and the accuracy of model prediction is improved.
Example 2
An embodiment of the present application provides a multi-modal converged communication device, which is configured to perform the multi-modal converged communication method according to the foregoing embodiment, as shown in fig. 7, and the device includes:
an information acquisition module 301, configured to acquire voice information, lip image information, and facial myoelectric information of a user;
an environment determining module 302, configured to determine, according to the voice information, the lip image information, and the facial myoelectric information, a model parameter corresponding to a current environment through a pre-trained environment assessment model;
and the instruction identification module 303 is configured to identify instruction information of the user according to the model parameter, the voice information, the lip image information, and the facial myoelectric information.
The information acquisition module 301 includes:
the voice acquisition unit is used for acquiring voice information of a user through a voice acquisition device included in the head-mounted equipment;
the image shooting unit is used for shooting the lip region of a user through a miniature camera arranged on the voice acquisition device to obtain the lip image information of the user;
the myoelectricity collecting unit is used for collecting facial myoelectricity information of the user through the myoelectricity signal collecting device attached to the face of the user on the head-mounted device.
The instruction identification module 303 is configured to determine a corresponding image processing model and a corresponding myoelectric processing model according to the model parameters; according to the lip image information, identifying a lip instruction corresponding to the lip image information through the image processing model; according to the facial electromyography information, a facial instruction corresponding to the facial electromyography information is identified through the electromyography processing model; and identifying the instruction information of the user according to the voice information, the lip instruction and the face instruction.
The device also includes: the model training module is used for acquiring voice information, lip image information and facial myoelectric information under different environments; dividing different data sets according to the signal-to-noise ratio of the voice information, the brightness of the lip image information and the intensity of the face electromyogram information; and carrying out model training according to the different data sets to obtain an environment evaluation model.
Further comprising: the display module is used for displaying the instruction information through display equipment; and the receiving and sending module is used for receiving the confirmation information of the user and sending the instruction information to the receiving party.
In the embodiment of the application, the acquired signal streams are processed to obtain an instruction prediction result, which can be displayed on an external display device; after the speaker confirms that the prediction is correct, the instruction information is sent via the confirm button of the information transceiver unit. The lip images and the facial myoelectric information are processed jointly, so that the environmental adaptability and the instruction-recognition accuracy of the interactive communication system are greatly improved. The equipment is also easy to wear, simple to use and easy to operate, and because the position of the collectors is relatively fixed, the variation between signal acquisitions is reduced and the accuracy of model prediction is improved.
It should be noted that the foregoing explanation of the embodiment of the multimodal fusion communication method is also applicable to the multimodal fusion communication apparatus of the foregoing embodiment, and therefore, the explanation is not repeated herein.
Example 3
The embodiment of the application provides a head-mounted device, comprising: a voice acquisition device, an electromyographic signal acquisition device, a micro camera arranged on the voice acquisition device, a memory, a processor, and an executable program stored on the memory, wherein the multi-mode converged communication method of the foregoing embodiments is implemented when the executable program is executed by the processor.
Example 4
In order to implement the foregoing embodiments, the present application further proposes a non-transitory computer-readable storage medium, on which a computer program is stored; when the computer program is executed by a processor, it implements the multi-mode converged communication method according to any one of the foregoing embodiments.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the multi-mode converged communication device according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for multimodal converged communication, comprising:
acquiring voice information, lip image information and facial myoelectric information of a user;
determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the face electromyogram information;
and identifying the instruction information of the user according to the model parameters, the voice information, the lip image information and the face myoelectric information.
2. The method of claim 1, wherein the acquiring of the voice information, lip image information and facial myoelectric information of the user comprises:
collecting voice information of a user through a voice collecting device included in the head-mounted equipment;
shooting the lip region of a user through a miniature camera arranged on the voice acquisition device to obtain the lip image information of the user;
and acquiring the facial electromyographic information of the user through an electromyographic signal acquisition device which is attached to the face of the user on the head-mounted device.
3. The method according to claim 1, wherein the recognizing the instruction information of the user based on the model parameter, the voice information, the lip image information, and the facial myoelectric information comprises:
determining a corresponding image processing model and a corresponding myoelectricity processing model according to the model parameters;
according to the lip image information, identifying a lip instruction corresponding to the lip image information through the image processing model;
according to the facial electromyography information, a facial instruction corresponding to the facial electromyography information is identified through the electromyography processing model;
and identifying the instruction information of the user according to the voice information, the lip instruction and the face instruction.
4. The method according to claim 1, wherein before determining model parameters corresponding to a current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the facial electromyography information, the method further comprises:
acquiring voice information, lip image information and facial myoelectric information under different environments;
dividing different data sets according to the signal-to-noise ratio of the voice information, the brightness of the lip image information and the intensity of the face electromyogram information;
and carrying out model training according to the different data sets to obtain an environment evaluation model.
5. The method according to any one of claims 1-4, wherein after the identifying of the instruction information of the user, the method further comprises:
displaying the instruction information through a display device;
and receiving the confirmation information of the user and sending the instruction information to a receiving party.
6. A multimodal converged communication device, comprising:
the information acquisition module is used for acquiring voice information, lip image information and facial myoelectric information of a user;
the environment determination module is used for determining model parameters corresponding to the current environment through a pre-trained environment evaluation model according to the voice information, the lip image information and the face electromyogram information;
and the instruction identification module is used for identifying the instruction information of the user according to the model parameters, the voice information, the lip image information and the face myoelectric information.
7. The apparatus of claim 6, wherein the information obtaining module comprises:
the voice acquisition unit is used for acquiring voice information of a user through a voice acquisition device included in the head-mounted equipment;
the image shooting unit is used for shooting the lip region of a user through a miniature camera arranged on the voice acquisition device to obtain the lip image information of the user;
the myoelectricity collecting unit is used for collecting facial myoelectricity information of the user through the myoelectricity signal collecting device attached to the face of the user on the head-mounted device.
8. The device of claim 6, wherein the instruction recognition module is configured to determine a corresponding image processing model and a corresponding electromyography model according to the model parameters; according to the lip image information, identifying a lip instruction corresponding to the lip image information through the image processing model; according to the facial electromyography information, a facial instruction corresponding to the facial electromyography information is identified through the electromyography processing model; and identifying the instruction information of the user according to the voice information, the lip instruction and the face instruction.
9. A head-mounted device, comprising: the system comprises a voice acquisition device, an electromyographic signal acquisition device, a micro camera arranged on the voice acquisition device, a memory, a processor and an executable program stored on the memory, wherein the executable program is executed by the processor to realize the method according to any one of claims 1 to 5.
10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201911019740.9A 2019-10-24 2019-10-24 Multi-mode fusion communication method and device, head-mounted equipment and storage medium Active CN110865705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911019740.9A CN110865705B (en) 2019-10-24 2019-10-24 Multi-mode fusion communication method and device, head-mounted equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911019740.9A CN110865705B (en) 2019-10-24 2019-10-24 Multi-mode fusion communication method and device, head-mounted equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110865705A true CN110865705A (en) 2020-03-06
CN110865705B CN110865705B (en) 2023-09-19

Family

ID=69653139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911019740.9A Active CN110865705B (en) 2019-10-24 2019-10-24 Multi-mode fusion communication method and device, head-mounted equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110865705B (en)

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007018006A (en) * 2006-09-25 2007-01-25 Ntt Docomo Inc Speech synthesis system, speech synthesis method, and speech synthesis program
US20110071830A1 (en) * 2009-09-22 2011-03-24 Hyundai Motor Company Combined lip reading and voice recognition multimodal interface system
CN104410883A (en) * 2014-11-29 2015-03-11 华南理工大学 Mobile wearable non-contact interaction system and method
WO2016150001A1 (en) * 2015-03-24 2016-09-29 中兴通讯股份有限公司 Speech recognition method, device and computer storage medium
CN104951077A (en) * 2015-06-24 2015-09-30 百度在线网络技术(北京)有限公司 Man-machine interaction method and device based on artificial intelligence and terminal equipment
CN108228285A (en) * 2016-12-14 2018-06-29 中国航空工业集团公司西安航空计算技术研究所 A kind of human-computer interaction instruction identification method multi-modal end to end
CN107220591A (en) * 2017-04-28 2017-09-29 哈尔滨工业大学深圳研究生院 Multi-modal intelligent mood sensing system
US20190074012A1 (en) * 2017-09-05 2019-03-07 Massachusetts Institute Of Technology Methods and Apparatus for Silent Speech Interface
CN108594987A (en) * 2018-03-20 2018-09-28 中国科学院自动化研究所 More man-machine coordination Behavior Monitor Systems based on multi-modal interaction and its control method
CN108537207A (en) * 2018-04-24 2018-09-14 Oppo广东移动通信有限公司 Lip reading recognition methods, device, storage medium and mobile terminal
CN108597501A (en) * 2018-04-26 2018-09-28 深圳市唯特视科技有限公司 A kind of audio-visual speech model based on residual error network and bidirectional valve controlled cycling element
CN108805089A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Based on multi-modal Emotion identification method
CN108899050A (en) * 2018-06-14 2018-11-27 南京云思创智信息科技有限公司 Speech signal analysis subsystem based on multi-modal Emotion identification system
CN108877801A (en) * 2018-06-14 2018-11-23 南京云思创智信息科技有限公司 More wheel dialog semantics based on multi-modal Emotion identification system understand subsystem
CN109558788A (en) * 2018-10-08 2019-04-02 清华大学 Silent voice inputs discrimination method, computing device and computer-readable medium
CN110059575A (en) * 2019-03-25 2019-07-26 中国科学院深圳先进技术研究院 A kind of augmentative communication system based on the identification of surface myoelectric lip reading
CN110110603A (en) * 2019-04-10 2019-08-09 天津大学 A kind of multi-modal labiomaney method based on facial physiologic information
CN110109541A (en) * 2019-04-25 2019-08-09 广州智伴人工智能科技有限公司 A kind of method of multi-modal interaction
CN110276259A (en) * 2019-05-21 2019-09-24 平安科技(深圳)有限公司 Lip reading recognition methods, device, computer equipment and storage medium
CN110286756A (en) * 2019-06-13 2019-09-27 深圳追一科技有限公司 Method for processing video frequency, device, system, terminal device and storage medium

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111755004A (en) * 2020-06-29 2020-10-09 苏州思必驰信息科技有限公司 Voice activity detection method and device
CN111798849A (en) * 2020-07-06 2020-10-20 广东工业大学 Robot instruction identification method and device, electronic equipment and storage medium
CN111899713A (en) * 2020-07-20 2020-11-06 中国人民解放军军事科学院国防科技创新研究院 Method, device, equipment and storage medium for silencing communication
CN111986674A (en) * 2020-08-13 2020-11-24 广州仿真机器人有限公司 Intelligent voice recognition method based on three-level feature acquisition
CN111986674B (en) * 2020-08-13 2021-04-09 广州仿真机器人有限公司 Intelligent voice recognition method based on three-level feature acquisition
CN112001444A (en) * 2020-08-25 2020-11-27 斑马网络技术有限公司 Multi-scene fusion method for vehicle
CN113274038A (en) * 2021-04-02 2021-08-20 上海大学 Lip sensor device combining myoelectricity and pressure signals
CN113793047A (en) * 2021-09-22 2021-12-14 中国民航大学 Pilot cooperative communication capacity evaluation method and device
CN114917544A (en) * 2022-05-13 2022-08-19 上海交通大学医学院附属第九人民医院 Visual auxiliary training method and equipment for function of orbicularis oris
CN114917544B (en) * 2022-05-13 2023-09-22 上海交通大学医学院附属第九人民医院 Visual method and device for assisting orbicularis oris function training
CN116766207A (en) * 2023-08-02 2023-09-19 中国科学院苏州生物医学工程技术研究所 Robot control method based on multi-mode signal motion intention recognition

Also Published As

Publication number Publication date
CN110865705B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
CN110865705B (en) Multi-mode fusion communication method and device, head-mounted equipment and storage medium
US20220377467A1 (en) 2022-11-24 Hearing aid systems and methods
KR20200091839A (en) Communication device, communication robot and computer readable storage medium
CN109120790B (en) Call control method and device, storage medium and wearable device
US20230045237A1 (en) Wearable apparatus for active substitution
US20180085928A1 (en) Robot, robot control method, and robot system
WO2018175959A1 (en) System and method of correlating mouth images to input commands
CN110035141A (en) A kind of image pickup method and equipment
JP2016051081A (en) Device and method of sound source separation
CN110781899B (en) Image processing method and electronic device
CN111935573B (en) Audio enhancement method and device, storage medium and wearable device
WO2022033556A1 (en) Electronic device and speech recognition method therefor, and medium
CN109743504A (en) A kind of auxiliary photo-taking method, mobile terminal and storage medium
CN114187547A (en) Target video output method and device, storage medium and electronic device
CN110837750A (en) Human face quality evaluation method and device
US20230206093A1 (en) Music recommendation method and apparatus
CN114242037A (en) Virtual character generation method and device
CN109986553B (en) Active interaction robot, system, method and storage device
CN110491384B (en) Voice data processing method and device
CN110446996A (en) A kind of control method, terminal and system
CN110971924B (en) Method, device, storage medium and system for beautifying in live broadcast process
JP5669302B2 (en) Behavior information collection system
CN108140124B (en) Prompt message determination method and device and electronic equipment
CN115657859A (en) Intelligent interaction system based on virtual reality
CN113611318A (en) Audio data enhancement method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant