CN115457953A - Neural network multi-command word recognition method and system based on wearable device - Google Patents

Neural network multi-command word recognition method and system based on wearable device

Info

Publication number
CN115457953A
Authority
CN
China
Prior art keywords
command word
layer
voice
gru
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210888530.9A
Other languages
Chinese (zh)
Inventor
纪盟盟
王蒙
胡光敏
龚永康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Ccvui Intelligent Technology Co ltd
Original Assignee
Hangzhou Ccvui Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Ccvui Intelligent Technology Co ltd filed Critical Hangzhou Ccvui Intelligent Technology Co ltd
Priority to CN202210888530.9A priority Critical patent/CN115457953A/en
Publication of CN115457953A publication Critical patent/CN115457953A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a neural network multi-command word recognition method and system based on a wearable device, and relates to the technical field of audio processing. Neural network technology is used and various noises are mixed into the training data, which improves recognition accuracy and robustness. The MFCC features of the voice are used as the network input; the first layer of the network uses a CNN for feature extraction, and because the CNN shares weights it greatly reduces the number of network parameters. A GRU layer is then added so that the information shared between preceding frames of the voice segment is fully utilized and inter-frame features are obtained, improving the overall recognition rate and recognition efficiency of the system. Voice detection is performed by a VAD voice detection module, and the multi-command word detection algorithm does not run when no voice is present, which reduces the power consumption of the system. Resetting the GRU state keeps it consistent with the training condition, thereby ensuring the recognition accuracy and robustness of the algorithm.

Description

Neural network multi-command word recognition method and system based on wearable device
Technical Field
The invention relates to the technical field of audio processing, in particular to a neural network multi-command word recognition method and system based on wearable equipment.
Background
A multi-command word recognition algorithm is one of the algorithms commonly used in intelligent voice products and is widely applied to intelligent voice human-computer interaction. In a voice-based human-computer interaction process, a voice instruction issued by a person is transmitted into the machine through a microphone; inside the machine, the multi-command word recognition algorithm recognizes specific command words, and when a specific command word is recognized a signal is fed back to the machine so that it can make the corresponding interactive response.
Multi-command word recognition based on a wearable device allows the device to communicate with a mobile phone through a Bluetooth module. Because the algorithm is integrated on the wearable device, no network connection is needed, and real-time, accurate multi-command word recognition, and thereby human-computer interaction, can be achieved.
However, existing multi-command word recognition schemes suffer from poor robustness and low detection accuracy, recognize the human voice signal poorly in the presence of noise, and remain in a standby state at all times, so that the system consumes a large amount of energy.
Therefore, it is necessary to provide a neural network multi-command word recognition method and system based on a wearable device to solve the above technical problems.
Disclosure of Invention
In order to solve one of the above technical problems, the present invention provides a neural network multi-command word recognition method based on a wearable device, in which a microphone signal is acquired by the wearable device and converted into a digital input signal stream by an analog-to-digital converter; the digital input signal stream undergoes voice detection by a VAD voice detection module; when only noise is detected, the VAD voice detection module does not activate the VAD flag bit and the multi-command word recognition algorithm does not run; when a voice signal is detected, the VAD voice detection module activates the VAD flag bit and the multi-command word recognition algorithm is entered; after the multi-command word recognition algorithm has been reset, voice recognition is started.
Specifically, the multi-command word recognition algorithm comprises a voice MFCC feature extraction step, a CNN layer feature extraction step, a GRU layer sequence frame information extraction step and a DENSE layer command word classification step.
Specifically, the voice MFCC feature extraction step: selecting a Mel frequency cepstrum coefficient of a digital input signal stream as an input feature, and performing MFCC feature extraction to obtain an MFCC feature corresponding to the digital input signal stream; the MFCC feature extraction step comprises pre-emphasis, framing and windowing, FFT processing, mel filter processing, logarithmic operation and DCT transformation.
Specifically, the CNN layer feature extraction step: inputting MFCC characteristics, performing convolution operation on the MFCC characteristics to obtain a plurality of frames of CNN characteristic graphs, and obtaining sequence frames according to output sequence.
Specifically, the GRU layer extracts information between sequence frames: and performing interframe information extraction on the sequence frame through the GRU layer to obtain interframe information characteristics.
Specifically, the DENSE layer performs a command word classification step: inputting the inter-frame information characteristics into a DENSE layer, wherein the DENSE layer is obtained through network training and can output the classification probability of each command word corresponding to the voice signal according to the input inter-frame information characteristics, and the command words conveyed by the voice signal are judged according to the classification probability of each command word.
As a further solution, the pre-emphasis of the speech MFCC feature extraction step is chosen to have a pre-emphasis coefficient of 0.97.
As a further solution, the frame length of the frame windowing of the voice MFCC feature extraction step is 32ms, the frame shift is 16ms, and each frame is windowed using a Hamming window.
As a further solution, the voice MFCC feature extraction step performs fast fourier transform by FFT processing; filtering the sub-band by Mel filter processing; processing the output of the Mel filter by a logarithmic operation; the MFCC features are obtained by discrete cosine transform via DCT transform.
As a further solution, the CNN layer feature extraction step uses 16 convolution kernels of size [20, 5] to process the MFCC features, with a stride of [1, 2]; the input to the CNN layer is a feature map of dimension [68, 40], where 68 indicates that the 1.1 seconds of voice data are divided into 68 frames and 40 indicates that 40 MFCC features are extracted from each frame; after the convolution operation, the feature map size is [49, 18, 16].
As a further solution, resetting the multi-command word recognition algorithm means resetting the state of the GRU layer. The GRU layer in the sequence frame information extraction step is a unidirectional GRU with 44 neurons, and the output of the CNN layer is reshaped before being input to the GRU layer; the reshaped dimension is [49, 288], and the dimension of the GRU layer output is [44].
As a further solution, the GRU layer is implemented by the following formulas:

$$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)$$
$$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)$$
$$\tilde{H}_t = \tanh(X_t W_{xh} + (H_{t-1} \odot R_t) W_{hh} + b_h)$$
$$H_t = H_{t-1} \odot Z_t + \tilde{H}_t \odot (1 - Z_t)$$

where $X_t$ denotes the input of the GRU layer, $H_{t-1}$ the hidden state at the previous time step, $H_t$ the hidden state output at time $t$, $W_{xr}$, $W_{hr}$, $W_{xz}$, $W_{hz}$, $W_{xh}$, $W_{hh}$ the weight matrices, $b_r$, $b_z$, $b_h$ the biases, $R_t$ the reset gate, $Z_t$ the update gate, $\tilde{H}_t$ the candidate state holding the information to be updated, $\tanh(\cdot)$ the Tanh activation function, and $\sigma(\cdot)$ the Sigmoid activation function.
As a further solution, in the command word classification step the input of the DENSE layer is the output of the GRU layer; the DENSE layer output size is 10 and the output dimension is [10], where the dimensions represent the probabilities of the 9 command words and 1 negative sample class respectively.
As a further solution, the network of the DENSE layer is trained with the TensorFlow framework; the batch size used during training is 1024 and the number of training epochs is 50. The data used for network training consist of clean voice data and voice data mixed with noise; the training data are unified to a length of 1.1 seconds, and several different noises are mixed in at random during noise mixing. The network output of the DENSE layer is the probability of the corresponding category; a probability above 0.9 is classified as the corresponding command word category, otherwise the input defaults to the negative sample category.
As a further solution, a neural network multi-command word recognition system based on a wearable device detects the human voice signal in the microphone signal collected by the wearable device and recognizes the corresponding command word by means of the neural network multi-command word recognition method based on a wearable device described in any of the above.
Compared with the related art, the neural network multi-command word recognition method based on the wearable device has the following beneficial effects:
1. The invention uses neural network technology and mixes various noises into the training data, thereby improving recognition accuracy and robustness;
2. The invention uses the MFCC features of the voice as the network input. In the first layer of the network a CNN is used for feature extraction; because the CNN shares weights, the number of network parameters is greatly reduced. A GRU layer is then added, which makes full use of the information shared between preceding frames in the voice segment so that richer voice features are extracted, and finally a fully connected layer classifies the result into 10 categories. Inter-frame features are obtained through these steps, improving the overall recognition rate and recognition efficiency of the system;
3. Voice detection is performed by a VAD voice detection module. When the microphone receives voice, the VAD voice detection module reports an active state; when the multi-command word recognition algorithm receives the active state, the first frame resets the initial GRU state and command word detection begins. When there is no voice, the multi-command word detection algorithm does not run, which reduces the power consumption of the system; resetting the GRU state keeps it consistent with the training condition, thereby ensuring the recognition accuracy and robustness of the algorithm.
Drawings
Fig. 1 is a flowchart illustrating a neural network multi-command word recognition method based on a wearable device according to an embodiment of the present invention;
fig. 2 is a schematic diagram of MFCC feature extraction of a neural network multi-command word recognition method based on a wearable device according to an embodiment of the present invention;
fig. 3 is a schematic diagram of feature extraction at a GRU layer of a neural network multi-command word recognition method based on a wearable device according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and embodiments.
As shown in fig. 1 and fig. 3, the present embodiment provides a neural network multi-command word recognition method based on a wearable device, in which a microphone signal is collected by the wearable device and converted into a digital input signal stream by an analog-to-digital converter; the digital input signal stream undergoes voice detection by a VAD voice detection module; when only noise is detected, the VAD voice detection module does not activate the VAD flag bit and the multi-command word recognition algorithm does not run; when a voice signal is detected, the VAD voice detection module activates the VAD flag bit and the multi-command word recognition algorithm is entered; after the multi-command word recognition algorithm has been reset, voice recognition is started.
Specifically, the multi-command word recognition algorithm comprises a voice MFCC feature extraction step, a CNN layer feature extraction step, a GRU layer sequence frame information extraction step and a DENSE layer command word classification step.
As shown in fig. 2, specifically, the voice MFCC feature extraction step: selecting a Mel frequency cepstrum coefficient of a digital input signal stream as an input feature, and performing MFCC feature extraction to obtain an MFCC feature corresponding to the digital input signal stream; the MFCC feature extraction step comprises pre-emphasis, framing and windowing, FFT processing, mel filter processing, logarithmic operation and DCT transformation.
Specifically, the CNN layer feature extraction step: inputting MFCC characteristics, performing convolution operation on the MFCC characteristics to obtain a plurality of frames of CNN characteristic graphs, and obtaining sequence frames according to output sequence.
Specifically, the step of extracting information between sequence frames by the GRU layer: and performing interframe information extraction on the sequence frame through the GRU layer to obtain interframe information characteristics.
Specifically, the DENSE layer performs a command word classification step: inputting the inter-frame information characteristics into a DENSE layer, wherein the DENSE layer is obtained through network training and can output the classification probability of each command word corresponding to the voice signal according to the input inter-frame information characteristics, and the command words conveyed by the voice signal are judged according to the classification probability of each command word.
It should be noted that: the embodiment uses the VAD voice detection algorithm to detect voice, when voice in a microphone passes through the VAD algorithm, the VAD can give a state of a flag bit, when the voice is not detected, the multi-command word recognition algorithm does not perform calculation, and when the voice is detected, the initial state Ht-1 of a first frame is set to be 0, so that the voice detection algorithm can be the same as a training situation when in use, and the recognition accuracy and robustness of the algorithm are improved.
The multi-command word recognition algorithm converts the received digital signal into MFCC features, which serve as the input of the neural network. The CNN convolution layer extracts features in the first layer of the network; after this preliminary feature extraction, the sequence features are fed into the subsequent GRU layer, which fully extracts the temporal features of the voice segment and serves as the input of the subsequent DENSE classification layer. The classification layer outputs 10 categories, comprising 9 command word categories and 1 negative sample category.
As a further solution, the pre-emphasis of the speech MFCC feature extraction step is chosen to have a pre-emphasis coefficient of 0.97.
As a further solution, the frame length of the frame windowing of the voice MFCC feature extraction step is 32ms, the frame shift is 16ms, and each frame is windowed using a Hamming window.
As a further solution, the voice MFCC feature extraction step performs fast fourier transform by FFT processing; filtering the sub-band by Mel-filter processing; processing the output of the Mel filter by a logarithmic operation; the MFCC features are obtained by discrete cosine transform via DCT transform.
It should be noted that: mel-Frequency Cepstral Coefficients (MFCC) was chosen as the input feature for the model. The extraction process includes pre-emphasis, framing and windowing, FFT, mel filter, logarithm calculation, DCT transformation, etc., and the process sequence and processing procedure are shown in the following figure. The lowest frequency and the highest frequency of the filter bank can be selected according to the frequency range of the actually recorded voice. Thereby reducing the impact of extraneous frequency bands.
As a further solution, the CNN layer feature extraction step uses 16 convolution kernels of size [20, 5] to process the MFCC features, with a stride of [1, 2]; the input to the CNN layer is a feature map of dimension [68, 40], where 68 indicates that the 1.1 seconds of voice data are divided into 68 frames and 40 indicates that 40 MFCC features are extracted from each frame; after the convolution operation, the feature map size is [49, 18, 16].
Resetting the multi-command word recognition algorithm means resetting the state of the GRU layer. The GRU layer in the sequence frame information extraction step is a unidirectional GRU with 44 neurons, and the output of the CNN layer is reshaped before being input to the GRU layer; the reshaped dimension is [49, 288], and the dimension of the GRU layer output is [44].
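The dimensions above can be reproduced with a short Keras sketch; the ReLU activation and softmax output are assumptions, since the description specifies only the layer types, kernel size, stride and output sizes.

```python
import tensorflow as tf

def build_command_word_model():
    """Sketch of the CNN + GRU + DENSE topology with the dimensions given above."""
    inputs = tf.keras.Input(shape=(68, 40, 1))                     # 68 frames x 40 MFCC features
    x = tf.keras.layers.Conv2D(filters=16, kernel_size=(20, 5),
                               strides=(1, 2),                     # 16 kernels [20, 5], stride [1, 2]
                               activation='relu')(inputs)          # output: (49, 18, 16)
    x = tf.keras.layers.Reshape((49, 18 * 16))(x)                  # dimension reset to [49, 288]
    x = tf.keras.layers.GRU(44)(x)                                 # unidirectional GRU, 44 neurons -> [44]
    outputs = tf.keras.layers.Dense(10, activation='softmax')(x)   # 9 command words + 1 negative class
    return tf.keras.Model(inputs, outputs)

model = build_command_word_model()
model.summary()   # shows the (49, 18, 16), (49, 288), (44,) and (10,) shapes listed above
```

The Conv2D layer shares its 16 kernels of size 20 x 5 across all positions of the feature map, which is the weight-sharing property the description credits with reducing the parameter count.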
As a further solution, as shown in fig. 3, the GRU layer is implemented by the following formulas:

$$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)$$
$$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)$$
$$\tilde{H}_t = \tanh(X_t W_{xh} + (H_{t-1} \odot R_t) W_{hh} + b_h)$$
$$H_t = H_{t-1} \odot Z_t + \tilde{H}_t \odot (1 - Z_t)$$

where $X_t$ denotes the input of the GRU layer, $H_{t-1}$ the hidden state at the previous time step, $H_t$ the hidden state output at time $t$, $W_{xr}$, $W_{hr}$, $W_{xz}$, $W_{hz}$, $W_{xh}$, $W_{hh}$ the weight matrices, $b_r$, $b_z$, $b_h$ the biases, $R_t$ the reset gate, $Z_t$ the update gate, $\tilde{H}_t$ the candidate state holding the information to be updated, $\tanh(\cdot)$ the Tanh activation function, and $\sigma(\cdot)$ the Sigmoid activation function.
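Purely for illustration, a single GRU update following the formulas above can be written out in NumPy as below; the matrix shapes (288-dimensional input, 44-dimensional state) match the dimensions stated earlier, and the element-wise products correspond to the $H_{t-1} R_t$ and $H_{t-1} Z_t$ terms.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_xz, W_hz, b_z, W_xr, W_hr, b_r, W_xh, W_hh, b_h):
    """One GRU update following the formulas above.

    x_t: GRU input, shape (288,); h_prev: previous hidden state H_{t-1}, shape (44,).
    W_x*: (288, 44) weight matrices, W_h*: (44, 44) weight matrices, b_*: (44,) biases.
    """
    z_t = sigmoid(x_t @ W_xz + h_prev @ W_hz + b_z)                # update gate Z_t
    r_t = sigmoid(x_t @ W_xr + h_prev @ W_hr + b_r)                # reset gate R_t
    h_tilde = np.tanh(x_t @ W_xh + (h_prev * r_t) @ W_hh + b_h)    # candidate state
    return h_prev * z_t + h_tilde * (1.0 - z_t)                    # new hidden state H_t
```

Running this step over the 49 sequence frames with the initial state reset to zero, and keeping the final hidden state, yields the [44]-dimensional vector that is passed to the DENSE layer.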
As a further solution, in the command word classification step the input of the DENSE layer is the output of the GRU layer; the DENSE layer output size is 10 and the output dimension is [10], where the dimensions represent the probabilities of the 9 command words and 1 negative sample class respectively.
As a further solution, the network of the DENSE layer is trained with the TensorFlow framework; the batch size used during training is 1024 and the number of training epochs is 50. The data used for network training consist of clean voice data and voice data mixed with noise; the training data are unified to a length of 1.1 seconds, and several different noises are mixed in at random during noise mixing. The network output of the DENSE layer is the probability of the corresponding category; a probability above 0.9 is classified as the corresponding command word category, otherwise the input defaults to the negative sample category.
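A training and decision sketch consistent with these settings is given below; the SNR range, optimizer and loss are assumptions not stated in the description, `build_command_word_model` refers to the architecture sketch above, and `x_train`, `y_train` and `mfcc_batch` are placeholders for the prepared MFCC features and labels.

```python
import numpy as np

SAMPLE_RATE = 16000                         # assumed sampling rate
SEGMENT_LEN = int(1.1 * SAMPLE_RATE)        # training data unified to 1.1 seconds

def mix_noise(clean, noise_bank, snr_db_range=(0, 20)):
    """Randomly mix one of several noise recordings (assumed >= 1.1 s long) into a clean utterance."""
    noise = noise_bank[np.random.randint(len(noise_bank))][:SEGMENT_LEN]
    snr_db = np.random.uniform(*snr_db_range)   # assumed SNR range; only "random mixing" is specified
    gain = np.sqrt(np.sum(clean ** 2) / (np.sum(noise ** 2) * 10.0 ** (snr_db / 10.0) + 1e-10))
    return clean + gain * noise

def train_and_classify(model, x_train, y_train, mfcc_batch):
    """Train with batch size 1024 for 50 epochs, then apply the 0.9 decision threshold."""
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, batch_size=1024, epochs=50)

    probs = model.predict(mfcc_batch)             # per-class probabilities from the DENSE layer
    return np.where(probs.max(axis=1) > 0.9,      # above 0.9: corresponding command word category
                    probs.argmax(axis=1), 9)      # otherwise: default to the negative class (index 9)
```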
As a further solution, a neural network multi-command word recognition system based on a wearable device detects the human voice signal in the microphone signal collected by the wearable device and recognizes the corresponding command word by means of the neural network multi-command word recognition method based on a wearable device described in any of the above.
The above description is only an embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes performed by the present invention or directly or indirectly applied to other related technical fields are included in the scope of the present invention.

Claims (10)

1. A neural network multi-command word recognition method based on a wearable device, characterized in that a microphone signal is collected by the wearable device and converted into a digital input signal stream by an analog-to-digital converter; the digital input signal stream undergoes voice detection by a VAD voice detection module; when only noise is detected, the VAD voice detection module does not activate the VAD flag bit and the multi-command word recognition algorithm does not run; when a voice signal is detected, the VAD voice detection module activates the VAD flag bit and the multi-command word recognition algorithm is entered; after the multi-command word recognition algorithm has been reset, voice recognition is started;
the multi-command word recognition algorithm comprises a voice MFCC feature extraction step, a CNN layer feature extraction step, a GRU layer sequence frame information extraction step and a DENSE layer command word classification step;
the voice MFCC feature extraction step: selecting a Mel frequency cepstrum coefficient of a digital input signal stream as an input characteristic, and performing MFCC characteristic extraction to obtain an MFCC characteristic corresponding to the digital input signal stream; the MFCC feature extraction step comprises pre-emphasis, framing and windowing, FFT processing, mel filter processing, logarithmic operation and DCT transformation;
the CNN layer characteristic extraction step: inputting MFCC characteristics, performing convolution operation on the MFCC characteristics to obtain a plurality of frames of CNN characteristic diagrams, and obtaining sequence frames according to output sequence;
the GRU layer extracts information between sequence frames: extracting interframe information of the sequence frames through a GRU layer to obtain interframe information characteristics;
the DENSE layer carries out a command word classification step: inputting the inter-frame information characteristics into a DENSE layer, wherein the DENSE layer is obtained through network training and can output the classification probability of each command word corresponding to the voice signal according to the input inter-frame information characteristics, and the command words conveyed by the voice signal are judged according to the classification probability of each command word.
2. The neural network multi-command word recognition method based on a wearable device as claimed in claim 1, wherein the pre-emphasis coefficient of the voice MFCC feature extraction step is selected to be 0.97.
3. The method for recognizing the multi-command word in the neural network based on the wearable device as claimed in claim 1, wherein the frame length of the frame windowing of the voice MFCC feature extraction step is 32ms, the frame shift is 16ms, and each frame is windowed by using a Hamming window.
4. The neural network multi-command word recognition method based on wearable equipment as claimed in claim 1, wherein the voice MFCC feature extraction step is fast Fourier transformed by FFT processing; filtering the sub-band by Mel-filter processing; processing the output of the Mel filter by a logarithmic operation; the MFCC features are obtained by discrete cosine transform via DCT transform.
5. The neural network multi-command word recognition method based on a wearable device as claimed in claim 1, wherein the CNN layer feature extraction step uses 16 convolution kernels of size [20, 5] to process the MFCC features, with a stride of [1, 2]; the input to the CNN layer is a feature map of dimension [68, 40], wherein 68 indicates that the 1.1 seconds of voice data are divided into 68 frames and 40 indicates that 40 MFCC features are extracted from each frame; after the convolution operation, the feature map size is [49, 18, 16].
6. The neural network multi-command word recognition method based on a wearable device as claimed in claim 1, wherein resetting the multi-command word recognition algorithm means resetting the state of the GRU layer; the GRU layer in the sequence frame information extraction step is a unidirectional GRU with 44 neurons, and the output of the CNN layer is reshaped before being input to the GRU layer; the reshaped dimension is [49, 288], and the dimension of the GRU layer output is [44].
7. The neural network multi-command word recognition method based on the wearable device of claim 1, wherein the GRU layer is deployed by the following formula:
$$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)$$
$$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)$$
$$\tilde{H}_t = \tanh(X_t W_{xh} + (H_{t-1} \odot R_t) W_{hh} + b_h)$$
$$H_t = H_{t-1} \odot Z_t + \tilde{H}_t \odot (1 - Z_t)$$

where $X_t$ denotes the input of the GRU layer, $H_{t-1}$ the hidden state at the previous time step, $H_t$ the hidden state output at time $t$, $W_{xr}$, $W_{hr}$, $W_{xz}$, $W_{hz}$, $W_{xh}$, $W_{hh}$ the weight matrices, $b_r$, $b_z$, $b_h$ the biases, $R_t$ the reset gate, $Z_t$ the update gate, $\tilde{H}_t$ the candidate state holding the information to be updated, $\tanh(\cdot)$ the Tanh activation function, and $\sigma(\cdot)$ the Sigmoid activation function.
8. The neural network multi-command word recognition method based on a wearable device as claimed in claim 1, wherein in the command word classification step the input of the DENSE layer is the output of the GRU layer; the DENSE layer output size is 10 and the output dimension is [10], wherein the dimensions represent the probabilities of the 9 command words and 1 negative sample class respectively.
9. The neural network multi-command word recognition method based on a wearable device as claimed in claim 8, wherein the network of the DENSE layer is trained with the TensorFlow framework; the batch size used during training is 1024 and the number of training epochs is 50; the data used for network training consist of clean voice data and voice data mixed with noise; the training data are unified to a length of 1.1 seconds, and several different noises are mixed in at random during noise mixing; the network output of the DENSE layer is the probability of the corresponding category; a probability above 0.9 is classified as the corresponding command word category, otherwise the input defaults to the negative sample category.
10. A neural network multi-command word recognition system based on a wearable device, which runs on a hardware device and detects the human voice signal in the microphone signal collected by the wearable device and recognizes the corresponding command word by means of the neural network multi-command word recognition method based on a wearable device according to any one of claims 1 to 9.
CN202210888530.9A 2022-07-27 2022-07-27 Neural network multi-command word recognition method and system based on wearable device Pending CN115457953A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210888530.9A CN115457953A (en) 2022-07-27 2022-07-27 Neural network multi-command word recognition method and system based on wearable device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210888530.9A CN115457953A (en) 2022-07-27 2022-07-27 Neural network multi-command word recognition method and system based on wearable device

Publications (1)

Publication Number Publication Date
CN115457953A true CN115457953A (en) 2022-12-09

Family

ID=84295896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210888530.9A Pending CN115457953A (en) 2022-07-27 2022-07-27 Neural network multi-command word recognition method and system based on wearable device

Country Status (1)

Country Link
CN (1) CN115457953A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023141701A1 (en) * 2022-01-25 2023-08-03 Blumind Inc. Analog systems and methods for audio feature extraction and natural language processing


Similar Documents

Publication Publication Date Title
CN110364143B (en) Voice awakening method and device and intelligent electronic equipment
Milton et al. SVM scheme for speech emotion recognition using MFCC feature
CN103117059B (en) Voice signal characteristics extracting method based on tensor decomposition
CN103065629A (en) Speech recognition system of humanoid robot
CN110120227A (en) A kind of depth stacks the speech separating method of residual error network
CN109192200B (en) Speech recognition method
CN113053410B (en) Voice recognition method, voice recognition device, computer equipment and storage medium
CN112071308A (en) Awakening word training method based on speech synthesis data enhancement
Shi et al. End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network.
CN110415697A (en) A kind of vehicle-mounted voice control method and its system based on deep learning
CN115457953A (en) Neural network multi-command word recognition method and system based on wearable device
CN112562725A (en) Mixed voice emotion classification method based on spectrogram and capsule network
CN1300763C (en) Automatic sound identifying treating method for embedded sound identifying system
Liu et al. Simple pooling front-ends for efficient audio classification
CN113077798B (en) Old man calls for help equipment at home
Espi et al. Spectrogram patch based acoustic event detection and classification in speech overlapping conditions
CN117063229A (en) Interactive voice signal processing method, related equipment and system
Chen et al. Overlapped Speech Detection Based on Spectral and Spatial Feature Fusion.
Mendiratta et al. ASR system for isolated words using ANN with back propagation and fuzzy based DWT
CN112992131A (en) Method for extracting ping-pong command of target voice in complex scene
Zhou et al. Environmental sound classification of western black-crowned gibbon habitat based on spectral subtraction and VGG16
Khan et al. Isolated Bangla word recognition and speaker detection by semantic modular time delay neural network (MTDNN)
Tao et al. Design of elevator auxiliary control system based on speech recognition
CN112599123B (en) Lightweight speech keyword recognition network, method, device and storage medium
Bhagath et al. Telugu Spoken Digits Modeling using Convolutional Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination