US20200013390A1 - Speech wakeup method, apparatus, and electronic device - Google Patents
- Publication number
- US20200013390A1 (application US16/571,468)
- Authority
- US
- United States
- Prior art keywords
- speech
- speech wakeup
- speech data
- wakeup model
- electronic device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- This specification relates to the field of computer technologies, and in particular, to a speech wakeup method, apparatus, and electronic device.
- Speech recognition is used in increasingly popular conversational assistants such as Apple's Siri, Microsoft's Cortana, and Amazon's Alexa to enhance user experience and make human-computer interaction more natural.
- An important speech interaction technology is Keyword Spotting (KWS), which may also be generally referred to as speech wakeup. Based on the prior art, there is a need for a speech wakeup solution that does not rely on keyword-specific speech data.
- a speech wakeup method, apparatus, and electronic device are provided in embodiments of this specification, for solving the following technical problem: there is a need for a speech wakeup solution that may not rely on keyword-specific speech data.
- a speech wakeup method comprises: inputting speech data to a speech wakeup model trained with general speech data, and outputting, by the speech wakeup model, a result for determining whether to execute speech wakeup, wherein the speech wakeup model includes a Deep Neural Network (DNN) and a Connectionist Temporal Classifier (CTC).
- an electronic device comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores an instruction executable by the at least one processor, and the instruction is executed by the at least one processor to cause the electronic device to: input speech data to a speech wakeup model trained with general speech data, and output from the speech wakeup model a result for determining whether to execute speech wakeup, wherein the speech wakeup model includes a DNN and a CTC.
- a non-transitory computer-readable storage medium has stored therein instructions that, when executed by a processor of an electronic device, cause the electronic device to perform a speech wakeup method, the method comprising: inputting speech data to a speech wakeup model trained with general speech data, and outputting, by the speech wakeup model, a result for determining whether to execute speech wakeup, wherein the speech wakeup model includes a DNN and a CTC.
- the speech wakeup model may be trained with accessible general speech data, and then the trained speech wakeup model can be used for speech wakeup, which is conducive to improving the accuracy of speech wakeup.
- FIG. 1 is a schematic diagram of an overall architecture for speech wakeup, according to an embodiment.
- FIG. 2 is a flowchart of a speech wakeup method according to an embodiment.
- FIG. 3 is a schematic diagram of a speech wakeup model according to an embodiment.
- FIG. 4 is a schematic diagram of feature extraction performed by a feature extracting module according to an embodiment.
- FIG. 5 is a schematic diagram of a Deep Neural Network (DNN) according to an embodiment.
- FIG. 6 is a schematic diagram of a Connectionist Temporal Classifier (CTC) according to an embodiment.
- FIG. 7 is a schematic diagram of a speech wakeup apparatus according to an embodiment.
- FIG. 8 is a schematic diagram of an electronic device according to an embodiment.
- a speech wakeup method, apparatus, and electronic device are provided in embodiments of this specification.
- a speech wakeup model including a Deep Neural Network (DNN) and a Connectionist Temporal Classifier (CTC) is trained with general speech data.
- the trained speech wakeup model can be used for speech wakeup and support user-defined keywords triggered by speech wakeup.
- the speech wakeup model can be used in low-power devices such as mobile phones and home appliances, because the DNN included in the speech wakeup model can be relatively simple; for example, it may have only three or four layers with two or three hundred nodes in each layer.
- the speech wakeup model can be referred to as CTC-KWS.
- the DNN is a multi-layer perceptron, which has one or more hidden layers between an input layer and an output layer and can simulate complex nonlinear relationships.
- the CTC is a classifier configured to perform a label tagging task, and does not require forced alignment between input and output.
- FIG. 1 is a schematic diagram of an overall architecture 100 for speech wakeup, according to an embodiment.
- the overall architecture 100 includes speech data 102 as a first part and a speech wakeup model 104 as a second part.
- the speech wakeup model 104 includes a DNN 106 and a CTC 108 .
- Speech wakeup can be implemented by inputting the speech data 102 to the speech wakeup model 104 for processing.
- the embodiments described in detail in the following are based on the overall architecture 100 .
- FIG. 2 is a flowchart of a speech wakeup method 200 according to an embodiment.
- the method 200 may be performed by a server or a terminal, for example, by a model training program, a speech recognition program, a speech wakeup application, and so on, running on the server or the terminal.
- the server or terminal may be a mobile phone, a tablet computer, a smart wearable device, an automobile machine, a personal computer, a medium-sized computer, a computer cluster, and so on.
- the method 200 may include the following steps.
- speech data is input to a speech wakeup model trained with general speech data.
- speech can be monitored by the server or the terminal to obtain the speech data.
- a user can speak out a predetermined keyword to trigger the speech wakeup model to execute speech wakeup.
- the speech wakeup model outputs a result for determining whether to execute speech wakeup, wherein the speech wakeup model includes a DNN and a CTC.
- the general speech data described in step S202 is less restricted and, thus, easily accessible.
- for example, it may be a Large Vocabulary Continuous Speech Recognition (LVCSR) corpus or the like.
- the DNN included in the speech wakeup model may predict a posterior probability distribution of a pronunciation phoneme sequence corresponding to input speech features.
- the DNN can be followed by the CTC to give a confidence score corresponding to the predicted pronunciation phoneme sequence.
- a result for determining whether to execute speech wakeup can be output based on the confidence score.
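- The flow described above (features, then DNN posteriors, then a CTC confidence score, then a decision) can be sketched as follows. This is a minimal sketch only: the function name `run_wakeup`, the callables passed in, and the use of a fixed `threshold` are illustrative assumptions, not an interface defined in this specification.

```python
from typing import Callable, List, Sequence, Tuple

Frames = List[List[float]]


def run_wakeup(
    speech: Sequence[float],
    extract_features: Callable[[Sequence[float]], Frames],
    dnn_posteriors: Callable[[Frames], Frames],
    ctc_confidence: Callable[[Frames, Sequence[int]], float],
    keyword_labels: Sequence[int],
    threshold: float,
) -> Tuple[bool, float]:
    """Chain the three stages and compare the score with a threshold.

    All four callables and the threshold are hypothetical placeholders
    supplied by the caller; only the order of the stages follows the text.
    """
    features = extract_features(speech)        # acoustic features
    posteriors = dnn_posteriors(features)      # per-frame class probabilities
    score = ctc_confidence(posteriors, keyword_labels)
    return score >= threshold, score
```

- Passing the stages in as callables keeps the sketch independent of any particular feature extractor or network implementation.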
- the speech wakeup model may be trained with the accessible general speech data, and further the trained speech wakeup model can be used for speech wakeup, which is conducive to improving the accuracy of speech wakeup.
- the speech wakeup model also breaks through the restrictions of the keyword-specific speech data and supports user-defined triggered keywords. Therefore, it is more convenient and flexible in actual applications and conducive to improving user experience.
- FIG. 3 is a schematic diagram of a speech wakeup model 300 , according to an embodiment.
- the speech wakeup model 300 may include a feature extracting module 302 , a DNN 304 , and a CTC 306 in sequence.
- in step S204 of the method 200 (FIG. 2), the outputting, by the speech wakeup model, of a result for determining whether to execute speech wakeup may include: the feature extracting module 302 extracting acoustic features from the input speech data; inputting the acoustic features to the DNN 304 for processing to obtain a class probability of the acoustic features respectively corresponding to each pronunciation phoneme; the DNN 304 inputting the class probability to the CTC 306 for processing to obtain a confidence score of a speech wakeup term corresponding to a pronunciation phoneme sequence; and the CTC 306 determining whether to execute wakeup according to the confidence score, and outputting a determination result.
- the speech wakeup model 300 is described in detail below in further combination with FIG. 4 , FIG. 5 , and FIG. 6 .
- FIG. 4 is a schematic diagram of feature extraction performed by a feature extracting module, such as the feature extracting module 302 ( FIG. 3 ), according to an embodiment.
- a target label sequence corresponding thereto is a pronunciation phoneme sequence, which can be expressed as: “zhi1ma2kai1men2”, wherein the numbers represent tones.
- tone phonemes are also taken into account as a modeling unit.
- context-independent or context-dependent phonemes can all be taken into account, among which the latter is more numerous. However, in consideration of reducing subsequent computational burden of the DNN, only the context-independent phonemes may be considered, such as 72 context-independent phoneme units in Chinese, including a blank unit.
- acoustic features can be extracted by the feature extracting module from the input speech data, which may include: extracting acoustic feature frames of the input speech data from a window according to a specified time interval, wherein each of the acoustic feature frames may be multi-dimension log filter bank energies; stacking a plurality of adjacent acoustic feature frames respectively; taking the stacked acoustic feature frames respectively as acoustic features extracted from the general speech data; and further, the stacked acoustic feature frames can be used as inputs of the DNN respectively.
- the log filter bank energies refer to energy signals extracted by a log filter bank, which can be expressed as a vector to facilitate model processing.
- the multi-dimension in the foregoing represents multiple dimensions of the vector.
- a specified length of a time window may be 25 milliseconds, each time window may move for 10 milliseconds, and the multi-dimension may be, for example, 40 dimensions.
- milliseconds from 0 to 25 may be used as a window, and 40-dimension log filter bank energies are correspondingly extracted from the speech data to serve as a first acoustic feature frame; milliseconds from 10 to 35 can be used as a window, and 40-dimension log filter bank energies are correspondingly extracted from the speech data to serve as a second acoustic feature frame; and multiple acoustic feature frames can be extracted in the same way.
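- The windowing in this example can be sketched as below. The function name `frame_windows` is hypothetical, and dropping a trailing window that would run past the end of the audio is an assumption; the text does not say how the final partial window is handled.

```python
from typing import List, Tuple


def frame_windows(
    duration_ms: int, window_ms: int = 25, shift_ms: int = 10
) -> List[Tuple[int, int]]:
    """Return (start, end) times in milliseconds of each analysis window.

    Defaults follow the example in the text: 25 ms windows moved by 10 ms.
    Assumption: a window that does not fit entirely is discarded.
    """
    windows = []
    start = 0
    while start + window_ms <= duration_ms:
        windows.append((start, start + window_ms))
        start += shift_ms
    return windows
```

- For 45 ms of audio this yields the windows 0 to 25, 10 to 35, and 20 to 45; one 40-dimension feature frame would then be computed per window.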
- stacking a plurality of adjacent acoustic feature frames may allow more information from a context of a current frame, which is conducive to improving the accuracy of subsequent prediction results.
- the current frame, the adjacent consecutive ten frames before the current frame, and the adjacent consecutive five frames after the current frame can be, for example, stacked to obtain a 640-dimension stacking feature for being inputted to the subsequent DNN.
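- The stacking arithmetic (10 + 1 + 5 = 16 frames of 40 dimensions each, giving 640) can be sketched as follows. The name `stack_frames` is hypothetical, and repeating the first or last frame where context is missing at the utterance edges is an assumption not stated in the text.

```python
from typing import List


def stack_frames(
    frames: List[List[float]], left: int = 10, right: int = 5
) -> List[List[float]]:
    """Concatenate each frame with its left and right context frames."""
    if not frames:
        return []
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        vec: List[float] = []
        for k in range(t - left, t + right + 1):
            # Assumption: clamp the index so edge frames are repeated.
            idx = min(max(k, 0), last)
            vec.extend(frames[idx])
        stacked.append(vec)
    return stacked
```

- With 40-dimension input frames, every stacked vector has 16 x 40 = 640 entries, matching the DNN input size given above.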
- cepstral mean and variance normalization can be carried out on the dimensions of the stacking feature, and the normalized feature can then be passed on as input to the subsequent DNN.
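- A minimal sketch of per-dimension mean and variance normalization is given below. It assumes the statistics are computed over the frames of a single utterance; the text does not say over what span they are estimated, and the name `mean_variance_normalize` and the `eps` guard are illustrative.

```python
import math
from typing import List


def mean_variance_normalize(
    features: List[List[float]], eps: float = 1e-8
) -> List[List[float]]:
    """Shift each dimension to zero mean and scale it to unit variance."""
    if not features:
        return []
    n = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n)
        for d in range(dim)
    ]
    # eps avoids division by zero for a constant dimension (assumption).
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in features
    ]
```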
- FIG. 5 is a schematic diagram of a DNN, such as the DNN 304 ( FIG. 3 ), according to an embodiment.
- the DNN may include an input layer 502 , one or more hidden layers 504 , and an output layer 506 .
- as shown in FIG. 5, the various neurons in the DNN, represented by circles, are fully connected.
- the acoustic features extracted by the feature extracting module 302 ( FIG. 3 ) are input to the DNN.
- the DNN can describe a relationship between an input acoustic feature $x_0 \in \mathbb{R}^{n_0}$ in the input layer 502 and a modeling unit $j$ in the output layer 506 according to the following function mapping:
- $x_i \in \mathbb{R}^{n_i}$ ($i > 0$) is an output of a hidden layer
- $W_i \in \mathbb{R}^{n_i \times n_{i-1}}$ and $B_i \in \mathbb{R}^{n_i}$ are weights and offset parameters respectively, which may be predetermined based on training or an application need
- $n_i$ is the number of nodes on the $i$-th layer
- $\theta = \{W_i, B_i\}$
- $T$ denotes transpose of a matrix
- $N$ is the number of the hidden layers
- the formula III is a softmax function in the embodiment, representing an estimated posterior probability of a label unit $j$.
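- The mapping itself is not reproduced in this text. A form consistent with the definitions above, assuming a standard fully connected feedforward network, would be the following. The symbol $z_i$ for the pre-activation, the symbol $\sigma$ for the hidden-layer activation function, and the numbering of the first two formulas are assumptions; only formula III being a softmax is stated in the text.

```latex
\begin{align}
z_i &= x_{i-1} W_i^{T} + B_i, & 1 \le i \le N+1 \tag{I} \\
x_i &= \sigma(z_i),           & 1 \le i \le N   \tag{II} \\
y_j &= \frac{\exp\left(z_{N+1,j}\right)}{\sum_{k} \exp\left(z_{N+1,k}\right)} \tag{III}
\end{align}
```

- Here $x_{i-1}$ is treated as a row vector, so that $x_{i-1} W_i^{T}$ has $n_i$ entries, in agreement with $W_i \in \mathbb{R}^{n_i \times n_{i-1}}$ and $B_i \in \mathbb{R}^{n_i}$.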
- a Recurrent Neural Network can also be used in conjunction with the CTC.
- the minimum computing and power consumption requirements of mobile devices can be more easily met by using the DNN in conjunction with the CTC.
- a DNN with a few hundred nodes in each hidden layer is more suitable.
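- To give a feel for how small such a network is, the sketch below implements a fully connected forward pass with a softmax output and counts parameters. The ReLU activation, the helper names, and the layer sizes used in the usage note (three hidden layers of 256 nodes) are illustrative assumptions; the 640-dimension input and the 72 output units come from the examples earlier in the text.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights n_out x n_in, bias)


def affine(x: Sequence[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Compute W x + b for one input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v: Sequence[float]) -> List[float]:
    # Assumption: ReLU is used here only as an example activation.
    return [max(0.0, a) for a in v]


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(a - m) for a in v]
    total = sum(e)
    return [a / total for a in e]


def dnn_forward(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Hidden layers use ReLU; the last layer is followed by softmax."""
    h = list(x)
    for W, b in layers[:-1]:
        h = relu(affine(h, W, b))
    W, b = layers[-1]
    return softmax(affine(h, W, b))


def parameter_count(sizes: Sequence[int]) -> int:
    """Number of weights plus biases for the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
```

- For example, `parameter_count([640, 256, 256, 256, 72])` gives 314184 parameters, which suggests why a network of this size is plausible on a low-power device.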
- FIG. 6 is a schematic diagram of a CTC, such as the CTC 306 ( FIG. 3 ), according to an embodiment.
- the CTC is configured for sequential labeling tasks. Unlike the cross-entropy criterion, which requires frame-level alignment between input features and target labels, the CTC aims to automatically learn the alignment between speech data and label sequences (e.g., phonemes or characters), thereby eliminating the need for forced alignment of data; the input length need not equal the label length.
- a specified modeling unit is extracted from L, and the CTC is located on a softmax layer of the DNN.
- the DNN is composed of an
- $y_j^t$ ($j \in [0,$
- An input sequence $x^T$ of frame length $T$ and a target label $l^{\le T}$ are given, where $l_i \in L$.
- a CTC path $\pi = (\pi_0, \ldots, \pi_{T-1})$ is a frame-level label sequence, which differs from $l$ in that the CTC path allows the appearance of repeated non-blank labels and blank units.
- a many-to-one mapping function is defined as $\tau$, and “-” represents blank. If $x^T$ is given, and the output probabilities at each time step are assumed to be conditionally independent, the probability of the path $\pi$ is:
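- The expression that this sentence introduces is not reproduced in this text. Under the stated independence assumption, the usual CTC form would be the product of the per-frame output probabilities along the path; it is offered here as a reconstruction rather than a quotation, with $y^{t}_{\pi_t}$ denoting the DNN output for unit $\pi_t$ at time step $t$:

```latex
\begin{equation}
P\left(\pi \mid x^{T}; \theta\right) = \prod_{t=0}^{T-1} y^{t}_{\pi_t}
\end{equation}
```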
- the probability of the label sequence $l$ can be calculated based on $\tau$ by summing the probabilities of all the paths mapped to $l$. In some embodiments, summing over all the CTC paths directly may be computationally expensive. To address this, a forward-backward dynamic programming algorithm can be adopted, in which all possible CTC paths are represented compactly as a grid, as shown in FIG. 6.
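- The sketch below shows the many-to-one collapse and the forward half of the dynamic program, which sums over all paths that collapse to a given label sequence without enumerating them. It is a textbook-style CTC forward recursion written under assumptions: the blank unit has index 0, probabilities are kept in the linear domain (a practical implementation would likely work in the log domain), and the names `collapse` and `ctc_label_probability` are hypothetical. How the resulting probability is turned into the confidence score is not specified here.

```python
from typing import List, Sequence


def collapse(path: Sequence[int], blank: int = 0) -> List[int]:
    """Merge repeated labels, then drop blanks (the many-to-one mapping)."""
    out: List[int] = []
    prev = None
    for p in path:
        if p != prev and p != blank:
            out.append(p)
        prev = p
    return out


def ctc_label_probability(
    posteriors: Sequence[Sequence[float]], labels: Sequence[int], blank: int = 0
) -> float:
    """Sum the probabilities of all frame-level paths that collapse to labels.

    posteriors[t][j] is the network output for unit j at time step t.
    """
    T = len(posteriors)
    if T == 0:
        return 1.0 if not labels else 0.0
    # Extended sequence: blank, l0, blank, l1, ..., blank
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = posteriors[0][blank]
    if S > 1:
        alpha[1] = posteriors[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * posteriors[t][ext[s]]
        alpha = new
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)
```

- As a small check, with two frames, posteriors `[[0.4, 0.6], [0.5, 0.5]]` and target `[1]`, the three paths (1, 1), (1, blank) and (blank, 1) contribute 0.3 + 0.3 + 0.2, and the function returns 0.8.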
- a detection engine can make a positive decision accordingly, and it can be considered that corresponding keywords have been detected.
- the set threshold can be fine-tuned based on a verification data set.
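- One possible way to tune the threshold on a verification data set is sketched below. The selection criterion (the lowest threshold that keeps false alarms within a budget, which in turn keeps as many true detections as possible) and the name `pick_threshold` are assumptions; the text only states that the threshold can be fine-tuned on such a set.

```python
from typing import Optional, Sequence


def pick_threshold(
    scores: Sequence[float],
    is_keyword: Sequence[bool],
    max_false_alarms: int = 0,
) -> Optional[float]:
    """Return the lowest candidate threshold meeting the false-alarm budget.

    scores[i] is the confidence for utterance i, and is_keyword[i] says
    whether it really contains the keyword. Returns None if no candidate
    threshold satisfies the budget.
    """
    for th in sorted(set(scores)):
        false_alarms = sum(
            1 for s, y in zip(scores, is_keyword) if s >= th and not y
        )
        if false_alarms <= max_false_alarms:
            return th
    return None
```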
- the model can be trained by a gradient descent method, such as an asynchronous stochastic gradient descent method, to iteratively optimize parameters in the speech wakeup model until the training converges.
- the DNN and the CTC can be trained on a server having a Graphics Processing Unit (GPU).
- Network parameters are randomly initialized to be uniformly distributed within a range of (−0.02, 0.02), an initial learning rate is 0.008, and a momentum is 0.9.
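- These settings can be illustrated with the sketch below. The classical momentum update used here is an assumption, since the text gives the momentum value but not the exact update rule, and the function names and the fixed random seed are illustrative.

```python
import random
from typing import List, Optional, Sequence, Tuple


def init_uniform(
    n_out: int, n_in: int, limit: float = 0.02,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Draw an n_out x n_in weight matrix uniformly between -limit and limit."""
    rng = rng or random.Random(0)  # fixed seed only to make the sketch repeatable
    return [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]


def momentum_step(
    params: Sequence[float],
    grads: Sequence[float],
    velocity: Sequence[float],
    lr: float = 0.008,
    momentum: float = 0.9,
) -> Tuple[List[float], List[float]]:
    """One update: v <- momentum * v - lr * g, then p <- p + v (assumed form)."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity
```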
- the learning rate is a parameter used in the gradient descent method.
- a solution may be initialized first, and on the basis of this solution, a moving direction and a moving step size are determined, so that after the initial solution moves according to the direction and the step size, the output of a target function can be reduced. Then it is updated to a new solution, a next moving direction and a next step size are searched continuously, and after this process is performed iteratively, the target function is constantly decreased, to finally find a solution, such that the target function is relatively small.
- the learning rate is used for the adjustment of the original step size. In the gradient descent method, the step size in each adjustment is equal to the learning rate multiplied by a gradient.
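- The rule that each step equals the learning rate multiplied by the gradient can be seen in a one-variable sketch. The target function used in the usage note, the starting point, and the number of steps are illustrative assumptions.

```python
from typing import Callable


def gradient_descent(
    grad: Callable[[float], float],
    x0: float,
    learning_rate: float,
    steps: int,
) -> float:
    """Repeatedly move against the gradient by learning_rate * gradient."""
    x = x0
    for _ in range(steps):
        step = learning_rate * grad(x)  # step size = learning rate x gradient
        x = x - step
    return x
```

- For the target function f(x) = (x - 3)^2, whose gradient is 2(x - 3), calling `gradient_descent(lambda x: 2 * (x - 3), 0.0, 0.1, 100)` moves the solution from 0 towards the minimum at 3, with the error shrinking by a factor of 0.8 per step.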
- a verification data set may also be used to cross-verify the speech wakeup model to determine whether the training converges.
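- One simple way to decide convergence from verification-set results is sketched below. The specific rule (no improvement of at least `min_delta` in the verification loss over the last `patience` evaluations) is an assumption, as the text does not define the convergence test, and the name `has_converged` is hypothetical.

```python
from typing import Sequence


def has_converged(
    val_losses: Sequence[float], patience: int = 3, min_delta: float = 1e-4
) -> bool:
    """True when the recent losses have not beaten the earlier best."""
    if len(val_losses) <= patience:
        return False  # not enough history to judge
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta
```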
- One measure is adaptive training. For example, a general model can be fine-tuned with speech data of some specific keywords and at a relatively low learning rate. Based on this consideration, when the speech wakeup model is trained, keyword-specific speech data can also be acquired, and the speech wakeup model can be trained with the keyword-specific speech data. A learning rate used in the training is less than that used in the training of the speech wakeup model with the general speech data.
- the network parameters may not be randomly initialized, but refer to an existing corresponding network which has the same topology structure as the target network except for fine-grained units in the output layer, and may use a cross entropy criterion.
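- Initializing from an existing network can be sketched as copying every layer except the output layer and attaching a new output layer sized for the target units. The layer representation (a list of weight and bias pairs) and the name `init_from_existing` are assumptions made for illustration.

```python
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, bias)


def init_from_existing(
    source_layers: List[Layer], new_output_layer: Layer
) -> List[Layer]:
    """Reuse all but the last layer of an existing network.

    The copied layers are deep-copied so that later training of the new
    network does not modify the source network.
    """
    copied = [([row[:] for row in W], b[:]) for W, b in source_layers[:-1]]
    return copied + [new_output_layer]
```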
- the transfer learning can be considered especially when the training data has a large scale.
- architecture-related vector instructions e.g., ARM's NEON
- a target label sequence corresponding to such user-defined keywords can be determined through a dictionary.
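- A dictionary lookup of this kind can be sketched as follows. The lexicon contents, the way each word is split into phoneme units, and the name `keyword_to_labels` are hypothetical; a real pronunciation dictionary would define its own units.

```python
from typing import Dict, List, Sequence

# Hypothetical pronunciation dictionary: word -> phoneme units with tones.
LEXICON: Dict[str, List[str]] = {
    "zhima": ["zh", "i1", "m", "a2"],
    "kaimen": ["k", "ai1", "m", "en2"],
}


def keyword_to_labels(
    words: Sequence[str], lexicon: Dict[str, List[str]]
) -> List[str]:
    """Concatenate the pronunciations of the words of a user-defined keyword."""
    labels: List[str] = []
    for word in words:
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        labels.extend(lexicon[word])
    return labels
```

- The resulting unit sequence is what the CTC would score against the DNN outputs, so supporting a new keyword only requires that its words appear in the dictionary.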
- a speech wakeup method provided in the embodiment of this specification is as described in the foregoing.
- a corresponding apparatus is further provided in an embodiment of this specification, as shown in FIG. 7 .
- FIG. 7 is a schematic diagram of a speech wakeup apparatus 700 , according to an embodiment.
- the apparatus 700 corresponds to the method 200 ( FIG. 2 ), and the dashed box in FIG. 7 represents an optional module.
- the apparatus 700 may include an input module 701 and a speech wakeup model 702 .
- Speech data is input by the input module 701 to the speech wakeup model 702 trained with general speech data, and the speech wakeup model 702 outputs a result for determining whether to execute speech wakeup, wherein the speech wakeup model includes a DNN and a CTC.
- the general speech data includes a LVCSR corpus.
- the apparatus further includes a training module 703 ; and training, by the training module 703 , the speech wakeup model with the general speech data includes: iteratively optimizing, by the training module 703 , parameters in the speech wakeup model with the general speech data by an asynchronous stochastic gradient descent method until the training converges.
- the training module 703 further acquires keyword-specific speech data; and trains the speech wakeup model with the keyword-specific speech data, wherein a learning rate used in the training is less than that used in the training of the speech wakeup model with the general speech data.
- the training module 703 cross-verifies the speech wakeup model with a verification data set in the training to determine whether the training converges.
- the outputting, by the speech wakeup model 702 , a result for determining whether to execute speech wakeup specifically includes: extracting, by the speech wakeup model 702 , acoustic features from the input speech data; inputting the acoustic features to the DNN included in the speech wakeup model 702 for processing to obtain a class probability of the acoustic features respectively corresponding to each pronunciation phoneme; inputting the class probability to the CTC included in the speech wakeup model 702 for processing to obtain a confidence score of a speech wakeup term corresponding to a pronunciation phoneme sequence; and determining whether to execute wakeup according to the confidence score, and outputting a determination result.
- the extracting, by the speech wakeup model 702 , acoustic features from the input speech data specifically includes: extracting, by the speech wakeup model 702 , acoustic feature frames of the input speech data from a window according to a specified time interval, wherein each of the acoustic feature frames is multi-dimension log filter bank energies; stacking a plurality of adjacent acoustic feature frames respectively; and taking the stacked acoustic feature frames respectively as acoustic features extracted from the monitored speech.
- a corresponding electronic device is further provided in an embodiment of this specification, as shown in FIG. 8.
- FIG. 8 is a schematic diagram of an electronic device 800 , according to an embodiment.
- the electronic device 800 includes at least one processor 802 ; and a memory 804 communicatively connected to the at least one processor 802 .
- the electronic device 800 may also include other hardware 806 , such as a network interface.
- the at least one processor 802 may include one or more dedicated processing units, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or various other types of processors or processing units.
- the at least one processor 802 is coupled with the memory 804 and is configured to execute instructions stored in the memory 804 to perform the above described methods.
- the memory 804 may include a non-permanent memory, a random access memory (RAM) and/or a non-volatile memory (such as a read-only memory (ROM) or a flash memory (flash RAM)), etc.
- the memory 804 stores an instruction executable by the at least one processor 802 , and the instruction is executed by the at least one processor 802 to cause the electronic device 800 to: input speech data to a speech wakeup model trained with general speech data, and output, by the speech wakeup model, a result for determining whether to execute speech wakeup, wherein the speech wakeup model includes a DNN and a CTC.
- a corresponding non-transitory computer storage medium with a computer executable instruction stored thereon is further provided in an embodiment of this specification.
- the computer executable instruction is configured to: input speech data to a speech wakeup model trained with general speech data, and output, by the speech wakeup model, a result for determining whether to execute speech wakeup, wherein the speech wakeup model includes a DNN and a CTC.
- the apparatus, the electronic device, and the computer storage medium provided in the embodiments of this specification correspond to the method. Therefore, the apparatus, the electronic device, and the non-volatile computer storage medium also have beneficial technical effects similar to those of the corresponding method. As the beneficial technical effects of the method have been described in detail in the foregoing, the beneficial technical effects of the apparatus, the electronic device, and the non-volatile computer storage medium will not be elaborated here.
- PLD Programmable Logic Device
- FPGA Field Programmable Gate Array
- the logic compiler software is similar to a software compiler used for developing and writing a program, and the source code before compiling also needs to be written in a specific programming language, which is referred to as a Hardware Description Language (HDL).
- HDLs include Advanced Boolean Expression Language (ABEL), Altera Hardware Description Language (AHDL), Cornell University Programming Language (CUPL), HDCal, Java Hardware Description Language (JHDL), Lava, Lola, MyHDL, PALASM, and Ruby Hardware Description Language (RHDL).
- a controller can be implemented in any suitable manner in the above described devices.
- the controller can take the form of a microprocessor or a processor and a computer-readable storage medium that stores computer-readable program codes (such as software or firmware) executable by the microprocessor or processor, a logic gate, a switch, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of the controller include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320.
- the controller of the memory can further be implemented as a part of control logic of the memory.
- the controller may be considered as a hardware component, and apparatuses included in the controller and configured to implement various functions may also be considered as structures inside the hardware component. Alternatively, the apparatuses configured to implement various functions may even be considered as both software modules configured to implement the method and structures inside the hardware component.
- the apparatuses, modules or models illustrated in the foregoing embodiments can be implemented by a computer chip or an entity, or implemented by a product having a specific function.
- a typical implementation device is a computer.
- the computer can be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
- the apparatus is divided into various units based on functions, and the units are described separately.
- the functions of various units can also be implemented in one or more pieces of software and/or hardware.
- the above described embodiments can be provided as a method, a system, or a computer program product. Therefore, the embodiments may be implemented in a form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of this specification can be in the form of a computer program product implemented on one or more computer usable storage media (including, but not limited to, a magnetic disk memory, a CD-ROM, an optical memory and the like) including computer usable program codes.
- the computer program instructions may also be stored in a computer-readable memory that can guide the computer or another programmable data processing device to work in a specific manner, such that the instruction stored in the computer-readable memory generates an article of manufacture including an instruction apparatus, and the instruction apparatus implements functions designated by one or more processes in a flowchart and/or one or more blocks in a block diagram.
- the computer program instructions may also be loaded onto the computer or another programmable data processing device, such that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device therefore provide steps for implementing functions designated in one or more processes in a flowchart and/or one or more blocks in a block diagram.
- the computer-readable storage medium includes non-volatile and volatile media as well as movable and non-movable media, and can implement information storage by means of any method or technology.
- the information can be a computer-readable instruction, a data structure, and a module of a program or other data.
- Examples of the computer-readable storage medium include, but are not limited to, a phase change memory (PRAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), other types of RAM, a ROM, an electrically erasable programmable read-only memory (EEPROM), a flash memory or other memory technologies, a compact disk read-only memory (CD-ROM), a digital versatile disc (DVD) or other optical storages, a cassette tape, a magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device.
- the computer-readable storage medium does not include transitory media, such as a modulated data signal and a carrier.
- the embodiments can be described in a general context of a computer executable instruction executed by a computer, for example, a program module.
- the program module includes a routine, a program, an object, an assembly, a data structure, and the like used for executing a specific task or implementing a specific abstract data type.
- the embodiments can also be implemented in distributed computing environments. In these distributed computing environments, a task is executed by using remote processing devices connected via a communications network. In the distributed computing environments, the program module may be located in local and remote computer storage media including a storage device.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Theoretical Computer Science (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Machine Translation (AREA)
- User Interface Of Digital Computer (AREA)
- Telephonic Communication Services (AREA)
- Electric Clocks (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/774,422 US10748524B2 (en) | 2017-06-29 | 2020-01-28 | Speech wakeup method, apparatus, and electronic device |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710514348.6A CN107358951A (zh) | 2017-06-29 | 2017-06-29 | Speech wakeup method, apparatus, and electronic device |
CN201710514348.6 | 2017-06-29 | ||
PCT/CN2018/092899 WO2019001428A1 (zh) | 2017-06-29 | 2018-06-26 | Speech wakeup method, apparatus, and electronic device |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/092899 Continuation WO2019001428A1 (zh) | 2017-06-29 | 2018-06-26 | Speech wakeup method, apparatus, and electronic device |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/774,422 Continuation US10748524B2 (en) | 2017-06-29 | 2020-01-28 | Speech wakeup method, apparatus, and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
US20200013390A1 true US20200013390A1 (en) | 2020-01-09 |
Family
ID=60274110
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/571,468 Abandoned US20200013390A1 (en) | 2017-06-29 | 2019-09-16 | Speech wakeup method, apparatus, and electronic device |
US16/774,422 Active US10748524B2 (en) | 2017-06-29 | 2020-01-28 | Speech wakeup method, apparatus, and electronic device |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/774,422 Active US10748524B2 (en) | 2017-06-29 | 2020-01-28 | Speech wakeup method, apparatus, and electronic device |
Country Status (11)
Country | Link |
---|---|
US (2) | US20200013390A1 (zh) |
EP (1) | EP3579227B1 (zh) |
JP (1) | JP6877558B2 (zh) |
KR (1) | KR102181836B1 (zh) |
CN (1) | CN107358951A (zh) |
ES (1) | ES2878137T3 (zh) |
PH (1) | PH12019501674A1 (zh) |
PL (1) | PL3579227T3 (zh) |
SG (1) | SG11201906576WA (zh) |
TW (1) | TWI692751B (zh) |
WO (1) | WO2019001428A1 (zh) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
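CN111883121A (zh) * | 2020-07-20 | 2020-11-03 | 北京声智科技有限公司 | Wakeup method, apparatus, and electronic device |
CN112233655A (zh) * | 2020-09-28 | 2021-01-15 | 上海声瀚信息科技有限公司 | Neural network training method for improving speech command word recognition performance |
CN112669818A (zh) * | 2020-12-08 | 2021-04-16 | 北京地平线机器人技术研发有限公司 | Speech wakeup method and apparatus, readable storage medium, and electronic device |
CN113113007A (zh) * | 2021-03-30 | 2021-07-13 | 北京金山云网络技术有限公司 | Speech data processing method and apparatus, electronic device, and storage medium |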
US11081102B2 (en) * | 2019-08-16 | 2021-08-03 | Ponddy Education Inc. | Systems and methods for comprehensive Chinese speech scoring and diagnosis |
US20220293088A1 (en) * | 2021-03-12 | 2022-09-15 | Samsung Electronics Co., Ltd. | Method of generating a trigger word detection model, and an apparatus for the same |
US11587550B2 (en) * | 2020-06-10 | 2023-02-21 | Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. | Method and apparatus for outputting information |
WO2023085699A1 (ko) * | 2021-11-10 | 2023-05-19 | 삼성전자주식회사 | Electronic device and control method thereof |
US11967322B2 (en) | 2021-05-06 | 2024-04-23 | Samsung Electronics Co., Ltd. | Server for identifying false wakeup and method for controlling the same |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
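CN107358951A (zh) * | 2017-06-29 | 2017-11-17 | 阿里巴巴集团控股有限公司 | Speech wakeup method, apparatus, and electronic device |
CN108320733B (zh) * | 2017-12-18 | 2022-01-04 | 上海科大讯飞信息科技有限公司 | Speech data processing method and apparatus, storage medium, and electronic device |
CN108182937B (zh) * | 2018-01-17 | 2021-04-13 | 出门问问创新科技有限公司 | Keyword recognition method, apparatus, device, and storage medium |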
US11488002B2 (en) * | 2018-02-15 | 2022-11-01 | Atlazo, Inc. | Binary neural network accelerator engine methods and systems |
CN108597523B (zh) * | 2018-03-23 | 2019-05-17 | 平安科技(深圳)有限公司 | Speaker authentication method, server, and computer-readable storage medium |
WO2019222996A1 (en) * | 2018-05-25 | 2019-11-28 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for voice recognition |
CN110619871B (zh) * | 2018-06-20 | 2023-06-30 | 阿里巴巴集团控股有限公司 | Speech wakeup detection method, apparatus, device, and storage medium |
US11257481B2 (en) * | 2018-10-24 | 2022-02-22 | Tencent America LLC | Multi-task training architecture and strategy for attention-based speech recognition system |
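CN111276138B (zh) * | 2018-12-05 | 2023-07-18 | 北京嘀嘀无限科技发展有限公司 | Method and apparatus for processing speech signals in a speech wakeup system |
CN109886386B (zh) * | 2019-01-30 | 2020-10-27 | 北京声智科技有限公司 | Method and apparatus for determining a wakeup model |
CN109872713A (zh) * | 2019-03-05 | 2019-06-11 | 深圳市友杰智新科技有限公司 | Speech wakeup method and apparatus |
CN110310628B (zh) | 2019-06-27 | 2022-05-20 | 百度在线网络技术(北京)有限公司 | Wakeup model optimization method, apparatus, device, and storage medium |
JP7098587B2 (ja) * | 2019-08-29 | 2022-07-11 | 株式会社東芝 | Information processing device, keyword detection device, information processing method, and program |
CN110634468B (zh) * | 2019-09-11 | 2022-04-15 | 中国联合网络通信集团有限公司 | Speech wakeup method, apparatus, device, and computer-readable storage medium |
CN110648668A (zh) * | 2019-09-24 | 2020-01-03 | 上海依图信息技术有限公司 | Keyword detection apparatus and method |
CN110648659B (zh) * | 2019-09-24 | 2022-07-01 | 上海依图信息技术有限公司 | Speech recognition and keyword detection apparatus and method based on a multi-task model |
CN110970016B (zh) * | 2019-10-28 | 2022-08-19 | 苏宁云计算有限公司 | Wakeup model generation method, and intelligent terminal wakeup method and apparatus |
CN110853629A (zh) * | 2019-11-21 | 2020-02-28 | 中科智云科技有限公司 | Method for speech recognition of digits based on deep learning |
CN110992929A (zh) * | 2019-11-26 | 2020-04-10 | 苏宁云计算有限公司 | Neural-network-based speech keyword detection method, apparatus, and system |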
US11341954B2 (en) * | 2019-12-17 | 2022-05-24 | Google Llc | Training keyword spotters |
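JP7438744B2 (ja) * | 2019-12-18 | 2024-02-27 | 株式会社東芝 | Information processing device, information processing method, and program |
CN112733272A (zh) * | 2021-01-13 | 2021-04-30 | 南昌航空大学 | Method for solving the vehicle routing problem with soft time windows |
KR102599480B1 (ko) * | 2021-05-18 | 2023-11-08 | 부산대학교 산학협력단 | Automatic learning system and method for keyword speech recognition |
CN113160823B (zh) * | 2021-05-26 | 2024-05-17 | 中国工商银行股份有限公司 | Speech wakeup method, apparatus, and electronic device based on a spiking neural network |
CN113990296B (zh) * | 2021-12-24 | 2022-05-27 | 深圳市友杰智新科技有限公司 | Training method and post-processing method for a speech acoustic model, and related device |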
US20240119925A1 (en) * | 2022-10-10 | 2024-04-11 | Samsung Electronics Co., Ltd. | System and method for post-asr false wake-up suppression |
CN115862604B (zh) * | 2022-11-24 | 2024-02-20 | 镁佳(北京)科技有限公司 | Speech wakeup model training and speech wakeup method, apparatus, and computer device |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05128286A (ja) * | 1991-11-05 | 1993-05-25 | Ricoh Co Ltd | Keyword spotting method using a neural network |
JP2007179239A (ja) * | 2005-12-27 | 2007-07-12 | Kenwood Corp | Schedule management device and program |
US9117449B2 (en) * | 2012-04-26 | 2015-08-25 | Nuance Communications, Inc. | Embedded system for construction of small footprint speech recognition with user-definable constraints |
US9177547B2 (en) * | 2013-06-25 | 2015-11-03 | The Johns Hopkins University | System and method for processing speech to identify keywords or other information |
CN104378723A (zh) * | 2013-08-16 | 2015-02-25 | 上海耐普微电子有限公司 | Microphone with speech wakeup function |
US9715660B2 (en) * | 2013-11-04 | 2017-07-25 | Google Inc. | Transfer learning for deep neural network based hotword detection |
US9443522B2 (en) * | 2013-11-18 | 2016-09-13 | Beijing Lenovo Software Ltd. | Voice recognition method, voice controlling method, information processing method, and electronic apparatus |
CN105096935B (zh) * | 2014-05-06 | 2019-08-09 | 阿里巴巴集团控股有限公司 | Speech input method, apparatus, and system |
US10783900B2 (en) * | 2014-10-03 | 2020-09-22 | Google Llc | Convolutional, long short-term memory, fully connected deep neural networks |
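JP6564058B2 (ja) * | 2015-04-10 | 2019-08-21 | 華為技術有限公司Huawei Technologies Co.,Ltd. | Speech recognition method, speech wakeup apparatus, speech recognition apparatus, and terminal |
CN106297774B (zh) * | 2015-05-29 | 2019-07-09 | 中国科学院声学研究所 | Distributed parallel training method and system for a neural network acoustic model |
TWI639153B (zh) * | 2015-11-03 | 2018-10-21 | 絡達科技股份有限公司 | Electronic device and method for waking it up through speech recognition |
JP6679898B2 (ja) * | 2015-11-24 | 2020-04-15 | 富士通株式会社 | Keyword detection device, keyword detection method, and computer program for keyword detection |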
US10755698B2 (en) * | 2015-12-07 | 2020-08-25 | University Of Florida Research Foundation, Inc. | Pulse-based automatic speech recognition |
CN106887227A (zh) * | 2015-12-16 | 2017-06-23 | 芋头科技(杭州)有限公司 | Speech wakeup method and system |
CN105632486B (zh) * | 2015-12-23 | 2019-12-17 | 北京奇虎科技有限公司 | Speech wakeup method and apparatus for intelligent hardware |
US10229672B1 (en) * | 2015-12-31 | 2019-03-12 | Google Llc | Training acoustic models using connectionist temporal classification |
CN105931633A (zh) * | 2016-05-30 | 2016-09-07 | 深圳市鼎盛智能科技有限公司 | Speech recognition method and system |
CN106098059B (zh) * | 2016-06-23 | 2019-06-18 | 上海交通大学 | Customizable speech wakeup method and system |
CN106611597B (zh) * | 2016-12-02 | 2019-11-08 | 百度在线网络技术(北京)有限公司 | Artificial-intelligence-based speech wakeup method and apparatus |
CN106782536B (zh) * | 2016-12-26 | 2020-02-28 | 北京云知声信息技术有限公司 | Speech wakeup method and apparatus |
CN107221326B (zh) * | 2017-05-16 | 2021-05-28 | 百度在线网络技术(北京)有限公司 | Artificial-intelligence-based speech wakeup method, apparatus, and computer device |
CN107358951A (zh) * | 2017-06-29 | 2017-11-17 | 阿里巴巴集团控股有限公司 | Speech wakeup method, apparatus, and electronic device |
2017
- 2017-06-29 CN CN201710514348.6A patent/CN107358951A/zh active Pending
2018
- 2018-03-14 TW TW107108572A patent/TWI692751B/zh active
- 2018-06-26 ES ES18823086T patent/ES2878137T3/es active Active
- 2018-06-26 WO PCT/CN2018/092899 patent/WO2019001428A1/zh unknown
- 2018-06-26 PL PL18823086T patent/PL3579227T3/pl unknown
- 2018-06-26 JP JP2019539235A patent/JP6877558B2/ja active Active
- 2018-06-26 KR KR1020197022130A patent/KR102181836B1/ko active IP Right Grant
- 2018-06-26 SG SG11201906576WA patent/SG11201906576WA/en unknown
- 2018-06-26 EP EP18823086.6A patent/EP3579227B1/en active Active
2019
- 2019-07-19 PH PH12019501674A patent/PH12019501674A1/en unknown
- 2019-09-16 US US16/571,468 patent/US20200013390A1/en not_active Abandoned
2020
- 2020-01-28 US US16/774,422 patent/US10748524B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN107358951A (zh) | 2017-11-17 |
PL3579227T3 (pl) | 2021-10-18 |
EP3579227A4 (en) | 2020-02-26 |
TWI692751B (zh) | 2020-05-01 |
JP6877558B2 (ja) | 2021-05-26 |
US20200168207A1 (en) | 2020-05-28 |
PH12019501674A1 (en) | 2020-06-01 |
JP2020517977A (ja) | 2020-06-18 |
EP3579227A1 (en) | 2019-12-11 |
TW201905897A (zh) | 2019-02-01 |
US10748524B2 (en) | 2020-08-18 |
WO2019001428A1 (zh) | 2019-01-03 |
KR102181836B1 (ko) | 2020-11-25 |
KR20190134594A (ko) | 2019-12-04 |
SG11201906576WA (en) | 2019-08-27 |
EP3579227B1 (en) | 2021-06-09 |
ES2878137T3 (es) | 2021-11-18 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10748524B2 (en) | Speech wakeup method, apparatus, and electronic device | |
Ravanelli et al. | Light gated recurrent units for speech recognition | |
US11423233B2 (en) | On-device projection neural networks for natural language understanding | |
EP3770905B1 (en) | Speech recognition method, apparatus and device, and storage medium | |
US20230410796A1 (en) | Encoder-decoder models for sequence to sequence mapping | |
US11501154B2 (en) | Sensor transformation attention network (STAN) model | |
US11158305B2 (en) | Online verification of custom wake word | |
Lee et al. | High-level feature representation using recurrent neural network for speech emotion recognition | |
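KR102323046B1 (ko) | Speech emotion detection method and apparatus, computer device, and storage medium | |
KR102167719B1 (ko) | Language model training method and apparatus, and speech recognition method and apparatus | |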
US9728183B2 (en) | System and method for combining frame and segment level processing, via temporal pooling, for phonetic classification | |
Dahl et al. | Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition | |
US10777188B2 (en) | Time-frequency convolutional neural network with bottleneck architecture for query-by-example processing | |
US9235799B2 (en) | Discriminative pretraining of deep neural networks | |
CN108885870A (zh) | System and method for implementing a voice user interface by combining a speech-to-text system with a speech-to-intent system | |
US11423884B2 (en) | Device with convolutional neural network for acquiring multiple intent words, and method thereof | |
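CN111833866A (zh) | Method and system for high-accuracy key phrase detection for low-resource devices | |
KR20220130565A (ko) | Keyword detection method and apparatus | |
JP7178394B2 (ja) | Method, apparatus, device, and medium for processing speech signals | |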
US20230031733A1 (en) | Method for training a speech recognition model and method for speech recognition | |
Zhang et al. | Wake-up-word spotting using end-to-end deep neural network system | |
KR20200120595A (ko) | Language model training method and apparatus, and speech recognition method and apparatus | |
US20230045790A1 (en) | Sensor transformation attention network (stan) model | |
Bovbjerg | Self-supervised Keyword Spotting | |
CN114627860A (zh) | Model training method, speech processing method, apparatus, device, and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: ALIBABA GROUP HOLDING LIMITED, CAYMAN ISLANDS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, ZHIMING;ZHOU, JUN;LI, XIAOLONG;SIGNING DATES FROM 20190806 TO 20190808;REEL/FRAME:050382/0704 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| AS | Assignment | Owner name: ADVANTAGEOUS NEW TECHNOLOGIES CO., LTD., CAYMAN ISLANDS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ALIBABA GROUP HOLDING LIMITED;REEL/FRAME:053713/0665 Effective date: 20200826 |
| AS | Assignment | Owner name: ADVANCED NEW TECHNOLOGIES CO., LTD., CAYMAN ISLANDS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ADVANTAGEOUS NEW TECHNOLOGIES CO., LTD.;REEL/FRAME:053761/0338 Effective date: 20200910 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |