CN110600008A - Voice wake-up optimization method and system - Google Patents

Voice wake-up optimization method and system Download PDF

Info

Publication number
CN110600008A
CN110600008A
Authority
CN
China
Prior art keywords
acoustic model
awakening
voice
phoneme
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910899791.9A
Other languages
Chinese (zh)
Inventor
徐俊峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AI Speech Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd filed Critical AI Speech Ltd
Priority to CN201910899791.9A priority Critical patent/CN110600008A/en
Publication of CN110600008A publication Critical patent/CN110600008A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L15/08 Speech classification or search
    • G10L15/26 Speech to text systems
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units
    • G10L2015/0631 Creating reference templates; Clustering
    • G10L2015/088 Word spotting

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a voice wake-up optimization method. The method comprises the following steps: constructing a two-stage wake-up acoustic model, wherein the two-stage wake-up acoustic model comprises a phoneme acoustic model and a word-level acoustic model; performing feature extraction on the received voice audio, inputting the extracted acoustic features into the phoneme acoustic model of the two-stage wake-up acoustic model, and extracting the output features of the phoneme acoustic model; determining a wake-up word confidence by using the output features of the phoneme acoustic model as the input of the word-level acoustic model of the two-stage wake-up acoustic model; and when the confidence exceeds a preset wake-up threshold, determining that the voice audio is a wake-up word and performing voice wake-up. An embodiment of the invention also provides a voice wake-up optimization system. Embodiments of the invention directly reduce the dependence of the final classification result on the accuracy of the phoneme modeling unit, so that the wake-up word can still be judged correctly even when phoneme classification is inaccurate.

Description

Voice wake-up optimization method and system
Technical Field
The invention relates to the field of intelligent voice interaction, and in particular to a voice wake-up optimization method and system.
Background
Voice wake-up typically uses deep neural networks to model the underlying acoustic units acoustically; the modeling unit is usually the phoneme.
In the above voice wake-up technology, the modeling unit is the phoneme: the phonemes are first predicted, classified and post-processed; the similarity between the processed sequence and the wake-up word sequence is then computed, and the device wakes up if the similarity exceeds a certain threshold, and otherwise does not wake up.
In the process of implementing the invention, the inventor found at least the following problem in the related art:
this technique relies heavily on how accurately the acoustic model classifies the speech signal on the modeling units. At a low signal-to-noise ratio, the acoustic model classifies phonemes poorly, which lowers the wake-up rate in low signal-to-noise-ratio scenarios.
Disclosure of Invention
Embodiments of the present invention aim to at least solve the problem in the prior art of a low wake-up rate in low signal-to-noise-ratio scenarios.
In a first aspect, an embodiment of the present invention provides a voice wake-up optimization method, comprising:
constructing a two-stage wake-up acoustic model, wherein the two-stage wake-up acoustic model comprises a phoneme acoustic model and a word-level acoustic model;
performing feature extraction on the received voice audio, inputting the extracted acoustic features into the phoneme acoustic model of the two-stage wake-up acoustic model, and extracting the output features of the phoneme acoustic model;
determining a wake-up word confidence by using the output features of the phoneme acoustic model as the input of the word-level acoustic model of the two-stage wake-up acoustic model;
and when the confidence exceeds a preset wake-up threshold, determining that the voice audio is a wake-up word and performing voice wake-up.
In a second aspect, an embodiment of the present invention provides a voice wake-up optimization system, comprising:
a model construction program module, configured to construct a two-stage wake-up acoustic model, wherein the two-stage wake-up acoustic model comprises a phoneme acoustic model and a word-level acoustic model;
a feature extraction program module, configured to perform feature extraction on the received voice audio, input the extracted acoustic features into the phoneme acoustic model of the two-stage wake-up acoustic model, and extract the output features of the phoneme acoustic model;
a confidence determination program module, configured to determine a wake-up word confidence by using the output features of the phoneme acoustic model as the input of the word-level acoustic model of the two-stage wake-up acoustic model;
and a wake-up program module, configured to determine the voice audio as a wake-up word and perform voice wake-up when the confidence exceeds a preset wake-up threshold.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the voice wake-up optimization method of any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the voice wake-up optimization method of any embodiment of the present invention.
The embodiments of the invention have the following beneficial effect: on top of one acoustic model, the deep acoustic features extracted from a speech signal of a certain length are input into another classification model for direct classification. This directly reduces the dependence of the final classification result on the accuracy of the phoneme modeling unit, and the wake-up word can still be recognized correctly even when phoneme classification is inaccurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a voice wake-up optimization method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a voice wake-up optimization system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a voice wake-up optimization method according to an embodiment of the present invention, which comprises the following steps:
S11: constructing a two-stage wake-up acoustic model, wherein the two-stage wake-up acoustic model comprises a phoneme acoustic model and a word-level acoustic model;
S12: performing feature extraction on the received voice audio, inputting the extracted acoustic features into the phoneme acoustic model of the two-stage wake-up acoustic model, and extracting the output features of the phoneme acoustic model;
S13: determining a wake-up word confidence by using the output features of the phoneme acoustic model as the input of the word-level acoustic model of the two-stage wake-up acoustic model;
S14: when the confidence exceeds a preset wake-up threshold, determining that the voice audio is a wake-up word and performing voice wake-up.
In the present embodiment, modeling is not done with a single acoustic model, nor are the output results of two ordinary acoustic models compared against each other, because at a low signal-to-noise ratio simply selecting several acoustic models does not noticeably improve phoneme classification accuracy.
For step S11, rather than a single modeled phoneme acoustic model, a two-stage wake-up acoustic model is constructed on that basis; it comprises a phoneme acoustic model and a word-level acoustic model. The task of an acoustic model is to compute P(O|W), i.e. the probability that the model generates the observed speech waveform. The acoustic model is an important component of a speech recognition system; it accounts for most of the computational overhead of speech recognition and determines the performance of the system. Conventional speech recognition systems commonly employ acoustic models based on GMM-HMM, where the GMM models the distribution of the acoustic features of speech and the HMM models the temporal structure of the speech signal. After the rise of deep learning in 2006, deep neural networks (DNNs) were applied to speech acoustic models. The phoneme acoustic model gives the probability of each phoneme given the speech waveform, and the word-level acoustic model gives the probability of each word given the speech waveform.
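The per-frame probabilities mentioned above can be illustrated with a minimal sketch: a DNN acoustic model typically ends in a softmax layer that turns each frame's raw scores (logits) into posterior probabilities over the unit set (phonemes for the first stage, words for the second). The logits below are illustrative values, not taken from the patent.

```python
import math

def softmax(logits):
    """Convert raw per-frame scores into a probability distribution
    over the modeling units (e.g. phonemes)."""
    m = max(logits)                               # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# One frame's logits over a tiny illustrative 3-phoneme set.
probs = softmax([2.0, 1.0, 0.1])
print(probs)  # probabilities summing to 1, most mass on the first phoneme
```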
For step S12, to support real-time voice wake-up, the smart device collects the voice audio in the environment in real time, performs feature extraction on the collected voice audio, inputs the extracted acoustic features into the phoneme acoustic model of the two-stage wake-up acoustic model, and extracts the output features of the phoneme acoustic model, such as the phoneme sequence of the voice audio.
For step S13, the output features of the phoneme acoustic model (for example, the sequence output in step S12) are used as the input of the word-level acoustic model of the two-stage wake-up acoustic model. Classifying with a second acoustic model yields an explicit classification, so the confidence that the user audio is a wake-up word is determined more accurately.
For step S14, when the confidence exceeds the preset wake-up threshold, the voice audio is determined to be a wake-up word and voice wake-up is performed.
It can be seen from this embodiment that, on top of one acoustic model, the deep acoustic features extracted from a speech signal of a certain length are input into another classification model for direct classification. This directly reduces the dependence of the final classification result on the accuracy of the phoneme modeling unit, and the wake-up word can still be recognized correctly even when phoneme classification is inaccurate.
As one implementation, in this embodiment one of the models in the two-stage wake-up acoustic model is a phoneme acoustic model and the other is a word-level acoustic model.
Repeated experiments show that when wake-up word recognition is performed with the phoneme acoustic model alone, wake-up performance is poor at a low signal-to-noise ratio, and recognition performance depends heavily on how accurately the phoneme acoustic model classifies phonemes. By attaching a word-level acoustic model on top of the phoneme acoustic model to classify the wake-up word directly, the wake-up word recognition result can be improved through direct classification even when phoneme classification is inaccurate, overcoming the shortcoming of a single phoneme acoustic model.
As one implementation, in this embodiment, after the output features of the phoneme acoustic model are extracted, the method further comprises:
sending the output features of each frame to a feature accumulator;
when the number of accumulated voice audio frames in the feature accumulator reaches a preset threshold, splicing the output features in the feature accumulator into a one-dimensional feature;
and inputting the one-dimensional feature into the word-level acoustic model to complete the coupling of the two models.
In the present embodiment, after the output features of the phoneme acoustic model are extracted, the output features of each frame are sent to a feature accumulator. When a preset number of frames has accumulated, the features are spliced into one complete one-dimensional feature, and this one-dimensional feature is input into the word-level acoustic model, so that the two acoustic models are coupled and both models can be used.
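The accumulate-and-splice coupling described above can be sketched as follows; the class name, frame threshold and feature shapes are illustrative assumptions, not taken from the patent.

```python
class FeatureAccumulator:
    """Sketch of the frame accumulator: collect per-frame output features
    from the phoneme acoustic model and, once a preset frame count is
    reached, splice them into a single one-dimensional feature."""

    def __init__(self, frame_threshold):
        self.frame_threshold = frame_threshold
        self.frames = []

    def push(self, frame_features):
        """Add one frame; return the spliced 1-D feature when the
        threshold is reached, otherwise None."""
        self.frames.append(list(frame_features))
        if len(self.frames) < self.frame_threshold:
            return None
        # Concatenate all buffered frames into one flat feature vector,
        # then reset the buffer for the next window.
        spliced = [x for frame in self.frames for x in frame]
        self.frames.clear()
        return spliced

acc = FeatureAccumulator(frame_threshold=3)
print(acc.push([1, 2]))   # None: not enough frames yet
print(acc.push([3, 4]))   # None
print(acc.push([5, 6]))   # [1, 2, 3, 4, 5, 6]
```

The spliced vector is what would be fed to the word-level acoustic model, which is how the two stages are coupled without the second stage ever touching raw audio.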
As one implementation, in this embodiment, before feature extraction is performed on the received voice audio, the method further comprises:
receiving an audio signal in real time via an acoustic sensor, and determining through a voice endpoint detection model whether the audio signal is voice audio;
and when the audio signal is voice audio, performing acoustic feature extraction on the received speech.
Since voice wake-up requires real-time monitoring of received audio, running wake-up detection on every received signal is very resource-consuming. Therefore, before feature extraction is performed on the received voice audio, the audio signal is received in real time by an acoustic sensor in the smart device and it is determined whether the signal is the user's voice audio; detection is performed only after the user speaks. This avoids running voice wake-up detection whenever any signal is received and improves the efficiency of voice wake-up detection.
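As a rough illustration of this gating, a crude energy-based stand-in for the endpoint-detection model might look like the following. The patent does not specify the VAD internals, so both functions and the threshold value are assumptions.

```python
def is_speech(samples, energy_threshold=0.01):
    """Crude energy-based stand-in for a voice endpoint detection model:
    treat the window as speech when its mean energy clears a threshold."""
    energy = sum(s * s for s in samples) / len(samples)
    return energy > energy_threshold

def gated_feature_extraction(samples, extract):
    """Run feature extraction only when the signal looks like speech,
    so the wake-up models are not invoked on silence or noise floor."""
    if not is_speech(samples):
        return None          # skip: no speech detected
    return extract(samples)  # proceed to acoustic feature extraction

print(gated_feature_extraction([0.0] * 100, lambda s: len(s)))  # None
print(gated_feature_extraction([0.5] * 100, lambda s: len(s)))  # 100
```

A production system would use a trained endpoint-detection model rather than raw energy, but the control flow (detect speech first, extract features second) is the same.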
Fig. 2 is a schematic structural diagram of a voice wake-up optimization system according to an embodiment of the present invention; the system can execute the voice wake-up optimization method of any of the above embodiments and is configured in a terminal.
The voice wake-up optimization system provided by this embodiment comprises: a model construction program module 11, a feature extraction program module 12, a confidence determination program module 13 and a wake-up program module 14.
The model construction program module 11 is configured to construct a two-stage wake-up acoustic model, wherein the two-stage wake-up acoustic model comprises a phoneme acoustic model and a word-level acoustic model; the feature extraction program module 12 is configured to perform feature extraction on the received voice audio, input the extracted acoustic features into the phoneme acoustic model of the two-stage wake-up acoustic model, and extract the output features of the phoneme acoustic model; the confidence determination program module 13 is configured to determine a wake-up word confidence by using the output features of the phoneme acoustic model as the input of the word-level acoustic model of the two-stage wake-up acoustic model; and the wake-up program module 14 is configured to determine the voice audio as a wake-up word and perform voice wake-up when the confidence exceeds the preset wake-up threshold.
Further, one of the acoustic models is a phoneme acoustic model, and the other acoustic model is a word-level acoustic model.
Further, after the feature extraction program module, the system further comprises a feature accumulation program module, configured to:
send the output features of each frame to a feature accumulator;
when the number of accumulated voice audio frames in the feature accumulator reaches a preset threshold, splice the output features in the feature accumulator into a one-dimensional feature;
and input the one-dimensional feature into the word-level acoustic model to complete the coupling of the two models.
Further, the feature extraction program module is further configured to:
receive an audio signal in real time via an acoustic sensor, and determine through a voice endpoint detection model whether the audio signal is voice audio;
and when the audio signal is voice audio, perform acoustic feature extraction on the received speech.
An embodiment of the present invention also provides a non-volatile computer storage medium storing computer-executable instructions that can execute the voice wake-up optimization method of any of the above method embodiments.
As one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
construct a two-stage wake-up acoustic model, wherein the two-stage wake-up acoustic model comprises a phoneme acoustic model and a word-level acoustic model;
perform feature extraction on the received voice audio, input the extracted acoustic features into the phoneme acoustic model of the two-stage wake-up acoustic model, and extract the output features of the phoneme acoustic model;
determine a wake-up word confidence by using the output features of the phoneme acoustic model as the input of the word-level acoustic model of the two-stage wake-up acoustic model;
and when the confidence exceeds a preset wake-up threshold, determine that the voice audio is a wake-up word and perform voice wake-up.
The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the voice wake-up optimization method of any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes: the device comprises at least one processor and a memory which is connected with the at least one processor in a communication mode, wherein the memory stores instructions which can be executed by the at least one processor, and the instructions are executed by the at least one processor so as to enable the at least one processor to execute the steps of the voice wake-up optimization method of any embodiment of the invention.
The client of the embodiment of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capability and are primarily aimed at providing voice and data communication. Such terminals include smart phones, multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, for example tablet computers.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players, handheld game consoles, e-book readers, smart toys and portable in-vehicle navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A voice wake-up optimization method, comprising:
constructing a two-stage wake-up acoustic model, wherein the two-stage wake-up acoustic model comprises a phoneme acoustic model and a word-level acoustic model;
performing feature extraction on the received voice audio, inputting the extracted acoustic features into the phoneme acoustic model of the two-stage wake-up acoustic model, and extracting the output features of the phoneme acoustic model;
determining a wake-up word confidence by using the output features of the phoneme acoustic model as the input of the word-level acoustic model of the two-stage wake-up acoustic model;
and when the confidence exceeds a preset wake-up threshold, determining that the voice audio is a wake-up word and performing voice wake-up.
2. The method of claim 1, wherein after said extracting the output features of the phoneme acoustic model, the method further comprises:
sending the output features of each frame to a feature accumulator;
when the number of accumulated voice audio frames in the feature accumulator reaches a preset threshold, splicing the output features in the feature accumulator into a one-dimensional feature;
and inputting the one-dimensional feature into the word-level acoustic model to complete the coupling of the two models.
3. The method of claim 1, wherein before said feature extraction on the received voice audio, the method further comprises:
receiving an audio signal in real time via an acoustic sensor, and determining through a voice endpoint detection model whether the audio signal is voice audio;
and when the audio signal is voice audio, performing acoustic feature extraction on the received speech.
4. A voice wake-up optimization system, comprising:
a model construction program module, configured to construct a two-stage wake-up acoustic model, wherein the two-stage wake-up acoustic model comprises a phoneme acoustic model and a word-level acoustic model;
a feature extraction program module, configured to perform feature extraction on the received voice audio, input the extracted acoustic features into the phoneme acoustic model of the two-stage wake-up acoustic model, and extract the output features of the phoneme acoustic model;
a confidence determination program module, configured to determine a wake-up word confidence by using the output features of the phoneme acoustic model as the input of the word-level acoustic model of the two-stage wake-up acoustic model;
and a wake-up program module, configured to determine the voice audio as a wake-up word and perform voice wake-up when the confidence exceeds a preset wake-up threshold.
5. The system of claim 4, wherein after the feature extraction program module, the system further comprises a feature accumulation program module, configured to:
send the output features of each frame to a feature accumulator;
when the number of accumulated voice audio frames in the feature accumulator reaches a preset threshold, splice the output features in the feature accumulator into a one-dimensional feature;
and input the one-dimensional feature into the word-level acoustic model to complete the coupling of the two models.
6. The system of claim 4, wherein the feature extraction program module is further configured to:
receive an audio signal in real time via an acoustic sensor, and determine through a voice endpoint detection model whether the audio signal is voice audio;
and when the audio signal is voice audio, perform acoustic feature extraction on the received speech.
7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any of claims 1-3.
8. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 3.
CN201910899791.9A 2019-09-23 2019-09-23 Voice wake-up optimization method and system Pending CN110600008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910899791.9A CN110600008A (en) 2019-09-23 2019-09-23 Voice wake-up optimization method and system


Publications (1)

Publication Number Publication Date
CN110600008A true CN110600008A (en) 2019-12-20

Family

ID=68862451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910899791.9A Pending CN110600008A (en) 2019-09-23 2019-09-23 Voice wake-up optimization method and system

Country Status (1)

Country Link
CN (1) CN110600008A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999161A (en) * 2012-11-13 2013-03-27 安徽科大讯飞信息科技股份有限公司 Implementation method and application of voice awakening module
CN103632667A (en) * 2013-11-25 2014-03-12 华为技术有限公司 Acoustic model optimization method and device, voice awakening method and device, as well as terminal
CN107123417A (en) * 2017-05-16 2017-09-01 上海交通大学 Optimization method and system are waken up based on the customized voice that distinctive is trained
CN107134279A (en) * 2017-06-30 2017-09-05 百度在线网络技术(北京)有限公司 A kind of voice awakening method, device, terminal and storage medium
CN108198548A (en) * 2018-01-25 2018-06-22 苏州奇梦者网络科技有限公司 A kind of voice awakening method and its system


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161714A (en) * 2019-12-25 2020-05-15 联想(北京)有限公司 Voice information processing method, electronic equipment and storage medium
CN111429901A (en) * 2020-03-16 2020-07-17 云知声智能科技股份有限公司 IoT chip-oriented multi-stage voice intelligent awakening method and system
CN113129873A (en) * 2021-04-27 2021-07-16 思必驰科技股份有限公司 Optimization method and system for stack type one-dimensional convolution network awakening acoustic model
CN113241059A (en) * 2021-04-27 2021-08-10 标贝(北京)科技有限公司 Voice wake-up method, device, equipment and storage medium
CN113450771A (en) * 2021-07-15 2021-09-28 维沃移动通信有限公司 Awakening method, model training method and device
CN113590207A (en) * 2021-07-30 2021-11-02 思必驰科技股份有限公司 Method and device for improving awakening effect
CN113707132A (en) * 2021-09-08 2021-11-26 北京声智科技有限公司 Awakening method and electronic equipment
CN113707132B (en) * 2021-09-08 2024-03-01 北京声智科技有限公司 Awakening method and electronic equipment
CN115862604A (en) * 2022-11-24 2023-03-28 镁佳(北京)科技有限公司 Voice wakeup model training and voice wakeup method, device and computer equipment
CN115862604B (en) * 2022-11-24 2024-02-20 镁佳(北京)科技有限公司 Voice awakening model training and voice awakening method and device and computer equipment

Similar Documents

Publication Publication Date Title
CN110600008A (en) Voice wake-up optimization method and system
CN110136749B (en) Method and device for detecting end-to-end voice endpoint related to speaker
US9966077B2 (en) Speech recognition device and method
CN108694940B (en) Voice recognition method and device and electronic equipment
CN110610707B (en) Voice keyword recognition method and device, electronic equipment and storage medium
EP2700071B1 (en) Speech recognition using multiple language models
US9799325B1 (en) Methods and systems for identifying keywords in speech signal
CN109036471B (en) Voice endpoint detection method and device
CN110675862A (en) Corpus acquisition method, electronic device and storage medium
CN110503944B (en) Method and device for training and using voice awakening model
CN111462756B (en) Voiceprint recognition method and device, electronic equipment and storage medium
US9595261B2 (en) Pattern recognition device, pattern recognition method, and computer program product
CN111583912A (en) Voice endpoint detection method and device and electronic equipment
CN111081280A (en) Text-independent speech emotion recognition method and device and emotion recognition algorithm model generation method
CN111179915A (en) Age identification method and device based on voice
CN111832308A (en) Method and device for processing consistency of voice recognition text
CN112002349B (en) Voice endpoint detection method and device
CN111816216A (en) Voice activity detection method and device
CN111243604B (en) Training method for speaker recognition neural network model supporting multiple awakening words, speaker recognition method and system
CN111640423B (en) Word boundary estimation method and device and electronic equipment
CN113838462A (en) Voice wake-up method and device, electronic equipment and computer readable storage medium
CN112951219A (en) Noise rejection method and device
CN110706691B (en) Voice verification method and device, electronic equipment and computer readable storage medium
CN112418173A (en) Abnormal sound identification method and device and electronic equipment
CN115762500A (en) Voice processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant before: AI SPEECH Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20191220