CN112233656A - Artificial intelligent voice awakening method - Google Patents


Publication number
CN112233656A
CN112233656A CN202011075074.3A CN202011075074A CN112233656A CN 112233656 A CN112233656 A CN 112233656A CN 202011075074 A CN202011075074 A CN 202011075074A CN 112233656 A CN112233656 A CN 112233656A
Authority
CN
China
Prior art keywords
awakening
voice
voice data
model
wake
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011075074.3A
Other languages
Chinese (zh)
Inventor
程松林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Fastcall Information Technology Co ltd
Original Assignee
Anhui Fastcall Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Fastcall Information Technology Co ltd
Priority to CN202011075074.3A
Publication of CN112233656A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G10L15/26 Speech to text systems
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephone Function (AREA)

Abstract

The invention relates to voice wake-up, and in particular to an artificial intelligence voice wake-up method comprising: acquiring voice data; determining energy features corresponding to the voice data by using a voice detection model, and determining text data corresponding to the voice data according to the energy features; judging whether the voice data contains a wake-up keyword; and, if the voice data contains the wake-up keyword, performing a wake-up decision by using a wake-up decision model, otherwise having the wake-up decision model output an instruction to maintain the current state. The technical solution provided by the invention can effectively overcome the defects of the prior art, namely low wake-up recognition accuracy and the inability to flexibly adjust wake-up words.

Description

Artificial intelligent voice awakening method
Technical Field
The invention relates to voice awakening, in particular to an artificial intelligence voice awakening method.
Background
Voice wake-up (wake word) technology is widely used in smart homes and voice interaction systems. However, limited recognition accuracy and a high computational load degrade the practical user experience and raise the hardware requirements of the device. For example, if the false wake-up rate exceeds a threshold in a given application, the frequent false triggers annoy the user; on the other hand, if the computation required for voice wake-up exceeds the capability of low-end chips, the technology cannot be used in many products.
In the existing related art, voice wake-up relies on keyword spotting: a deep neural network model is designed, a small, compact decoding network is built, and the wake-up function is realized in combination with keyword detection techniques.
However, voice wake-up based on keyword spotting has a large number of model parameters. In addition, the filler words must be redesigned for each wake-up word, and the corresponding decoding parameters and keyword detection techniques must be adjusted, so it is difficult for a single algorithm to keep the wake-up performance of every wake-up word at a stable level.
Disclosure of Invention
(I) Technical problem to be solved
In view of the above defects of the prior art, the invention provides an artificial intelligence voice wake-up method that can effectively overcome the low wake-up recognition accuracy of the prior art and its inability to flexibly adjust wake-up words.
(II) Technical solution
To achieve the above object, the invention adopts the following technical solution:
An artificial intelligence voice wake-up method comprises the following steps:
S1, acquiring voice data;
S2, determining energy features corresponding to the voice data by using a voice detection model, and determining text data corresponding to the voice data according to the energy features;
S3, judging whether the voice data contains a wake-up keyword;
S4, if the voice data contains the wake-up keyword, performing a wake-up decision by using a wake-up decision model; otherwise, the wake-up decision model outputs an instruction to maintain the current state.
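The control flow of steps S1 to S4 can be sketched as follows. This is only an illustrative skeleton: the class name WakeupPipeline, its three callables, and the string constants are hypothetical, since the source does not define any programming interface, and the three models are replaced by trivial stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

WAKE = "WAKE"          # hypothetical label for the wake-up instruction
MAINTAIN = "MAINTAIN"  # hypothetical label for the maintain-current-state instruction


@dataclass
class WakeupPipeline:
    # Stand-ins for the three models named in the method (all hypothetical).
    to_text: Callable[[Sequence[float]], str]            # S2: voice detection model
    contains_keyword: Callable[[Sequence[float]], bool]  # S3: keyword check
    decide: Callable[[bool, str], bool]                  # S4: wake-up decision model

    def run(self, voice_data: Sequence[float]) -> str:
        text = self.to_text(voice_data)            # S2: voice data -> text data
        found = self.contains_keyword(voice_data)  # S3: keyword present?
        if not found:
            return MAINTAIN                        # S4, "otherwise" branch
        # S4: the decision model weighs the detection result and the text.
        return WAKE if self.decide(found, text) else MAINTAIN


# Usage with trivial stand-in models.
pipeline = WakeupPipeline(
    to_text=lambda audio: "hello device",
    contains_keyword=lambda audio: True,
    decide=lambda found, text: found and "hello device" in text,
)
result = pipeline.run([0.0, 0.1, -0.1])
```

Keeping the three models behind separate callables mirrors the point made in the source that each model can be trained and replaced independently.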
Preferably, determining the energy features corresponding to the voice data by using the voice detection model comprises:
preprocessing the voice data and determining the energy value of each filter bank in the voice data;
processing the energy value of each filter bank in the voice data by using the trained voice detection model.
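As a rough illustration of the preprocessing step, the sketch below computes per-frame filter bank energies with NumPy. The source does not specify the filter bank type, frame size, or sample rate; the mel-spaced triangular filters, 25 ms frames, 10 ms hop, 512-point FFT, and the function name filterbank_energies are all assumptions chosen because they are common in speech front ends.

```python
import numpy as np


def filterbank_energies(signal, sample_rate=16000, frame_len=400, hop=160,
                        n_filters=40, n_fft=512):
    """Return an (n_frames, n_filters) array of filter bank energies.

    Assumes len(signal) >= frame_len. All parameter defaults are illustrative.
    """
    signal = np.asarray(signal, dtype=np.float64)

    # Preprocessing: split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Power spectrum of each frame (frames are zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular filters spaced evenly on the mel scale (an assumption).
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)

    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising slope
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling slope, peak at center
            fbank[m - 1, k] = (right - k) / (right - center)

    # Energy of each filter bank in each frame.
    return power @ fbank.T


# Usage: one second of a 440 Hz tone sampled at 16 kHz.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
energies = filterbank_energies(tone)
```

The resulting array has one row per frame and one column per filter bank, which is the form assumed by the smoothing step described next in the text.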
Preferably, processing the energy value of each filter bank in the voice data by using the trained voice detection model comprises:
determining the energy value of each filter bank in a given frame of voice data and the corresponding smoothing parameter;
acquiring the smoothed energy value of each filter bank in the previous frame of voice data;
determining the smoothed energy value of each filter bank in the given frame according to the filter bank energy value of the given frame, the smoothed filter bank energy value of the previous frame, and the smoothing parameter.
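The source does not give the formula that combines these three quantities. One common form that fits the description is first-order recursive (exponential) smoothing, M[t] = (1 - s) * M[t-1] + s * E[t], where E is the filter bank energy, M the smoothed energy, and s the smoothing parameter. The sketch below assumes that form; the function name smooth_energies and the default value of s are hypothetical.

```python
import numpy as np


def smooth_energies(energies, s=0.05):
    """Recursively smooth filter bank energies along the time axis.

    energies: array of shape (n_frames, n_filters).
    s: smoothing parameter in (0, 1]; a scalar, or one value per filter bank.
    Assumed recurrence (not stated in the source):
        M[t] = (1 - s) * M[t - 1] + s * E[t]
    """
    energies = np.asarray(energies, dtype=np.float64)
    s = np.asarray(s, dtype=np.float64)
    smoothed = np.empty_like(energies)
    smoothed[0] = energies[0]  # the first frame has no predecessor
    for t in range(1, len(energies)):
        # Current energy, previous smoothed energy, and the smoothing parameter.
        smoothed[t] = (1.0 - s) * smoothed[t - 1] + s * energies[t]
    return smoothed


# Usage: a step from 0 to 1 in a single filter bank, with s = 0.5.
step = smooth_energies([[0.0], [1.0], [1.0]], s=0.5)
```

A small s makes the smoothed value follow slow changes such as background noise level, while a large s tracks the raw energy closely; in the method described here the value is determined during training rather than set by hand.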
Preferably, the training of the voice detection model comprises:
acquiring training voice data and initializing the voice detection model;
training the voice detection model with the training voice data to obtain the trained voice detection model and the corresponding smoothing parameters.
Preferably, judging whether the voice data contains a wake-up keyword comprises:
performing feature analysis on multiple consecutive frames of voice data and caching the audio features;
comparing the cached audio features of each frame with the wake-up keyword by using a keyword comparison model, and determining the degree of association between the audio features of that frame and the wake-up keyword;
determining the confidence that the voice data contains the wake-up keyword according to the degree of association between each audio feature and the wake-up keyword.
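How the per-frame degrees of association are combined into a single confidence is not specified in the source. A simple possibility, shown below purely as an assumption, is to average the association scores over a sliding window of consecutive frames and take the highest window average. The function name keyword_confidence, the window length, and the idea that scores lie in [0, 1] are all hypothetical.

```python
import numpy as np


def keyword_confidence(association, window=30):
    """Combine per-frame association scores into one confidence value.

    association: 1-D sequence of per-frame scores, assumed to lie in [0, 1].
    window: number of consecutive frames averaged together (illustrative).
    Returns the largest sliding-window mean, or 0.0 for empty input.
    """
    scores = np.asarray(association, dtype=np.float64)
    if scores.size == 0:
        return 0.0
    window = min(window, scores.size)
    kernel = np.ones(window) / window
    window_means = np.convolve(scores, kernel, mode="valid")
    return float(window_means.max())


# Usage: three consecutive high-scoring frames inside a quiet stretch.
confidence = keyword_confidence([0.0, 0.0, 1.0, 1.0, 1.0, 0.0], window=3)
```

Averaging over a window rewards a sustained run of matching frames and discounts isolated spikes, which is one way to reduce false triggers.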
Preferably, the keyword comparison model is trained first, and the wake-up decision model is trained afterwards.
Preferably, the keyword comparison model is trained with a first voice data training sample set, which contains the wake-up keyword.
Preferably, the wake-up decision model is trained with a second voice data training sample set and wake-up keyword text data, and the background noise in the second voice data training sample set is greater than that in the first voice data training sample set.
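The source does not say how the noisier second sample set is obtained. If it were built by adding background noise to recorded speech, a standard way to control the noise level is to mix at a target signal-to-noise ratio (SNR), as in the sketch below. The function name mix_at_snr and the SNR values are assumptions for illustration only.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mixture has the requested SNR in dB.

    The noise is repeated or truncated to the length of the speech.
    Assumes the noise is not all zeros.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.resize(np.asarray(noise, dtype=np.float64), speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that speech_power / scaled_noise_power = 10^(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise


# Usage with synthetic signals: a lower SNR gives a noisier sample.
rng = np.random.default_rng(0)
speech = rng.standard_normal(1000)
noise = rng.standard_normal(300)
first_set_sample = mix_at_snr(speech, noise, snr_db=20.0)   # quieter background
second_set_sample = mix_at_snr(speech, noise, snr_db=5.0)   # louder background
```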
Preferably, performing the wake-up decision by using the wake-up decision model comprises:
inputting the wake-up keyword detected by the keyword comparison model and the text data obtained in S2 into the trained wake-up decision model, which makes the wake-up decision by jointly considering the detection result of the keyword comparison model and the text data.
Preferably, if the wake-up decision model determines that both the detection result of the keyword comparison model and the text data contain the wake-up keyword, it outputs a wake-up instruction; otherwise, it outputs an instruction to maintain the current state.
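The output rule of the decision stage amounts to a logical AND of two checks. In the source this stage is a trained model; the sketch below replaces it with a plain rule to show only the output behaviour. The function name wake_decision, the case-insensitive substring match, and the returned strings are hypothetical.

```python
def wake_decision(keyword_detected: bool, text: str, wake_keyword: str) -> str:
    """Rule-based stand-in for the trained wake-up decision model.

    keyword_detected: detection result of the keyword comparison model.
    text: text data converted from the voice data (step S2).
    wake_keyword: the configured wake-up keyword.
    """
    # Substring matching is an assumption; the source does not say how the
    # text data is checked for the wake-up keyword.
    text_has_keyword = wake_keyword.lower() in text.lower()
    if keyword_detected and text_has_keyword:
        return "WAKE"      # both checks agree: output the wake-up instruction
    return "MAINTAIN"      # otherwise: maintain the current state


# Usage: the acoustic check fires but the transcript disagrees.
outcome = wake_decision(True, "turn on the light", "hello device")
```

Requiring agreement between the acoustic check and the transcript is what lets the text path veto a false trigger from the keyword comparison model.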
(III) Advantageous effects
Compared with the prior art, the artificial intelligence voice wake-up method provided by the invention effectively improves wake-up recognition accuracy by cross-checking the voice data, the text data converted from the voice data, and the wake-up keyword. A voice detection model is constructed to convert the voice data into text data, a keyword comparison model is constructed to judge whether the voice data contains the wake-up keyword, and a wake-up decision model is constructed to make the wake-up decision; because these are configurable artificial intelligence models, the wake-up keyword can be adjusted flexibly.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic flow chart of voice wake-up according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, an artificial intelligence voice wake-up method acquires voice data, determines energy features corresponding to the voice data by using a voice detection model, and determines text data corresponding to the voice data according to the energy features.
In the technical solution of this application, the training of the voice detection model comprises:
acquiring training voice data and initializing the voice detection model;
training the voice detection model with the training voice data to obtain the trained voice detection model and the corresponding smoothing parameters.
Determining the energy features corresponding to the voice data by using the voice detection model comprises:
preprocessing the voice data and determining the energy value of each filter bank in the voice data;
processing the energy value of each filter bank in the voice data by using the trained voice detection model.
Processing the energy value of each filter bank in the voice data by using the trained voice detection model comprises:
determining the energy value of each filter bank in a given frame of voice data and the corresponding smoothing parameter;
acquiring the smoothed energy value of each filter bank in the previous frame of voice data;
determining the smoothed energy value of each filter bank in the given frame according to the filter bank energy value of the given frame, the smoothed filter bank energy value of the previous frame, and the smoothing parameter.
It is then judged whether the voice data contains a wake-up keyword. If the voice data contains the wake-up keyword, the wake-up decision model is used to make a wake-up decision; otherwise, the wake-up decision model outputs an instruction to maintain the current state.
Judging whether the voice data contains a wake-up keyword comprises:
performing feature analysis on multiple consecutive frames of voice data and caching the audio features;
comparing the cached audio features of each frame with the wake-up keyword by using a keyword comparison model, and determining the degree of association between the audio features of that frame and the wake-up keyword;
determining the confidence that the voice data contains the wake-up keyword according to the degree of association between each audio feature and the wake-up keyword.
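The caching of audio features over consecutive frames can be illustrated with a fixed-size buffer that drops the oldest frame when full. The source does not describe the cache structure, so the class name FeatureBuffer, the use of collections.deque, and the buffer length are assumptions.

```python
from collections import deque

import numpy as np


class FeatureBuffer:
    """Fixed-size cache of the most recent per-frame audio feature vectors."""

    def __init__(self, max_frames=100):
        # A deque with maxlen discards the oldest entry automatically.
        self._frames = deque(maxlen=max_frames)

    def push(self, feature_vector):
        """Cache the feature vector of the newest frame."""
        self._frames.append(np.asarray(feature_vector, dtype=np.float64))

    def as_array(self):
        """Return cached features as an (n_frames, n_features) array."""
        if not self._frames:
            return np.empty((0, 0))
        return np.stack(list(self._frames))

    def __len__(self):
        return len(self._frames)


# Usage: push five 2-dimensional feature vectors into a 3-frame buffer.
buffer = FeatureBuffer(max_frames=3)
for i in range(5):
    buffer.push([float(i), float(i) * 10.0])
cached = buffer.as_array()
```

Each cached row could then be passed to the keyword comparison model to obtain that frame's degree of association with the wake-up keyword.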
In this technical solution, the keyword comparison model is trained first, and the wake-up decision model is trained afterwards.
The keyword comparison model is trained with a first voice data training sample set, which contains the wake-up keyword. The wake-up decision model is trained with a second voice data training sample set and wake-up keyword text data, and the background noise in the second voice data training sample set is greater than that in the first voice data training sample set.
Performing the wake-up decision by using the wake-up decision model comprises:
inputting the wake-up keyword detected by the keyword comparison model and the text data obtained in S2 into the trained wake-up decision model, which makes the wake-up decision by jointly considering the detection result of the keyword comparison model and the text data.
If the wake-up decision model determines that both the detection result of the keyword comparison model and the text data contain the wake-up keyword, it outputs a wake-up instruction; otherwise, it outputs an instruction to maintain the current state.
In this technical solution, wake-up recognition accuracy is effectively improved by cross-checking the text data converted from the voice data against the wake-up keyword. Moreover, the voice detection model, the keyword comparison model and the wake-up decision model can each be trained independently, so the wake-up keyword can be adjusted flexibly.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. An artificial intelligence voice wake-up method, characterized in that the method comprises the following steps:
S1, acquiring voice data;
S2, determining energy features corresponding to the voice data by using a voice detection model, and determining text data corresponding to the voice data according to the energy features;
S3, judging whether the voice data contains a wake-up keyword;
S4, if the voice data contains the wake-up keyword, performing a wake-up decision by using a wake-up decision model; otherwise, the wake-up decision model outputs an instruction to maintain the current state.
2. The artificial intelligence voice wake-up method according to claim 1, characterized in that determining the energy features corresponding to the voice data by using the voice detection model comprises:
preprocessing the voice data and determining the energy value of each filter bank in the voice data;
processing the energy value of each filter bank in the voice data by using the trained voice detection model.
3. The artificial intelligence voice wake-up method according to claim 2, characterized in that processing the energy value of each filter bank in the voice data by using the trained voice detection model comprises:
determining the energy value of each filter bank in a given frame of voice data and the corresponding smoothing parameter;
acquiring the smoothed energy value of each filter bank in the previous frame of voice data;
determining the smoothed energy value of each filter bank in the given frame according to the filter bank energy value of the given frame, the smoothed filter bank energy value of the previous frame, and the smoothing parameter.
4. The artificial intelligence voice wake-up method according to any one of claims 1 to 3, characterized in that the training of the voice detection model comprises:
acquiring training voice data and initializing the voice detection model;
training the voice detection model with the training voice data to obtain the trained voice detection model and the corresponding smoothing parameters.
5. The artificial intelligence voice wake-up method according to claim 1, characterized in that judging whether the voice data contains a wake-up keyword comprises:
performing feature analysis on multiple consecutive frames of voice data and caching the audio features;
comparing the cached audio features of each frame with the wake-up keyword by using a keyword comparison model, and determining the degree of association between the audio features of that frame and the wake-up keyword;
determining the confidence that the voice data contains the wake-up keyword according to the degree of association between each audio feature and the wake-up keyword.
6. The artificial intelligence voice wake-up method of claim 5, wherein: the keyword comparison model is trained first, and the wake-up decision model is trained afterwards.
7. The artificial intelligence voice wake-up method of claim 6, wherein: the keyword comparison model is trained with a first voice data training sample set, and the first voice data training sample set contains the wake-up keyword.
8. The artificial intelligence voice wake-up method of claim 7, wherein: the wake-up decision model is trained with a second voice data training sample set and wake-up keyword text data, and the background noise in the second voice data training sample set is greater than that in the first voice data training sample set.
9. The artificial intelligence voice wake-up method of claim 8, wherein performing the wake-up decision by using the wake-up decision model comprises:
inputting the wake-up keyword detected by the keyword comparison model and the text data obtained in S2 into the trained wake-up decision model, which makes the wake-up decision by jointly considering the detection result of the keyword comparison model and the text data.
10. The artificial intelligence voice wake-up method of claim 9, wherein: if the wake-up decision model determines that both the detection result of the keyword comparison model and the text data contain the wake-up keyword, the wake-up decision model outputs a wake-up instruction; otherwise, the wake-up decision model outputs an instruction to maintain the current state.
CN202011075074.3A 2020-10-09 2020-10-09 Artificial intelligent voice awakening method Pending CN112233656A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011075074.3A CN112233656A (en) 2020-10-09 2020-10-09 Artificial intelligent voice awakening method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011075074.3A CN112233656A (en) 2020-10-09 2020-10-09 Artificial intelligent voice awakening method

Publications (1)

Publication Number Publication Date
CN112233656A (en) 2021-01-15

Family

ID=74120504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011075074.3A Pending CN112233656A (en) 2020-10-09 2020-10-09 Artificial intelligent voice awakening method

Country Status (1)

Country Link
CN (1) CN112233656A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105632486A (en) * 2015-12-23 2016-06-01 北京奇虎科技有限公司 Voice wake-up method and device of intelligent hardware
CN105654943A (en) * 2015-10-26 2016-06-08 乐视致新电子科技(天津)有限公司 Voice wakeup method, apparatus and system thereof
CN107221326A (en) * 2017-05-16 2017-09-29 百度在线网络技术(北京)有限公司 Voice awakening method, device and computer equipment based on artificial intelligence
CN107346659A (en) * 2017-06-05 2017-11-14 百度在线网络技术(北京)有限公司 Audio recognition method, device and terminal based on artificial intelligence
CN110364143A (en) * 2019-08-14 2019-10-22 腾讯科技(深圳)有限公司 Voice awakening method, device and its intelligent electronic device

Similar Documents

Publication Publication Date Title
US10332507B2 (en) Method and device for waking up via speech based on artificial intelligence
CN110415699A (en) A kind of judgment method, device and electronic equipment that voice wakes up
CN111524527B (en) Speaker separation method, speaker separation device, electronic device and storage medium
CN101320559B (en) Sound activation detection apparatus and method
CN109461446B (en) Method, device, system and storage medium for identifying user target request
CN110767218A (en) End-to-end speech recognition method, system, device and storage medium thereof
WO2022134833A1 (en) Speech signal processing method, apparatus and device, and storage medium
CN105632486A (en) Voice wake-up method and device of intelligent hardware
CN106653031A (en) Voice wake-up method and voice interaction device
CN109360572B (en) Call separation method and device, computer equipment and storage medium
CN103258535A (en) Identity recognition method and system based on voiceprint recognition
CN111161726B (en) Intelligent voice interaction method, device, medium and system
CN111223488A (en) Voice wake-up method, device, equipment and storage medium
CN102945673A (en) Continuous speech recognition method with speech command range changed dynamically
CN110414005B (en) Intention recognition method, electronic device and storage medium
CN111508493A (en) Voice wake-up method and device, electronic equipment and storage medium
CN111179944B (en) Voice awakening and age detection method and device and computer readable storage medium
CN114550703A (en) Training method and device of voice recognition system, and voice recognition method and device
CN112233656A (en) Artificial intelligent voice awakening method
CN110930997A (en) Method for labeling audio by using deep learning model
CN112825250A (en) Voice wake-up method, apparatus, storage medium and program product
CN112017676A (en) Audio processing method, apparatus and computer readable storage medium
CN111079705B (en) Vibration signal classification method
CN114927128A (en) Voice keyword detection method and device, electronic equipment and readable storage medium
CN114937449A (en) Voice keyword recognition method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210115