WO2021042969A1 - Construction apparatus and construction method for self-learning speech recognition system - Google Patents

Construction apparatus and construction method for self-learning speech recognition system

Info

Publication number
WO2021042969A1
WO2021042969A1 (PCT/CN2020/109393, CN2020109393W)
Authority
WO
WIPO (PCT)
Prior art keywords
wave
output
calculation unit
speech recognition
recognition system
Prior art date
Application number
PCT/CN2020/109393
Other languages
French (fr)
Chinese (zh)
Inventor
樊茂
Original Assignee
晶晨半导体(上海)股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 晶晨半导体(上海)股份有限公司
Publication of WO2021042969A1

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces; the user being prompted to utter a password or a predefined phrase

Definitions

  • the present invention relates to the technical field of voice recognition, in particular to a construction device and construction method of a self-learning speech recognition system.
  • a construction device of a self-learning speech recognition system aimed at reducing energy consumption during standby is provided.
  • a construction device for a self-learning speech recognition system is applied to a speech recognition system.
  • the speech recognition system includes a microphone and a speech recognition module to which the construction device is applied.
  • the microphone and the speech recognition module are connected.
  • the construction device includes:
  • the analysis unit is used to analyze the output signal of the microphone to obtain multiple signal parameters
  • the recognition unit is connected with the analysis unit, and judges whether the output signal is a preset activation voice according to the signal parameters.
  • the device for constructing a self-learning speech recognition system wherein the output signal is a waveform signal.
  • the device for constructing a self-learning speech recognition system wherein the analysis unit sequentially saves each type of signal parameter into a corresponding sequence, and outputs the signal parameter of each sequence to the recognition unit.
  • a device for constructing a self-learning speech recognition system wherein:
  • the recognition unit is a neural network, which includes:
  • the first calculation unit is configured to output the first output parameter according to the signal parameters of the multiple sequences
  • the second calculation unit is configured to output the second output parameter according to the signal parameters of the multiple sequences
  • the third calculation unit is configured to output the third output parameter according to the signal parameter of the corresponding sequence
  • the fourth calculation unit is configured to output the fourth output parameter according to the signal parameter of the corresponding sequence
  • the hidden layer includes multiple first nodes, each of the first nodes is connected to the first calculation unit, the second calculation unit, the third calculation unit, and the fourth calculation unit, and each first node is set with one piece of feature information of an activation voice; the first node receives and judges whether the first output parameter, the second output parameter, the third output parameter, and the fourth output parameter conform to the corresponding feature information, and outputs the judgment result;
  • the output layer includes a plurality of second nodes, each second node is connected to each first node, and each second node is set with a corresponding activation voice, and it is determined whether the output signal matches the activation voice according to the judgment result.
  • the device for constructing a self-learning speech recognition system wherein the types of signal parameters include wave troughs, wave crests, and the interval time between adjacent wave troughs and wave crests.
  • the device for constructing a self-learning speech recognition system wherein the first output parameter is an envelope value; and/or
  • the second output parameter is the number of wave edges formed by adjacent wave troughs and wave crests.
  • the third output parameter is the difference between two adjacent troughs.
  • the fourth output parameter is the difference between two adjacent peaks.
  • a device for constructing a self-learning speech recognition system wherein:
  • the first calculation unit calculates the envelope value through the trough, the crest and the interval time; and/or
  • the second calculation unit calculates the number of wave edges formed by adjacent wave troughs and wave crests through calculation of wave troughs and wave crests; and/or
  • the third calculation unit calculates the difference between two adjacent wave troughs through wave troughs; and/or
  • the fourth calculating unit obtains the difference between two adjacent wave crests by calculating the wave crests.
  • the speech recognition system includes a microphone and a speech recognition module to which the construction device is applied, and the microphone is connected to the speech recognition module.
  • the construction method includes the following steps:
  • Step S1 analyzing the output signal of the microphone to obtain multiple signal parameters
  • Step S2 judging whether the output signal is a preset activation voice according to the signal parameters.
  • a device for constructing a self-learning speech recognition system wherein:
  • in step S2, a neural network is provided, and the neural network determines whether the output signal is a preset activation voice.
  • the device for constructing a self-learning speech recognition system wherein the neural network includes:
  • the first calculation unit is used to output the envelope value according to the trough, the crest and the interval time;
  • the second calculation unit is used to output the number of wave edges composed of adjacent wave troughs and wave crests according to wave troughs and wave crests;
  • the third calculation unit is used to output the difference between two adjacent wave troughs according to the wave trough;
  • the fourth calculation unit is configured to output the difference between two adjacent wave crests according to the wave crests;
  • the hidden layer includes multiple first nodes, each of the first nodes is connected to the first calculation unit, the second calculation unit, the third calculation unit, and the fourth calculation unit, and each first node is set with one piece of feature information of an activation voice; the first node receives and judges whether the envelope value, the number of wave edges, the difference between two adjacent wave troughs, and the difference between two adjacent wave crests conform to the corresponding feature information, and outputs the judgment result;
  • the output layer includes a plurality of second nodes, each second node is connected to each first node, and each second node is set with a corresponding activation voice, and judging whether the output signal conforms to the activation voice according to the judgment result;
  • Step S2 includes the following steps:
  • Step S21 calculating the envelope value through the trough, crest and interval time.
  • Step S22 each first node receives and judges whether the envelope value, the number of wave edges, the difference between two adjacent wave troughs, and the difference between two adjacent wave crests conform to the characteristic information, and output the judgment result;
  • step S23 each second node judges whether the output signal conforms to the activation voice according to the judgment result, and outputs the judgment result.
  • wake-up is performed by means of the activation voice, so that the power module, ADC, and CPU can sleep during standby, reducing energy consumption during standby.
  • Figure 1 is a schematic structural diagram of an embodiment of a device for constructing a self-learning speech recognition system according to the present invention
  • FIG. 2 is a schematic diagram of the structure of the neural network of the embodiment of the construction device of the self-learning speech recognition system of the present invention
  • FIG. 3 is a flowchart of an embodiment of a method for constructing a self-learning speech recognition system according to the present invention
  • Fig. 4 is a flowchart of step S2 of an embodiment of the method for constructing a self-learning speech recognition system of the present invention.
  • the present invention includes a construction device for a self-learning speech recognition system, which is applied to a speech recognition system.
  • the speech recognition system includes a microphone and a speech recognition module to which the construction device is applied.
  • the microphone and the speech recognition module are connected, as shown in FIG. 1.
  • the device includes:
  • the analysis unit is used to analyze the output signal of the microphone to obtain multiple signal parameters
  • the recognition unit is connected with the analysis unit, and judges whether the output signal is a preset activation voice according to the signal parameters.
  • the recognition unit recognizes whether the signal parameters from the analysis unit correspond to a preset activation voice, and the activation voice is used for wake-up, so that the power module, ADC, and CPU can sleep during standby to reduce energy consumption during standby.
  • the preset activation voices may be limited to a preset number, for example 2, 3, or 4; since the preset activation voices are obtained from the first nodes in the hidden layer, the number of preset activation voices should not be too large, in order to reduce energy consumption.
  • the output signal is a waveform signal.
  • the types of the aforementioned signal parameters may include troughs, crests, and the interval time between adjacent troughs and crests.
  • the analysis unit may sequentially save the signal parameters of each type in the corresponding sequence, and output the signal parameters of each sequence to the identification unit.
  • the sequence of wave troughs can be {drop_1, drop_2, ..., drop_n}, where drop is used to represent a wave trough;
  • the sequence of wave crests can be {rise_1, rise_2, ..., rise_n}, where rise is used to represent a wave crest;
  • the sequence of interval times can be {T_1, T_2, ..., T_n}, where T is used to represent an interval time.
  • the recognition unit may be a neural network.
  • the neural network includes:
  • the first calculation unit 10 is configured to output the first output parameter according to the signal parameters of the multiple sequences
  • the second calculation unit 20 is configured to output the second output parameter according to the signal parameters of the multiple sequences
  • the third calculation unit 30 is configured to output the third output parameter according to the signal parameter of the corresponding sequence
  • the fourth calculation unit 40 is configured to output the fourth output parameter according to the signal parameter of the corresponding sequence
  • the hidden layer includes a plurality of first nodes 50, each first node 50 is connected to the first calculation unit 10, the second calculation unit 20, the third calculation unit 30, and the fourth calculation unit 40, and each first node 50 is set with one piece of feature information of an activation voice;
  • the first node 50 receives and judges whether the first output parameter, the second output parameter, the third output parameter, and the fourth output parameter conform to the corresponding feature information, and outputs the judgment result;
  • the output layer includes a plurality of second nodes 60, each second node 60 is connected to each first node 50, and each second node 60 is set with a corresponding activation voice, and it is determined whether the output signal matches the activation voice according to the judgment result.
  • the number of the aforementioned hidden layers can be self-set according to the needs of the user.
  • each node can be a filter.
  • the first output parameter is an envelope value
  • the second output parameter is the number of wave edges formed by adjacent wave troughs and wave crests
  • the third output parameter is the difference between two adjacent troughs
  • the fourth output parameter is the difference between two adjacent peaks.
  • the first calculation unit 10 calculates the envelope value through the wave trough, the wave crest and the interval time;
  • the second calculation unit 20 calculates, from the wave troughs and wave crests, the number of wave edges formed by adjacent wave troughs and wave crests;
  • the third calculation unit 30 calculates the difference between two adjacent wave troughs by calculating the wave trough;
  • the fourth calculating unit 40 obtains the difference between two adjacent wave crests by calculating the wave crests.
  • the difference between two adjacent wave troughs is obtained by subtracting the following wave trough from the preceding wave trough in the wave trough sequence; the difference between two adjacent wave crests is obtained by subtracting the following wave crest from the preceding wave crest in the wave crest sequence.
  • the neural network can be trained with a plurality of preset activation voices by inputting the signal parameters of the output signals corresponding to the preset activation voices into the neural network.
  • the first calculation unit 10 in the neural network calculates the envelope value from the wave troughs, wave crests, and interval times;
  • the second calculation unit 20 calculates the number of wave edges formed by adjacent wave troughs and wave crests from the wave troughs and wave crests;
  • the third calculation unit 30 calculates the difference between two adjacent wave troughs from the wave troughs;
  • the fourth calculation unit 40 calculates the difference between two adjacent wave crests from the wave crests.
  • each first node 50 in the hidden layer receives the envelope value, the number of wave edges, the difference between two adjacent wave troughs, and the difference between two adjacent wave crests, judges whether they conform to its feature information, and outputs the judgment result.
  • each second node 60 in the output layer judges, according to the judgment results, whether the output signal conforms to the activation voice and outputs its judgment result; when the output signal is recognized as the corresponding activation voice, the signal parameters of the output signal corresponding to that preset activation voice are input repeatedly for training; when it is not, the weights of the first nodes 50 corresponding to the judgment results are adjusted and the signal parameters of that output signal continue to be input for training, until the output layer judges the output signal to be the corresponding activation voice; the signal parameters of the output signals corresponding to the other preset activation voices are then input for training, so that the activation voice corresponding to an output signal can be predicted.
  • the judgment result can be represented by a logical value.
  • for example, the logical value stored for a preset activation voice in the corresponding second node 60 is 1010101010, and the signal parameters of the output signal corresponding to that preset activation voice are input into the neural network.
  • each first node 50 in the hidden layer receives the output parameters and determines whether they match its characteristic information.
  • when the output parameters match the characteristic information, the logical value of the output judgment result is 1; when they do not, the logical value is 0; the second node 60 of the output layer judges whether the output signal conforms to the preset activation voice according to the received judgment results.
  • when the judgment results equal the corresponding logical value 1010101010, the second node 60 outputs a judgment result with logical value 1, indicating that the output signal conforms to the preset activation voice; when they do not, the second node 60 outputs a judgment result with logical value 0, indicating that the output signal does not conform to the preset activation voice.
  • the speech recognition system includes a microphone and a speech recognition module to which the construction device is applied.
  • the microphone and the speech recognition module are connected; as shown in Figure 4, the construction method includes the following steps:
  • Step S1 analyzing the output signal of the microphone to obtain multiple signal parameters
  • Step S2 judging whether the output signal is a preset activation voice according to the signal parameters.
  • by analyzing whether the signal parameters correspond to a preset activation voice and waking up via the activation voice, the power module, ADC, and CPU can sleep during standby to reduce energy consumption during standby.
  • step S2 a neural network is provided, and the neural network determines whether the output signal is a preset activation voice.
  • Neural networks include:
  • the first calculation unit 10 is configured to output the envelope value according to the trough, the crest and the interval time;
  • the second calculation unit 20 is configured to output the number of wave edges composed of adjacent wave troughs and wave crests according to wave troughs and wave crests;
  • the third calculation unit 30 is configured to output the difference between two adjacent wave troughs according to the wave trough;
  • the fourth calculation unit 40 is configured to output the difference between two adjacent wave crests according to the wave crests;
  • the hidden layer includes a plurality of first nodes 50, each first node 50 is connected to the first calculation unit 10, the second calculation unit 20, the third calculation unit 30, and the fourth calculation unit 40, and each first node 50 is set with one piece of feature information of an activation voice;
  • the first node 50 receives and determines whether the envelope value, the number of wave edges, the difference between two adjacent wave troughs, and the difference between two adjacent wave crests conform to the corresponding feature information, and outputs the judgment result;
  • the output layer includes a plurality of second nodes 60, each second node 60 is connected to each first node 50, and each second node 60 is set with a corresponding activation voice, and judging whether the output signal conforms to the activation voice according to the judgment result;
  • Step S2 includes the following steps:
  • Step S21 calculating the envelope value through the trough, crest and interval time.
  • Step S22 each first node 50 receives and judges whether the envelope value, the number of wave edges, the difference between two adjacent wave troughs, and the difference between two adjacent wave crests conform to the characteristic information, and output the judgment result;
  • each second node 60 judges whether the output signal conforms to the active voice according to the judgment result, and outputs the judgment result.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A construction apparatus and construction method for a self-learning speech recognition system. The construction apparatus is applied to a speech recognition system; the speech recognition system comprises a microphone and a speech recognition module using the construction apparatus; the microphone is connected to the speech recognition module; the construction apparatus comprises an analysis unit (1) configured to analyze an output signal of the microphone to obtain a plurality of signal parameters, and a recognition unit (2) connected to the analysis unit (1) and configured to determine, according to the signal parameters, whether the output signal is a preset activation speech. A wake-up operation is implemented by means of an activation speech, so that the power supply module, ADC, and CPU are enabled to sleep during standby, thereby reducing energy consumption during standby.

Description

Construction device and construction method of a self-learning speech recognition system
Technical field
The present invention relates to the technical field of voice recognition, and in particular to a construction device and construction method of a self-learning speech recognition system.
Background art
With the rapid development of computer application technology, speech and other types of sound recognition are used ever more widely, and the demand for sound recognition keeps growing. Current ultra-high-definition smart TVs and smart speakers still need to retain the voice wake-up function while in standby, so the speech recognition system must keep running; that is, the power module, the ADC (Analog-to-Digital Converter), and the CPU (Central Processing Unit) all remain in working mode, which consumes a large amount of energy during standby.
Summary of the invention
In view of the above problems in the prior art, a construction device for a self-learning speech recognition system, aimed at reducing energy consumption during standby, is provided.
The specific technical solutions are as follows:
A construction device for a self-learning speech recognition system is applied to a speech recognition system. The speech recognition system includes a microphone and a speech recognition module to which the construction device is applied, and the microphone is connected to the speech recognition module. The construction device includes:
an analysis unit, used to analyze the output signal of the microphone to obtain multiple signal parameters;
a recognition unit, connected to the analysis unit, which judges according to the signal parameters whether the output signal is a preset activation voice.
Preferably, in the construction device for a self-learning speech recognition system, the output signal is a waveform signal.
Preferably, in the construction device for a self-learning speech recognition system, the analysis unit saves each type of signal parameter in turn into a corresponding sequence and outputs the signal parameters of each sequence to the recognition unit.
Preferably, in the construction device for a self-learning speech recognition system, the recognition unit is a neural network, and the neural network includes:
a first calculation unit, used to output a first output parameter according to the signal parameters of multiple sequences;
a second calculation unit, used to output a second output parameter according to the signal parameters of multiple sequences;
a third calculation unit, used to output a third output parameter according to the signal parameters of the corresponding sequence;
a fourth calculation unit, used to output a fourth output parameter according to the signal parameters of the corresponding sequence;
a hidden layer, including multiple first nodes, where each first node is connected to the first calculation unit, the second calculation unit, the third calculation unit, and the fourth calculation unit, each first node is set with one piece of feature information of an activation voice, and the first node receives the first output parameter, the second output parameter, the third output parameter, and the fourth output parameter, judges whether they conform to the corresponding feature information, and outputs the judgment result;
an output layer, including multiple second nodes, where each second node is connected to each first node, each second node is set with one corresponding activation voice, and whether the output signal conforms to the activation voice is judged according to the judgment results.
Preferably, in the construction device for a self-learning speech recognition system, the types of signal parameters include wave troughs, wave crests, and the interval time between adjacent wave troughs and wave crests.
Preferably, in the construction device for a self-learning speech recognition system, the first output parameter is an envelope value; and/or
the second output parameter is the number of wave edges formed by adjacent wave troughs and wave crests; and/or
the third output parameter is the difference between two adjacent wave troughs; and/or
the fourth output parameter is the difference between two adjacent wave crests.
Preferably, in the construction device for a self-learning speech recognition system,
the first calculation unit calculates the envelope value from the wave troughs, wave crests, and interval times; and/or
the second calculation unit calculates, from the wave troughs and wave crests, the number of wave edges formed by adjacent wave troughs and wave crests; and/or
the third calculation unit calculates the difference between two adjacent wave troughs from the wave troughs; and/or
the fourth calculation unit calculates the difference between two adjacent wave crests from the wave crests.
Also provided is a method for constructing a self-learning speech recognition system, applied to a speech recognition system. The speech recognition system includes a microphone and a speech recognition module to which the construction device is applied, and the microphone is connected to the speech recognition module. The construction method includes the following steps:
Step S1, analyzing the output signal of the microphone to obtain multiple signal parameters;
Step S2, judging according to the signal parameters whether the output signal is a preset activation voice.
Preferably, in the method for constructing a self-learning speech recognition system,
in step S2, a neural network is provided, and the neural network judges whether the output signal is a preset activation voice.
Preferably, in the method for constructing a self-learning speech recognition system, the neural network includes:
a first calculation unit, used to output an envelope value according to the wave troughs, wave crests, and interval times;
a second calculation unit, used to output, according to the wave troughs and wave crests, the number of wave edges formed by adjacent wave troughs and wave crests;
a third calculation unit, used to output the difference between two adjacent wave troughs according to the wave troughs;
a fourth calculation unit, used to output the difference between two adjacent wave crests according to the wave crests;
a hidden layer, including multiple first nodes, where each first node is connected to the first calculation unit, the second calculation unit, the third calculation unit, and the fourth calculation unit, each first node is set with one piece of feature information of an activation voice, and the first node receives the envelope value, the number of wave edges, the difference between two adjacent wave troughs, and the difference between two adjacent wave crests, judges whether they conform to the corresponding feature information, and outputs the judgment result;
an output layer, including multiple second nodes, where each second node is connected to each first node, each second node is set with a corresponding activation voice, and whether the output signal conforms to the activation voice is judged according to the judgment results;
Step S2 includes the following steps:
Step S21, calculating the envelope value from the wave troughs, wave crests, and interval times; and
calculating, from the wave troughs and wave crests, the number of wave edges formed by adjacent wave troughs and wave crests; and
calculating the difference between two adjacent wave troughs from the wave troughs; and
calculating the difference between two adjacent wave crests from the wave crests;
Step S22, each first node receives the envelope value, the number of wave edges, the difference between two adjacent wave troughs, and the difference between two adjacent wave crests, judges whether they conform to the feature information, and outputs the judgment result;
Step S23, each second node judges according to the judgment results whether the output signal conforms to the activation voice, and outputs the judgment result.
The above technical solution has the following advantage or beneficial effect: wake-up is triggered by the activation voice, so that the power module, ADC, and CPU can sleep during standby, thereby reducing energy consumption during standby.
Description of the drawings
The embodiments of the present invention are described more fully with reference to the attached drawings. However, the attached drawings are for illustration and explanation only and do not limit the scope of the present invention.
Figure 1 is a schematic structural diagram of an embodiment of the construction device for a self-learning speech recognition system according to the present invention;
Figure 2 is a schematic structural diagram of the neural network of the embodiment of the construction device for a self-learning speech recognition system according to the present invention;
Figure 3 is a flowchart of an embodiment of the method for constructing a self-learning speech recognition system according to the present invention;
Figure 4 is a flowchart of step S2 of the embodiment of the method for constructing a self-learning speech recognition system according to the present invention.
Detailed description
The technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.
It should be noted that, where there is no conflict, the embodiments of the present invention and the features in the embodiments may be combined with each other.
The present invention is further described below in conjunction with the drawings and specific embodiments, which do not limit the present invention.
The present invention includes a construction device for a self-learning speech recognition system, applied to a speech recognition system. The speech recognition system includes a microphone and a speech recognition module to which the construction device is applied, and the microphone is connected to the speech recognition module. As shown in Figure 1, the construction device includes:
an analysis unit, used to analyze the output signal of the microphone to obtain multiple signal parameters;
a recognition unit, connected to the analysis unit, which judges according to the signal parameters whether the output signal is a preset activation voice.
In the above embodiment, the recognition unit recognizes whether the signal parameters from the analysis unit correspond to a preset activation voice, and wake-up is triggered by the activation voice, so that the power module, ADC, and CPU can sleep during standby, reducing energy consumption during standby.
The preset activation voices may be limited to a preset number, for example 2, 3, or 4. Since the preset activation voices are obtained from the first nodes in the hidden layer, the number of preset activation voices should not be too large, in order to reduce energy consumption.
Further, in the above embodiment, the output signal is a waveform signal, so multiple signal parameters can be obtained from it. For example, the types of signal parameters may include wave troughs, wave crests, and the interval time between adjacent wave troughs and wave crests.
Further, in the above embodiment, the analysis unit may save each type of signal parameter in turn into a corresponding sequence and output the signal parameters of each sequence to the recognition unit.
For example, the sequence of wave troughs may be {drop_1, drop_2, ..., drop_n}, where drop denotes a wave trough;
the sequence of wave crests may be {rise_1, rise_2, ..., rise_n}, where rise denotes a wave crest;
the sequence of interval times may be {T_1, T_2, ..., T_n}, where T denotes an interval time.
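As an illustration only, a minimal sketch of what such an analysis unit might do is given below. The function and field names, and the simple local-extremum test, are assumptions made for the example; the patent does not specify how troughs and crests are detected.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SignalParameters:
    troughs: List[float] = field(default_factory=list)    # {drop_1, drop_2, ..., drop_n}
    crests: List[float] = field(default_factory=list)     # {rise_1, rise_2, ..., rise_n}
    intervals: List[float] = field(default_factory=list)  # {T_1, T_2, ..., T_n}

def analyze(samples, sample_rate):
    """Hypothetical analysis unit: scan a waveform, collect troughs and crests,
    and record the interval time between each extremum and the next one."""
    params = SignalParameters()
    extrema = []  # (sample index, value, kind)
    for i in range(1, len(samples) - 1):
        if samples[i] < samples[i - 1] and samples[i] <= samples[i + 1]:
            extrema.append((i, samples[i], "trough"))
        elif samples[i] > samples[i - 1] and samples[i] >= samples[i + 1]:
            extrema.append((i, samples[i], "crest"))
    for k, (i, value, kind) in enumerate(extrema):
        (params.troughs if kind == "trough" else params.crests).append(value)
        if k + 1 < len(extrema):  # interval time to the next adjacent extremum
            params.intervals.append((extrema[k + 1][0] - i) / sample_rate)
    return params
```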
Further, in the above embodiment, the recognition unit may be a neural network. As shown in Figure 2, the neural network includes:
a first calculation unit 10, used to output a first output parameter according to the signal parameters of multiple sequences;
a second calculation unit 20, used to output a second output parameter according to the signal parameters of multiple sequences;
a third calculation unit 30, used to output a third output parameter according to the signal parameters of the corresponding sequence;
a fourth calculation unit 40, used to output a fourth output parameter according to the signal parameters of the corresponding sequence;
a hidden layer, including multiple first nodes 50, where each first node 50 is connected to the first calculation unit 10, the second calculation unit 20, the third calculation unit 30, and the fourth calculation unit 40, each first node 50 is set with one piece of feature information of an activation voice, and the first node 50 receives the first, second, third, and fourth output parameters, judges whether they conform to the corresponding feature information, and outputs the judgment result;
an output layer, including multiple second nodes 60, where each second node 60 is connected to each first node 50, each second node 60 is set with one corresponding activation voice, and whether the output signal conforms to the activation voice is judged according to the judgment results.
The number of hidden layers may be set according to the user's needs.
In the above neural network, each node may be a filter.
Further, as a preferred embodiment, the first output parameter is an envelope value;
the second output parameter is the number of wave edges formed by adjacent wave troughs and wave crests;
the third output parameter is the difference between two adjacent wave troughs;
the fourth output parameter is the difference between two adjacent wave crests. The first calculation unit 10 calculates the envelope value from the wave troughs, wave crests, and interval times;
the second calculation unit 20 calculates, from the wave troughs and wave crests, the number of wave edges formed by adjacent wave troughs and wave crests;
the third calculation unit 30 calculates the difference between two adjacent wave troughs from the wave troughs;
the fourth calculation unit 40 calculates the difference between two adjacent wave crests from the wave crests.
It should be noted that the difference between two adjacent wave troughs is obtained by subtracting the following wave trough from the preceding wave trough in the wave trough sequence, and the difference between two adjacent wave crests is obtained by subtracting the following wave crest from the preceding wave crest in the wave crest sequence.
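As an illustration of the four outputs, the sketch below computes them from the trough, crest, and interval sequences. The envelope formula and the one-edge-per-pair rule are assumptions (the description only names the inputs of each calculation unit, not the formulas), and the third and fourth units return the per-pair differences as lists.

```python
def envelope_value(troughs, crests, intervals):
    # First calculation unit. Assumed measure: mean crest-to-trough swing weighted by interval time.
    n = min(len(troughs), len(crests), len(intervals))
    if n == 0:
        return 0.0
    return sum((crests[k] - troughs[k]) * intervals[k] for k in range(n)) / n

def wave_edge_count(troughs, crests):
    # Second calculation unit: one wave edge per adjacent trough/crest pair (simplifying assumption).
    return min(len(troughs), len(crests))

def trough_differences(troughs):
    # Third calculation unit: preceding trough minus the following trough, as the description states.
    return [troughs[k] - troughs[k + 1] for k in range(len(troughs) - 1)]

def crest_differences(crests):
    # Fourth calculation unit: preceding crest minus the following crest, as the description states.
    return [crests[k] - crests[k + 1] for k in range(len(crests) - 1)]
```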
Further, the neural network may be trained with multiple preset activation voices. The signal parameters of the output signal corresponding to a preset activation voice are input to the neural network: the first calculation unit 10 calculates the envelope value from the wave troughs, wave crests, and interval times; the second calculation unit 20 calculates the number of wave edges formed by adjacent wave troughs and wave crests from the wave troughs and wave crests; the third calculation unit 30 calculates the difference between two adjacent wave troughs from the wave troughs; and the fourth calculation unit 40 calculates the difference between two adjacent wave crests from the wave crests. Each first node 50 in the hidden layer receives the envelope value, the number of wave edges, the difference between two adjacent wave troughs, and the difference between two adjacent wave crests, judges whether they conform to its feature information, and outputs the judgment result. Each second node 60 in the output layer judges according to the judgment results whether the output signal conforms to the activation voice and outputs its judgment result. When the output signal is recognized as the corresponding activation voice, the signal parameters of the output signal corresponding to that preset activation voice are input repeatedly for training; when it is not, the weights of the first nodes 50 corresponding to the judgment results are adjusted and the signal parameters of that output signal continue to be input for training, until the output layer judges the output signal to be the corresponding activation voice. The signal parameters of the output signals corresponding to the other preset activation voices are then input for training, so that the activation voice corresponding to an output signal can be predicted.
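Read as an algorithm, this training procedure could be sketched roughly as follows. The `network` object and its three methods are hypothetical stand-ins, and the stopping and weight-update details are assumptions; the description only states that the weights of the first nodes corresponding to the judgment results are adjusted until the output layer recognizes the sample.

```python
def train(network, training_set, max_rounds=1000):
    """training_set: list of (signal_params, target_voice_index) pairs, one per preset activation voice.
    'network' is a hypothetical object exposing hidden_judgments(), output_decision(),
    and adjust_first_node_weights()."""
    for signal_params, target_voice in training_set:
        for _ in range(max_rounds):
            judgments = network.hidden_judgments(signal_params)        # 0/1 result per first node
            recognized = network.output_decision(judgments)            # index of matched voice, or None
            if recognized == target_voice:
                break                                                  # proceed to the next preset voice
            network.adjust_first_node_weights(judgments, target_voice) # assumed update rule
```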
The judgment results may be represented by logical values. For example, suppose the logical value stored for a preset activation voice in the corresponding second node 60 is 1010101010, and the signal parameters of the output signal corresponding to this preset activation voice are input into the neural network. Each first node 50 in the hidden layer receives the output parameters and judges whether they conform to its feature information: when they do, the logical value of its output judgment result is 1; when they do not, the logical value is 0. The second node 60 of the output layer judges, according to the received judgment results, whether the output signal conforms to the preset activation voice: when the judgment results equal the corresponding logical value 1010101010, the second node 60 outputs a judgment result with logical value 1, indicating that the output signal conforms to the preset activation voice; when they do not, the second node 60 outputs a judgment result with logical value 0, indicating that the output signal does not conform to the preset activation voice.
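This logical-value mechanism amounts to comparing a bit pattern against a stored value. A minimal sketch is given below; the tolerance-based "conforms" test is an assumed stand-in, since the patent does not define how a first node decides that the received values match its feature information.

```python
STORED_LOGICAL_VALUE = "1010101010"  # value stored at one second node for its activation voice

def first_node_judgment(outputs, feature_info, tolerance=0.2):
    """Return 1 if the four received values conform to this node's feature information, else 0.
    The relative-tolerance comparison is an assumption made for illustration."""
    return int(all(abs(o - f) <= abs(f) * tolerance for o, f in zip(outputs, feature_info)))

def second_node_judgment(judgments):
    """Return 1 if the received judgment results equal the stored logical value, else 0."""
    return int("".join(str(j) for j in judgments) == STORED_LOGICAL_VALUE)
```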
Also provided is a method for constructing a self-learning speech recognition system, applied to a speech recognition system. The speech recognition system includes a microphone and a speech recognition module to which the construction device is applied, and the microphone is connected to the speech recognition module. As shown in Figure 4, the construction method includes the following steps:
Step S1, analyzing the output signal of the microphone to obtain multiple signal parameters;
Step S2, judging according to the signal parameters whether the output signal is a preset activation voice.
In the above embodiment, by analyzing whether the signal parameters correspond to a preset activation voice and performing wake-up via the activation voice, the power module, ADC, and CPU can sleep during standby, reducing energy consumption during standby.
Further, in the above embodiment, in step S2 a neural network is provided, and the neural network judges whether the output signal is a preset activation voice.
The neural network includes:
a first calculation unit 10, used to output an envelope value according to the wave troughs, wave crests, and interval times;
a second calculation unit 20, used to output, according to the wave troughs and wave crests, the number of wave edges formed by adjacent wave troughs and wave crests;
a third calculation unit 30, used to output the difference between two adjacent wave troughs according to the wave troughs;
a fourth calculation unit 40, used to output the difference between two adjacent wave crests according to the wave crests;
a hidden layer, including multiple first nodes 50, where each first node 50 is connected to the first calculation unit 10, the second calculation unit 20, the third calculation unit 30, and the fourth calculation unit 40, each first node 50 is set with one piece of feature information of an activation voice, and the first node 50 receives the envelope value, the number of wave edges, the difference between two adjacent wave troughs, and the difference between two adjacent wave crests, judges whether they conform to the corresponding feature information, and outputs the judgment result;
an output layer, including multiple second nodes 60, where each second node 60 is connected to each first node 50, each second node 60 is set with a corresponding activation voice, and whether the output signal conforms to the activation voice is judged according to the judgment results;
Step S2 includes the following steps:
Step S21, calculating the envelope value from the wave troughs, wave crests, and interval times; and
calculating, from the wave troughs and wave crests, the number of wave edges formed by adjacent wave troughs and wave crests; and
calculating the difference between two adjacent wave troughs from the wave troughs; and
calculating the difference between two adjacent wave crests from the wave crests;
Step S22, each first node 50 receives the envelope value, the number of wave edges, the difference between two adjacent wave troughs, and the difference between two adjacent wave crests, judges whether they conform to the feature information, and outputs the judgment result;
Step S23, each second node 60 judges according to the judgment results whether the output signal conforms to the activation voice, and outputs the judgment result.
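Taken together, steps S1 and S2 correspond to an analyze-then-judge pipeline. A hypothetical caller, reusing the sketches above, might wire them as follows; `mic_samples` and `node_feature_table` are assumed inputs, and the summation of the per-pair differences is purely an illustrative aggregation.

```python
# Hypothetical wiring of steps S1 and S2, reusing the earlier sketches.
params = analyze(mic_samples, sample_rate=16000)                       # step S1
unit_outputs = (envelope_value(params.troughs, params.crests, params.intervals),
                wave_edge_count(params.troughs, params.crests),
                sum(trough_differences(params.troughs)),               # aggregated by summation for illustration
                sum(crest_differences(params.crests)))
judgments = [first_node_judgment(unit_outputs, info) for info in node_feature_table]  # hidden layer
woke_up = second_node_judgment(judgments) == 1                         # step S2: output layer decision
```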
The above are only preferred embodiments of the present invention and do not limit its implementation or protection scope. Those skilled in the art should appreciate that all solutions obtained through equivalent replacements and obvious variations made on the basis of the description and drawings of the present invention fall within the protection scope of the present invention.

Claims (10)

  1. A construction device for a self-learning speech recognition system, applied to the speech recognition system, wherein the speech recognition system includes a microphone and a speech recognition module to which the construction device is applied, and the microphone is connected to the speech recognition module, characterized in that the construction device includes:
    an analysis unit, used to analyze the output signal of the microphone to obtain multiple signal parameters;
    a recognition unit, connected to the analysis unit, which judges according to the signal parameters whether the output signal is a preset activation voice.
  2. The construction device for a self-learning speech recognition system according to claim 1, characterized in that the output signal is a waveform signal.
  3. The construction device for a self-learning speech recognition system according to claim 1, characterized in that the analysis unit saves each type of signal parameter in turn into a corresponding sequence and outputs the signal parameters of each sequence to the recognition unit.
  4. The construction device for a self-learning speech recognition system according to claim 3, characterized in that the recognition unit is a neural network, and the neural network includes:
    a first calculation unit, used to output a first output parameter according to the signal parameters of multiple sequences;
    a second calculation unit, used to output a second output parameter according to the signal parameters of multiple sequences;
    a third calculation unit, used to output a third output parameter according to the signal parameters of the corresponding sequence;
    a fourth calculation unit, used to output a fourth output parameter according to the signal parameters of the corresponding sequence;
    a hidden layer, including multiple first nodes, where each of the first nodes is connected to the first calculation unit, the second calculation unit, the third calculation unit, and the fourth calculation unit, each of the first nodes is set with one piece of feature information of the activation voice, and the first node receives the first output parameter, the second output parameter, the third output parameter, and the fourth output parameter, judges whether they conform to the corresponding feature information, and outputs the judgment result;
    an output layer, including multiple second nodes, where each of the second nodes is connected to each of the first nodes, each of the second nodes is set with one corresponding activation voice, and whether the output signal conforms to the activation voice is judged according to the judgment results.
  5. The construction device for a self-learning speech recognition system according to claim 3, characterized in that the types of the signal parameters include wave troughs, wave crests, and the interval time between adjacent wave troughs and wave crests.
  6. The construction device for a self-learning speech recognition system according to claim 5, characterized in that
    the first output parameter is an envelope value; and/or
    the second output parameter is the number of wave edges formed by the adjacent wave troughs and wave crests; and/or
    the third output parameter is the difference between two adjacent wave troughs; and/or
    the fourth output parameter is the difference between two adjacent wave crests.
  7. The construction device for a self-learning speech recognition system according to claim 6, characterized in that
    the first calculation unit calculates the envelope value from the wave troughs, the wave crests, and the interval times; and/or
    the second calculation unit calculates, from the wave troughs and the wave crests, the number of the wave edges formed by the adjacent wave troughs and wave crests; and/or
    the third calculation unit calculates the difference between the two adjacent wave troughs from the wave troughs; and/or
    the fourth calculation unit calculates the difference between the two adjacent wave crests from the wave crests.
  8. A construction method for a self-learning speech recognition system, applied to the speech recognition system, the speech recognition system including a microphone and a speech recognition module to which the construction device is applied, the microphone being connected to the speech recognition module, wherein the construction method includes the following steps:
    step S1, analyzing an output signal of the microphone to obtain a plurality of signal parameters;
    step S2, judging, according to the signal parameters, whether the output signal is a preset activation voice.
  9. The construction method for a self-learning speech recognition system according to claim 8, wherein in step S2, a neural network is provided, and whether the output signal is the preset activation voice is judged by the neural network.
  10. The construction method for a self-learning speech recognition system according to claim 9, wherein the neural network includes:
    a first calculation unit, configured to output an envelope value according to wave troughs, wave crests and interval times;
    a second calculation unit, configured to output, according to the wave troughs and the wave crests, the number of wave edges formed by the adjacent wave troughs and wave crests;
    a third calculation unit, configured to output, according to the wave troughs, the difference between two adjacent wave troughs;
    a fourth calculation unit, configured to output, according to the wave crests, the difference between two adjacent wave crests;
    a hidden layer, including a plurality of first nodes, each of the first nodes being connected to the first calculation unit, the second calculation unit, the third calculation unit and the fourth calculation unit, and each of the first nodes being set with one piece of feature information of the activation voice, wherein the first node receives the envelope value, the number of wave edges, the difference between two adjacent wave troughs and the difference between two adjacent wave crests, judges whether they conform to the corresponding feature information, and outputs a judgment result;
    an output layer, including a plurality of second nodes, each of the second nodes being connected to each of the first nodes and being set with a corresponding activation voice, wherein the second node judges, according to the judgment results, whether the output signal conforms to the activation voice;
    wherein step S2 includes the following steps:
    step S21, calculating the envelope value from the wave troughs, the wave crests and the interval time; and
    calculating, from the wave troughs and the wave crests, the number of wave edges formed by the adjacent wave troughs and wave crests; and
    calculating, from the wave troughs, the difference between two adjacent wave troughs; and
    calculating, from the wave crests, the difference between two adjacent wave crests;
    step S22, each of the first nodes receiving the envelope value, the number of wave edges, the difference between two adjacent wave troughs and the difference between two adjacent wave crests, judging whether they conform to the feature information, and outputting a judgment result;
    step S23, each of the second nodes judging, according to the judgment results, whether the output signal conforms to the activation voice, and outputting a judgment result.
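For illustration only, the feature-comparison structure recited in claims 4 to 10 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patented implementation: the envelope formula, the edge-counting rule, the tolerance-based matching in the hidden-layer nodes, the vote threshold in the output-layer nodes, and names such as FeatureInfo, HiddenNode and OutputNode are assumptions introduced for the example; the claims leave these details open.

from dataclasses import dataclass
from typing import List


def envelope_value(troughs: List[float], crests: List[float],
                   intervals: List[float]) -> float:
    # First calculation unit: an envelope value derived from the wave troughs,
    # wave crests and the interval time between adjacent troughs and crests.
    # The claims do not fix a formula; a time-weighted mean crest-to-trough
    # swing is used here purely as a placeholder assumption.
    swings = [c - t for c, t in zip(crests, troughs)]
    total_time = sum(intervals) or 1.0
    return sum(s * dt for s, dt in zip(swings, intervals)) / total_time


def wave_edge_count(troughs: List[float], crests: List[float]) -> int:
    # Second calculation unit: the number of wave edges formed by adjacent
    # troughs and crests (each trough-to-crest or crest-to-trough transition
    # is counted as one edge; this counting rule is an assumption).
    return max(len(troughs) + len(crests) - 1, 0)


def trough_differences(troughs: List[float]) -> List[float]:
    # Third calculation unit: differences between two adjacent wave troughs.
    return [b - a for a, b in zip(troughs, troughs[1:])]


def crest_differences(crests: List[float]) -> List[float]:
    # Fourth calculation unit: differences between two adjacent wave crests.
    return [b - a for a, b in zip(crests, crests[1:])]


@dataclass
class FeatureInfo:
    # One piece of feature information stored in a hidden-layer (first) node:
    # reference values plus a relative tolerance (the tolerance is an assumption).
    envelope: float
    edge_count: int
    trough_diffs: List[float]
    crest_diffs: List[float]
    tolerance: float = 0.2


class HiddenNode:
    # First node: judges whether the four output parameters conform to its
    # stored feature information and outputs the judgment result.
    def __init__(self, feature: FeatureInfo):
        self.feature = feature

    def judge(self, env: float, edges: int,
              t_diffs: List[float], c_diffs: List[float]) -> bool:
        f = self.feature

        def close(a: float, b: float) -> bool:
            return abs(a - b) <= f.tolerance * max(abs(b), 1e-6)

        return (close(env, f.envelope)
                and edges == f.edge_count
                and len(t_diffs) == len(f.trough_diffs)
                and all(close(a, b) for a, b in zip(t_diffs, f.trough_diffs))
                and len(c_diffs) == len(f.crest_diffs)
                and all(close(a, b) for a, b in zip(c_diffs, f.crest_diffs)))


class OutputNode:
    # Second node: one per activation voice; decides from the hidden-layer
    # results whether the output signal conforms to its activation voice.
    # The "at least `required` matching first nodes" rule is an assumption.
    def __init__(self, phrase: str, hidden_indices: List[int], required: int):
        self.phrase = phrase
        self.hidden_indices = hidden_indices
        self.required = required

    def judge(self, hidden_results: List[bool]) -> bool:
        hits = sum(1 for i in self.hidden_indices if hidden_results[i])
        return hits >= self.required

In use, the wave troughs, wave crests and interval times would come from the analysis of the microphone output signal in step S1; the four functions stand in for the first to fourth calculation units, and the two node classes stand in for the hidden layer and the output layer of the sketched network.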
PCT/CN2020/109393 2019-09-05 2020-08-14 Construction apparatus and construction method for self-learning speech recognition system WO2021042969A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910838612.0 2019-09-05
CN201910838612.0A CN110610710B (en) 2019-09-05 2019-09-05 Construction device and construction method of self-learning voice recognition system

Publications (1)

Publication Number Publication Date
WO2021042969A1 (en)

Family

ID=68892341

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/109393 WO2021042969A1 (en) 2019-09-05 2020-08-14 Construction apparatus and construction method for self-learning speech recognition system

Country Status (2)

Country Link
CN (1) CN110610710B (en)
WO (1) WO2021042969A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110610710B (en) * 2019-09-05 2022-04-01 晶晨半导体(上海)股份有限公司 Construction device and construction method of self-learning voice recognition system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105632486A (en) * 2015-12-23 2016-06-01 北京奇虎科技有限公司 Voice wake-up method and device of intelligent hardware
CN105741838A (en) * 2016-01-20 2016-07-06 百度在线网络技术(北京)有限公司 Voice wakeup method and voice wakeup device
CN108538305A (en) * 2018-04-20 2018-09-14 百度在线网络技术(北京)有限公司 Audio recognition method, device, equipment and computer readable storage medium
US20190005953A1 (en) * 2017-06-29 2019-01-03 Amazon Technologies, Inc. Hands free always on near field wakeword solution
CN109360585A (en) * 2018-12-19 2019-02-19 晶晨半导体(上海)股份有限公司 A kind of voice-activation detecting method
US20190214002A1 (en) * 2018-01-09 2019-07-11 Lg Electronics Inc. Electronic device and method of controlling the same
CN110610710A (en) * 2019-09-05 2019-12-24 晶晨半导体(上海)股份有限公司 Construction device and construction method of self-learning voice recognition system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10540979B2 (en) * 2014-04-17 2020-01-21 Qualcomm Incorporated User interface for secure access to a device using speaker verification
CN107102713A (en) * 2016-02-19 2017-08-29 北京君正集成电路股份有限公司 It is a kind of to reduce the method and device of power consumption
US10515629B2 (en) * 2016-04-11 2019-12-24 Sonde Health, Inc. System and method for activation of voice interactive services based on user state
CN109166571B (en) * 2018-08-06 2020-11-24 广东美的厨房电器制造有限公司 Household appliance awakening word training method and device and household appliance
CN108922553B (en) * 2018-07-19 2020-10-09 苏州思必驰信息科技有限公司 Direction-of-arrival estimation method and system for sound box equipment

Also Published As

Publication number Publication date
CN110610710A (en) 2019-12-24
CN110610710B (en) 2022-04-01

Similar Documents

Publication Publication Date Title
CN111223497B (en) Nearby wake-up method and device for terminal, computing equipment and storage medium
WO2018059405A1 (en) Voice control system, wakeup method and wakeup apparatus therefor, electrical appliance and co-processor
TWI683306B (en) Control method of multi voice assistant
US10157629B2 (en) Low power neuromorphic voice activation system and method
TWI474317B (en) Signal processing apparatus and signal processing method
US20190207777A1 (en) Voice command processing in low power devices
CN110473536B (en) Awakening method and device and intelligent device
CN106653021A (en) Voice wake-up control method and device and terminal
CN107403621A (en) Voice Rouser and method
EP3526789B1 (en) Voice capabilities for portable audio device
WO2021098153A1 (en) Method, system, and electronic apparatus for detecting change of target user, and storage medium
CN106161755A (en) A kind of key word voice wakes up system and awakening method and mobile terminal up
US11295761B2 (en) Method for constructing voice detection model and voice endpoint detection system
WO2021042969A1 (en) Construction apparatus and construction method for self-learning speech recognition system
CN111192590B (en) Voice wake-up method, device, equipment and storage medium
US20190302869A1 (en) Information processing method and electronic device
CN111429901A (en) IoT chip-oriented multi-stage voice intelligent awakening method and system
CN106612367A (en) Speech wake method based on microphone and mobile terminal
CN106155621A (en) The key word voice of recognizable sound source position wakes up system and method and mobile terminal up
CN111179944B (en) Voice awakening and age detection method and device and computer readable storage medium
WO2021262314A1 (en) Low power mode for speech capture devices
CN111223489B (en) Specific keyword identification method and system based on Attention mechanism
CN108899028A (en) Voice awakening method, searching method, device and terminal
WO2023010861A1 (en) Wake-up method, apparatus, device, and computer storage medium
CN108093350B (en) Microphone control method and microphone

Legal Events

Date Code Title Description
121  Ep: the epo has been informed by wipo that ep was designated in this application
     Ref document number: 20861661
     Country of ref document: EP
     Kind code of ref document: A1
NENP Non-entry into the national phase
     Ref country code: DE
122  Ep: pct application non-entry in european phase
     Ref document number: 20861661
     Country of ref document: EP
     Kind code of ref document: A1