CN113409788A

CN113409788A - Voice wake-up method, system, device and storage medium

Info

Publication number: CN113409788A
Application number: CN202110802658.4A
Authority: CN
Inventors: 姚林; 蒋黎明; 薛小刚; 杨雪荣
Original assignee: Shenzhen Tongxingzhe Technology Co ltd
Current assignee: Shenzhen Tongxingzhe Technology Co ltd
Priority date: 2021-07-15
Filing date: 2021-07-15
Publication date: 2021-09-17

Abstract

The invention relates to the field of voice wake-up and discloses a voice wake-up method, system, device, and storage medium. The voice wake-up method comprises the following steps: an intelligent wearable system acquires wake-up voice data; the wake-up voice data is preprocessed to obtain processed voice data; it is judged whether the processed voice data is preset wake-up voice data; if so, the processed voice data is sent to an intelligent control system; and the intelligent control system receives the processed voice data, parses it to obtain an operation instruction, and operates the built-in software based on the operation instruction.

Description

Voice wake-up method, system, device and storage medium

Technical Field

The present invention relates to the field of voice wake-up, and in particular, to a voice wake-up method, system, device, and storage medium.

Background

Smartphones are widely used in daily life: shopping, chatting, listening to music, watching films and television, navigating, studying, reading, and working are all done on a smartphone, so its function and position in people's lives are extremely important. Using a mobile phone requires concentration, and the user operates the interface with one or both hands.

However, in some specific scenarios the smartphone cannot be operated by hand even though it needs to be operated. For example, when riding a two-wheeled vehicle such as a bicycle, it is dangerous for the user to touch the screen to operate the mobile phone, yet the user may need the phone's map for navigation guidance. In such scenarios the smartphone cannot be used directly but still needs to be operated, and this inconvenience needs to be solved.

Disclosure of Invention

The main purpose of the invention is to solve the technical problem that, in some scenarios, a smartphone cannot be operated by hand but still needs to be operated, which makes it inconvenient to use.

A first aspect of the present invention provides a voice wake-up method applied to a voice wake-up system, where the voice wake-up system includes an intelligent wearable system and an intelligent control system, and the voice wake-up method comprises the following steps:

the intelligent wearable system acquires awakening voice data;

preprocessing the awakening voice data to obtain processed voice data;

judging whether the processed voice data is preset awakening voice data or not;

if so, the processed voice data is sent to the intelligent control system;

and the intelligent control system receives the processed voice data, analyzes and processes the processed voice data to obtain an operation instruction, and operates and processes the built-in software based on the operation instruction.

Optionally, in a first implementation manner of the first aspect of the present invention, the acquiring wake-up voice data includes:

acquiring first external voice data based on software, and acquiring second external voice data based on hardware;

merging the first external voice data and the second external voice data to obtain echo voice data;

and carrying out echo duplication removal processing on the echo voice data to generate awakening voice data.

Optionally, in a second implementation manner of the first aspect of the present invention, the performing echo deduplication processing on the echo voice data to generate wakeup voice data includes:

performing de-duplication processing on the echo voice data to obtain de-duplicated data;

and performing PCM encoding on the de-duplicated data to obtain the wake-up voice data.

Optionally, in a third implementation manner of the first aspect of the present invention, the preprocessing the wake-up voice data to obtain processed voice data includes:

denoising the wake-up voice data to obtain transit voice data;

and performing gain processing on the transit voice data to obtain the processed voice data.

Optionally, in a fourth implementation manner of the first aspect of the present invention, the determining whether the processed voice data is preset wake-up voice data includes:

performing framing processing on the processed voice data to obtain first analytic voice data;

windowing the first analytic voice data to obtain second analytic voice data;

carrying out fast Fourier transform processing on the second analytic voice data to obtain third analytic voice data;

performing feature extraction processing on the third analytic voice data to obtain a feature value;

and judging whether the characteristic value is a preset awakening value or not according to a preset hidden Markov model.

Optionally, in a fifth implementation manner of the first aspect of the present invention, the analyzing the processed voice data to obtain the operation instruction includes:

according to a preset voice recognition neural network, carrying out recognition processing on the processed voice data to obtain a recognition value;

and activating the identification value according to a preset activation function to obtain an operation instruction corresponding to the identification value.

Optionally, in a sixth implementation manner of the first aspect of the present invention, the receiving, by the intelligent control system, the processed voice data includes:

the intelligent wearable system wakes up the Bluetooth receiving function of the intelligent control system and transmits the processed voice data to the intelligent control system based on the Bluetooth receiving function.

A second aspect of the present invention provides a voice wake-up system, including:

the intelligent wearable system and the intelligent control system;

the intelligent wearable system is used for acquiring awakening voice data; preprocessing the awakening voice data to obtain processed voice data; judging whether the processed voice data is preset awakening voice data or not; if the voice is awakened, the processed voice data is sent to the intelligent control system;

the intelligent control system is used for receiving the processed voice data, analyzing the processed voice data to obtain an operation instruction, and performing operation processing on the built-in software based on the operation instruction.

A third aspect of the present invention provides a voice wake-up apparatus, including: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line; the at least one processor invokes the instructions in the memory to cause the voice wakeup device to perform the voice wakeup method described above.

A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the above-mentioned voice wake-up method.

In the embodiment of the invention, information is transmitted between the intelligent wearable system and the intelligent control system so that the influence of environmental noise during voice transmission is eliminated; the voice is transmitted over Bluetooth and the voice data is executed in the intelligent control system, so that the smartphone can be operated in application scenarios where it is difficult to operate by hand.

Drawings

FIG. 1 is a diagram of a voice wake-up method according to an embodiment of the present invention;

FIG. 2 is a diagram of a voice wake-up system according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a voice wake-up device according to an embodiment of the present invention.

Detailed Description

The embodiment of the invention provides a voice wake-up method, system, device, and storage medium.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

For ease of understanding, the detailed flow of an embodiment of the present invention is described below. Referring to FIG. 1, an embodiment of the voice wake-up method is applied to a voice wake-up system that includes an intelligent wearable system and an intelligent control system, and the voice wake-up method comprises the following steps:

101. the intelligent wearable system acquires awakening voice data;

In this embodiment, the intelligent wearable system may be an intelligent helmet, an intelligent bracelet, or an intelligent watch. During acquisition it is responsible for collecting the sound signals, including the microphone signal and the reference signal; at the same time it resamples the sound signal, and the result is finally encoded as PCM data.
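As an illustration of the final PCM encoding mentioned above, the following is a minimal sketch only. The 16-bit little-endian sample format and the function names `encode_pcm16` and `decode_pcm16` are assumptions made for this example; the embodiment does not specify a bit depth or byte order.

```python
import struct

def encode_pcm16(samples):
    """Encode float samples in [-1.0, 1.0] as 16-bit little-endian PCM bytes.

    The 16-bit format is an assumption for illustration.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clamp so the value fits in 16 bits
        ints.append(int(round(s * 32767)))  # scale to the signed 16-bit range
    return struct.pack("<%dh" % len(ints), *ints)

def decode_pcm16(data):
    """Decode 16-bit little-endian PCM bytes back to float samples."""
    count = len(data) // 2
    values = struct.unpack("<%dh" % count, data[:count * 2])
    return [v / 32767 for v in values]
```

For example, a frame of 160 samples encodes to 320 bytes, which is the payload size that later stages would handle under this assumed format.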

Further, step 101 may take the following implementation:

1011. acquiring first external voice data based on software, and acquiring second external voice data based on hardware;

1012. merging the first external voice data and the second external voice data to obtain echo voice data;

1013. and carrying out echo duplication removal processing on the echo voice data to generate awakening voice data.

In steps 1011-1013, a microphone signal and a reference signal are obtained simultaneously. The sampling rate of Bluetooth music playback is generally 44.1 kHz, whereas the audio sampling rate for speech recognition is generally 16 kHz, so the reference signal needs to be resampled to 16 kHz. Conventional resampling is generally done in software; this scheme instead uses hardware resampling, performing digital-to-analog conversion of the 44.1 kHz data signal to an analog signal and then analog-to-digital conversion back to a 16 kHz digital signal.
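The rate relationship between the 44.1 kHz reference signal and the 16 kHz recognition input can be illustrated in software. The sketch below uses plain linear interpolation and is not the hardware resampling path of this scheme; the function name `resample_linear` is hypothetical, and a practical software resampler would also apply a low-pass (anti-aliasing) filter before reducing the rate.

```python
def resample_linear(samples, src_rate=44100, dst_rate=16000):
    """Resample by linear interpolation (illustrative; no anti-aliasing filter)."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate        # fractional index into the source
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # weighted average of the two neighbouring source samples
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

One second of 44.1 kHz audio (44100 samples) yields 16000 output samples, which matches the 16 kHz rate expected by the recognition stage.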

Further, the following operations may also be performed in step 1013:

10131. performing de-duplication processing on the echo voice data to obtain de-duplicated data;

10132. and performing PCM encoding on the de-duplicated data to obtain the wake-up voice data.

In steps 10131-10132, after the microphone signal and the resampled reference signal are collected, they are arranged alternately and combined, with one data block every 160 ms. Compared with a traditional stereo data structure (left channel data + right channel data), this combination reduces the number of splits needed during echo cancellation, avoids re-splitting the data, and improves data processing efficiency. Optionally, echo cancellation uses an adaptive method: the magnitude of the echo signal is estimated adaptively, and the estimate is then subtracted from the received signal to cancel the echo.
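The block-wise arrangement and the adaptive echo cancellation described above can be sketched as follows. This is an illustration under stated assumptions: the normalized LMS (NLMS) filter is one common adaptive method and is chosen here only as an example, and the names `interleave_blocks`, `split_blocks`, `nlms_echo_cancel` and the parameters `taps` and `mu` are hypothetical. At a 16 kHz sampling rate, a 160 ms block would hold 2560 samples per signal.

```python
def interleave_blocks(mic, ref, block_len):
    """Arrange the signals as alternating blocks: mic block, ref block, ..."""
    n = min(len(mic), len(ref)) // block_len * block_len  # whole blocks only
    out = []
    for start in range(0, n, block_len):
        out.extend(mic[start:start + block_len])
        out.extend(ref[start:start + block_len])
    return out

def split_blocks(data, block_len):
    """Recover the two signals with one slice per block (not per sample)."""
    mic, ref = [], []
    for start in range(0, len(data), 2 * block_len):
        mic.extend(data[start:start + block_len])
        ref.extend(data[start + block_len:start + 2 * block_len])
    return mic, ref

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of ref from mic (NLMS filter)."""
    weights = [0.0] * taps
    history = [0.0] * taps            # recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_est = sum(w * h for w, h in zip(weights, history))
        err = d - echo_est            # residual after removing the estimate
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        out.append(err)
    return out
```

In use, a combined buffer would first be separated with `split_blocks`, and the two resulting signals would then be passed to `nlms_echo_cancel`, whose output is the echo-reduced microphone signal.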

102. Preprocessing the awakening voice data to obtain processed voice data;

In this embodiment, the sound preprocessing consists of noise reduction and automatic gain. Traditional preprocessing generally includes only echo cancellation, without noise reduction or automatic gain.

Preferably, the following steps may be employed at step 102:

1021. denoising the wake-up voice data to obtain transit voice data;

1022. and performing gain processing on the transit voice data to obtain the processed voice data.

In the above steps, automatic gain control (AGC) handles the volume changes that arise in recordings made under a wide range of settings. AGC provides a way to adjust the signal to a reference volume. This is useful in VoIP because the microphone gain does not need to be adjusted manually. Another advantage is that the microphone gain can be set at a more conservative level, which makes it easier to avoid clipping and distortion. When a microphone array or multi-channel sampling is used, delay jitter may occur, for example when the AEC technique performs echo cancellation on sound data under different delay conditions.
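The gain step can be illustrated with a simple frame-based AGC. This is a minimal sketch, not the AGC of the embodiment: the function name `automatic_gain`, the frame length, the target level `target_rms`, the gain ceiling `max_gain`, and the smoothing factor `smooth` are all assumed values, and the noise reduction step is not covered.

```python
import math

def automatic_gain(samples, frame_len=160, target_rms=0.1,
                   max_gain=10.0, smooth=0.9):
    """Scale each frame toward target_rms with a smoothed, bounded gain."""
    out = []
    gain = 1.0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms > 1e-6:                # near-silent frame: keep previous gain
            desired = min(max_gain, target_rms / rms)
            gain = smooth * gain + (1.0 - smooth) * desired
        # limit to [-1, 1] so amplified samples never exceed full scale
        out.extend(max(-1.0, min(1.0, s * gain)) for s in frame)
    return out
```

Smoothing the gain across frames avoids abrupt volume jumps, and the gain ceiling keeps background noise in quiet passages from being amplified without bound.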

103. Judging whether the processed voice data is preset awakening voice data or not;

In this embodiment, recordings are innovatively distributed on demand: the control signal of the wake-up engine determines whether the sound needs to be passed to the Bluetooth transmission module for transmission to the mobile phone. In the non-recognition state, when no recording is needed, no recording is transmitted, which reduces the amount of data sent over Bluetooth and lowers power consumption.

Preferably, the following steps can be adopted in step 103:

1031. performing framing processing on the processed voice data to obtain first analytic voice data;

1032. windowing the first analytic voice data to obtain second analytic voice data;

1033. carrying out fast Fourier transform processing on the second analytic voice data to obtain third analytic voice data;

1034. performing feature extraction processing on the third analytic voice data to obtain a feature value;

1035. and judging whether the characteristic value is a preset awakening value or not according to a preset hidden Markov model.
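Steps 1031-1034 can be sketched as a small feature-extraction front end. This is an illustration only: the frame length (25 ms), hop (10 ms), Hamming window, FFT size of 512, and the use of log energies in equal-width bands as the feature values are assumptions, since the embodiment does not specify them; all function names are hypothetical.

```python
import cmath
import math

def frame_signal(samples, frame_len=400, hop=160):
    """Split samples into overlapping frames (25 ms / 10 ms at 16 kHz, assumed)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(frame):
    """Apply a Hamming window to one frame."""
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])

def log_band_energies(frame, n_fft=512, n_bands=8):
    """Feature vector: log energy in n_bands equal-width frequency bands."""
    padded = hamming(frame) + [0.0] * (n_fft - len(frame))  # zero-pad to n_fft
    power = [abs(c) ** 2 for c in fft(padded)[:n_fft // 2]]
    width = len(power) // n_bands
    return [math.log(sum(power[b * width:(b + 1) * width]) + 1e-10)
            for b in range(n_bands)]
```

A feature sequence for an utterance would then be obtained as `[log_band_energies(f) for f in frame_signal(audio)]`, one vector per frame.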

In the above steps, if the wake-up word that is hit is a quick control instruction, only the control instruction needs to be sent to the mobile phone to control it, and the recording does not need to be transmitted.
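The decision of step 1035 can be illustrated with a discrete hidden Markov model scored by the forward algorithm. This is a sketch under assumptions: it presumes that the feature values have already been quantized into discrete symbols, that the model parameters (`start`, `trans`, `emit`) were trained beforehand, and that the wake-up decision compares a per-frame log-likelihood with a preset `threshold`. None of these details is specified by the embodiment.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM via the scaled forward algorithm."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # propagate through the transitions, then weight by the emission
            alpha = [sum(alpha[p] * trans[p][s] for p in range(n_states))
                     * emit[s][obs[t]] for s in range(n_states)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")      # the model cannot produce this sequence
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
        log_like += math.log(scale)
    return log_like

def is_wake_word(obs, model, threshold):
    """Accept when the per-frame log-likelihood reaches the preset threshold."""
    start, trans, emit = model
    score = forward_log_likelihood(obs, start, trans, emit) / len(obs)
    return score >= threshold
```

Normalizing by the number of frames makes the threshold less sensitive to utterance length, which is one possible design choice among several.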

104. If so, the processed voice data is sent to the intelligent control system;

In this embodiment, recordings are innovatively distributed on demand: the control signal of the wake-up engine determines whether the sound needs to be passed to the Bluetooth transmission module for transmission to the mobile phone. In the non-recognition state, when no recording is needed, no recording is transmitted, which reduces the amount of data sent over Bluetooth and lowers power consumption. Compared with a traditional wake-up module, this scheme separates wake-up from recognition, which enables remote wake-up, and wake-up does not occupy additional CPU resources.
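The on-demand distribution described above can be sketched as a gate between the audio source and the transmitter. The class name `OnDemandDistributor`, its methods, and the `send_frame` callback (standing in for the Bluetooth transmission module) are hypothetical and shown only to make the control flow concrete.

```python
class OnDemandDistributor:
    """Forward audio frames to a transmit callback only while recognition is active."""

    def __init__(self, send_frame):
        self.send_frame = send_frame   # hypothetical Bluetooth transmit callback
        self.active = False
        self.sent = 0
        self.dropped = 0

    def on_wake(self):
        """Control signal from the wake-up engine: start forwarding frames."""
        self.active = True

    def on_recognition_done(self):
        """Recognition finished: stop forwarding to save bandwidth and power."""
        self.active = False

    def push(self, frame):
        """Offer one audio frame; it is transmitted only in the active state."""
        if self.active:
            self.send_frame(frame)
            self.sent += 1
        else:
            self.dropped += 1
```

Frames pushed before `on_wake()` or after `on_recognition_done()` are counted as dropped and never reach the transmitter, which corresponds to the reduction in transmitted data described above.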

105. And the intelligent control system receives the processed voice data, analyzes and processes the processed voice data to obtain an operation instruction, and operates and processes the built-in software based on the operation instruction.

In this embodiment, the intelligent control system may be a mobile phone, and the intelligent wearable device transmits the voice data to the mobile phone over Bluetooth. The voice data may be processed with traditional voice analysis means or with an intelligent recognition algorithm. The operation instruction obtained by the analysis may be, for example, turning on the camera, turning on the recording function, opening a navigation app, setting a navigation route, performing voice navigation, or replying to a message in social software.

Preferably, the "intelligent control system receives the processed voice data" may perform the following operations:

1051. the intelligent wearable system wakes up the Bluetooth receiving function of the intelligent control system and transmits the processed voice data to the intelligent control system based on the Bluetooth receiving function.

In step 1051, after the voice wake-up command is hit, the native voice assistant of the mobile phone is started through the Bluetooth communication protocol. After front-end preprocessing, the recording is sent over Bluetooth to the mobile phone for voice recognition. When a wake-up word such as answer, hang up, play, or pause is hit, the corresponding control instruction is sent directly to the mobile phone through the Bluetooth module. In this way the intelligent helmet controls, by voice, the phone's navigation, answering and making calls, playing music, sending WeChat messages, and so on.
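The two paths described in step 1051 can be sketched as a small dispatcher. The command codes in `QUICK_COMMANDS`, the `START_ASSISTANT` code, and the `send_command` and `send_audio` callbacks are hypothetical placeholders; they do not correspond to any actual Bluetooth profile or API.

```python
QUICK_COMMANDS = {          # hypothetical wake-word to control-code mapping
    "answer": "CALL_ANSWER",
    "hang up": "CALL_HANGUP",
    "play": "MEDIA_PLAY",
    "pause": "MEDIA_PAUSE",
}

def handle_wake_hit(wake_word, recording, send_command, send_audio):
    """Send a control code for quick commands; otherwise forward the recording."""
    command = QUICK_COMMANDS.get(wake_word)
    if command is not None:
        send_command(command)          # no audio is transmitted on this path
        return "command"
    send_command("START_ASSISTANT")    # ask the phone to start its voice assistant
    send_audio(recording)              # preprocessed recording for recognition
    return "audio"
```

With this arrangement, a quick control word costs only one short control message, while the general wake-up word triggers the transfer of the preprocessed recording.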

Preferably, "analyzing the processed voice data to obtain an operation instruction" may perform the following operations:

1052. according to a preset voice recognition neural network, carrying out recognition processing on the processed voice data to obtain a recognition value;

1053. and activating the identification value according to a preset activation function to obtain an operation instruction corresponding to the identification value.

In steps 1052-1053, the speech recognition neural network may be Text-CNN, an RNN, MobileNetV3, or another network capable of recognizing speech data; after convolution, pooling, and clipping, a result matrix is obtained.

It should be noted that the output of the result matrix is the operation instruction itself: there is no need to convert the voice data into text data, analyze the text data, and then generate the operation instruction corresponding to the text data. It should be further explained that, when processing the result matrix, the activation function may itself be trained and adjusted as a neural network, that is, a classifier trained with a GAN adversarial training model, so that the result produced by the activation function is more accurate.
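The mapping from the network output directly to an operation instruction, without an intermediate text stage, can be illustrated as follows. This sketch assumes a softmax activation and a fixed label list; the embodiment names neither, and the labels in `INSTRUCTIONS`, the function names, and the `min_confidence` parameter are hypothetical.

```python
import math

# Hypothetical instruction labels, one per network output.
INSTRUCTIONS = ["open_camera", "start_recording", "open_navigation", "reply_message"]

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)                 # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def to_instruction(scores, min_confidence=0.5):
    """Map network output scores to an instruction, or None if confidence is low."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return INSTRUCTIONS[best] if probs[best] >= min_confidence else None
```

Returning `None` for low-confidence outputs is an added safeguard in this sketch, so that an unclear utterance does not trigger an unintended operation.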

The above describes the voice wake-up method in the embodiment of the present invention, and the following describes the voice wake-up system in the embodiment of the present invention with reference to fig. 2, where the voice wake-up system in the embodiment of the present invention includes:

an intelligent wearable system 201 and an intelligent control system 202;

the intelligent wearable system 201 is used for acquiring awakening voice data; preprocessing the awakening voice data to obtain processed voice data; judging whether the processed voice data is preset awakening voice data or not; if the voice is awakened, the processed voice data is sent to the intelligent control system;

the intelligent control system 202 is configured to receive the processed voice data, analyze the processed voice data to obtain an operation instruction, and perform operation processing on the built-in software based on the operation instruction.

Wherein, the intelligent wearable system 201 can be further specifically configured to:

denoising the awakening voice to obtain transit voice data;

windowing the first analytic voice data to obtain second analytic voice data;

Wherein, the intelligent control system 202 may be further specifically configured to:

FIG. 2 above describes the voice wake-up system in the embodiment of the present invention in detail from the perspective of modular functional entities; the voice wake-up device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.

FIG. 3 is a schematic structural diagram of a voice wake-up device 300 according to an embodiment of the present invention. The voice wake-up device 300 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 310 (e.g., one or more processors), a memory 320, and one or more storage media 330 (e.g., one or more mass storage devices) storing applications 333 or data 332. The memory 320 and the storage media 330 may be transient or persistent storage. The program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations for the voice wake-up device 300. Still further, the processor 310 may be configured to communicate with the storage medium 330 and to execute, on the voice wake-up device 300, the series of instruction operations in the storage medium 330.

The voice wake-up device 300 may also include one or more power supplies 340, one or more wired or wireless network interfaces 350, one or more input-output interfaces 360, and/or one or more operating systems 331, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the configuration shown in FIG. 3 does not limit the voice wake-up device, which may include more or fewer components than shown, combine some components, or arrange the components differently.

The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the voice wake-up method.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described system, device, and unit may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A voice wake-up method, applied to a voice wake-up system, wherein the voice wake-up system comprises an intelligent wearable system and an intelligent control system, and the voice wake-up method comprises the following steps:

the intelligent wearable system acquires awakening voice data;

preprocessing the awakening voice data to obtain processed voice data;

judging whether the processed voice data is preset awakening voice data or not;

2. The voice wake-up method of claim 1, wherein the obtaining wake-up voice data comprises:

3. The voice wake-up method of claim 2, wherein the performing echo de-duplication processing on the echo voice data to generate wake-up voice data comprises:

4. A voice wake-up method according to any of claims 2-3, wherein the pre-processing the wake-up voice data to obtain processed voice data comprises:

denoising the awakening voice to obtain transit voice data;

5. The voice wake-up method according to claim 1, wherein the determining whether the processed voice data is preset wake-up voice data comprises:

windowing the first analytic voice data to obtain second analytic voice data;

6. The voice wake-up method according to claim 1, wherein the parsing the processed voice data to obtain the operation command comprises:

7. The voice wake-up method according to claim 1, wherein the receiving of the processed voice data by the intelligent control system comprises:

8. A voice wake-up system, the voice wake-up system comprising:

the intelligent wearable system and the intelligent control system;

9. A voice wake-up device, characterized in that the voice wake-up device comprises: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;

the at least one processor invokes the instructions in the memory to cause the voice wake-up device to perform the voice wake-up method of any of claims 1-7.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the voice wake-up method according to any one of claims 1-7.